Lernen

an automata learning library written in Ruby.

Short Introduction to Automata Learning and Lernen

Automata learning is a technique to infer an automaton from a program.

Once we obtain an automaton from a program, we gain several benefits:

  • Visualization: An automaton is a state-transition system, i.e., a labelled graph. Therefore, graph visualization tools such as GraphViz and Mermaid let you see the structure of the system at a glance.
  • Model checking: An automaton is a model of the system, so model-checking techniques can verify properties of the system, e.g., that two different implementations behave exactly the same.

Lernen is an automata learning library written in Ruby. This library includes implementations of not only well-known automata learning algorithms such as Angluin's $L^\ast$, $\textrm{Kearns–Vazirani}$ ($\textrm{KV}$), and $\textrm{DHC}$, but also modern algorithms such as $L^\#$. In addition, this library supports inferring automata that accept non-regular languages, namely VPAs (visibly pushdown automata) and SPAs (systems of procedural automata).

As case studies of real-world applications of automata learning, we introduce two examples using Lernen.

Case Study 1: URI.parse and URI Regexp (examples/uri_parse_regexp.rb)

URL validation is a common task in a Web application, and we can implement it using URI.parse. For example, the following method valid_and_http_url? checks whether a given string is a valid URI whose scheme is http or https.

```ruby
require "uri"

def valid_and_http_url?(string)
  uri = URI.parse(string)
  uri.scheme == "http" || uri.scheme == "https"
rescue URI::Error
  false
end
```
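For instance, the method classifies strings as follows (a quick sanity check; the sample inputs here are our own, not from the library's examples):

```ruby
require "uri"

def valid_and_http_url?(string)
  uri = URI.parse(string)
  uri.scheme == "http" || uri.scheme == "https"
rescue URI::Error
  false
end

puts valid_and_http_url?("http://example.com")  # accepted: valid URI with the http scheme
puts valid_and_http_url?("ftp://example.com")   # rejected: valid URI, but the scheme is ftp
puts valid_and_http_url?("not a url")           # rejected: URI.parse raises URI::Error
```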

However, this method is a bit inefficient because it allocates a URI object for each call.

Fortunately, Ruby's uri library provides the URI::DEFAULT_PARSER.make_regexp method to build a regexp pattern that matches valid URIs with the given schemes. Thus, we can rewrite the valid_and_http_url? method using it.

```ruby
VALID_AND_HTTP_URL_REGEXP = /\A#{URI::DEFAULT_PARSER.make_regexp(%w[http https])}\z/

def new_valid_and_http_url?(string)
  string.match?(VALID_AND_HTTP_URL_REGEXP)
end
```

The new new_valid_and_http_url? avoids the per-call allocation, so it performs better.
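Before asking the equivalence question formally, we can spot-check that the two implementations agree on a few typical inputs. This is only a sketch: the sample strings are illustrative, and both definitions are repeated so that the snippet runs standalone.

```ruby
require "uri"

# Definitions repeated from above for a self-contained snippet.
def valid_and_http_url?(string)
  uri = URI.parse(string)
  uri.scheme == "http" || uri.scheme == "https"
rescue URI::Error
  false
end

VALID_AND_HTTP_URL_REGEXP = /\A#{URI::DEFAULT_PARSER.make_regexp(%w[http https])}\z/

def new_valid_and_http_url?(string)
  string.match?(VALID_AND_HTTP_URL_REGEXP)
end

# On these ordinary inputs, both implementations give the same answer.
%w[http://example.com https://example.com/foo?bar=1 ftp://example.com].each do |s|
  puts "#{s}: #{valid_and_http_url?(s)} / #{new_valid_and_http_url?(s)}"
end
```

Agreement on a handful of samples is encouraging but proves nothing; that is exactly the gap automata learning fills below.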

Now, we have a question: Do these implementations behave exactly the same?

This is a typical question that arises when refactoring a program, and Lernen's automata learning can answer it.

First, we infer an automaton from each of the two validation methods. This can be done with the following code.

```ruby
# `alphabet` is an array of pieces of words.
# The learning algorithm infers an automaton over this alphabet, so in this case
# we specify some possible subwords of URLs as `alphabet`.
alphabet = %w[http https ftp example com foo 80 12 : / . ? = & # @ %]

# `oracle` specifies the kind of equivalence oracle used during learning,
# and `oracle_params` is a parameter object for it.
oracle = :random_word
oracle_params = { max_words: 2000 }.freeze

# Infer an automaton by calling the `Lernen.learn` method with the target program.

# `URI.parse` DFA:
uri_parse_dfa = Lernen.learn(alphabet:, oracle:, oracle_params:) do |word|
  # `word.join` is necessary because `word` is an array of `alphabet` elements.
  valid_and_http_url?(word.join)
end

# `URI` regexp DFA:
uri_regexp_dfa = Lernen.learn(alphabet:, oracle:, oracle_params:) do |word|
  new_valid_and_http_url?(word.join)
end
```

`uri_parse_dfa` and `uri_regexp_dfa` are `Lernen::Automaton::DFA` objects. `DFA#to_mermaid` returns a Mermaid diagram representation of the obtained DFA.

```ruby
uri_parse_dfa.to_mermaid
# => "flowchart TD\n" ...
uri_regexp_dfa.to_mermaid
# => "flowchart TD\n" ...
```
<details> <summary>Mermaid diagrams for <code>URI.parse</code> and <code>URI</code> regexp DFAs</summary>

URI.parse DFA

flowchart TD
  0((0))
  1((1))
  2((2))
  3(((3)))
  4(((4)))
  5(((5)))
  6((6))
  7(((7)))

  0 -- "'http'" --> 1
  0 -- "'https'" --> 1
  0 -- "'ftp'" --> 2
  0 -- "'example'" --> 2
  0 -- "'com'" --> 2
  0 -- "'foo'" --> 2
  0 -- "'80'" --> 2
  0 -- "'12'" --> 2
  0 -- "':'" --> 2
  0 -- "'/'" --> 2
  0 -- "'.'" --> 2
  0 -- "'?'" --> 2
  0 -- "'='" --> 2
  0 -- "'&'" --> 2
  0 -- "'#'" --> 2
  0 -- "'@'" --> 2
  0 -- "'%'" --> 2
  1 -- "'http'" --> 2
  1 -- "'https'" --> 2
  1 -- "'ftp'" --> 2
  1 -- "'example'" --> 2
  1 -- "'com'" --> 2
  1 -- "'foo'" --> 2
  1 -- "'80'" --> 2
  1 -- "'12'" --> 2
  1 -- "':'" --> 3
  1 -- "'/'" --> 2
  1 -- "'.'" --> 2
  1 -- "'?'" --> 2
  1 -- "'='" --> 2
  1 -- "'&'" --> 2
  1 -- "'#'" --> 2
  1 -- "'@'" --> 2
  1 -- "'%'" --> 2
  2 -- "'http'" --> 2
  2 -- "'https'" --> 2
  2 -- "'ftp'" --> 2
  2 -- "'example'" --> 2
  2 -- "'com'" --> 2
  2 -- "'foo'" --> 2
  2 -- "'80'" --> 2
  2 -- "'12'" --> 2
  2 -- "':'" --> 2
  2 -- "'/'" --> 2
  2 -- "'.'" --> 2
  2 -- "'?'" --> 2
  2 -- "'='" --> 2
  2 -- "'&'" --> 2
  2 -- "'#'" --> 2
  2 -- "'@'" --> 2
  2 -- "'%'" --> 2
  3 -- "'http'" --> 3
  3 -- "'https'" --> 3
  3 -- "'ftp'" --> 3
  3 -- "'example'" --> 3
  3 -- "'com'" --> 3
  3 -- "'foo'" --> 3
  3 -- "'80'" --> 3
  3 -- "'12'" --> 3
  3 -- "':'" --> 3
  3 -- "'/'" --> 3
  3 -- "'.'" --> 3
  3 -- "'?'" --> 4
  3 -- "'='" --> 3
  3 -- "'&'" --> 3
  3 -- "'#'" --> 5
  3 -- "'@'" --> 3
  3 -- "'%'" --> 6
  4 -- "'http'" --> 4
  4 -- "'https'" --> 4
  4 -- "'ftp'" --> 4
  4 -- "'example'" --> 4
  4 -- "'com'" --> 4
  4 -- "'foo'" --> 4
  4 -- "'80'" --> 4
  4 -- "'12'" --> 4
  4 -- "':'" --> 4
  4 -- "'/'" --> 4
  4 -- "'.'" --> 4
  4 -- "'?'" --> 4
  4 -- "'='" --> 4
  4 -- "'&'" --> 4
  4 -- "'#'" --> 5
  4 -- "'@'" --> 4
  4 -- "'%'" --> 7
  5 -- "'http'" --> 5
  5 -- "'https'" --> 5
  5 -- "'ftp'" --> 5
  5 -- "'example'" --> 5
  5 -- "'com'" --> 5
  5 -- "'foo'" --> 5
  5 -- "'80'" --> 5
  5 -- "'12'" --> 5
  5 -- "':'" --> 5
  5 -- "'/'" --> 5
  5 -- "'.'" --> 5
  5 -- "'?'" --> 5
  5 -- "'='" --> 5
  5 -- "'&'" --> 5
  5 -- "'#'" --> 2
  5 -- "'@'" --> 5
  5 -- "'%'" --> 6
  6 -- "'http'" --> 2
  6 -- "'https'" --> 2
  6 -- "'ftp'" --> 2
  6 -- "'example'" --> 2
  6 -- "'com'" --> 2
  6 -- "'foo'" --> 2
  6 -- "'80'" --> 3
  6 -- "'12'" --> 3
  6 -- "':'" --> 2
  6 -- "'/'" --> 2
  6 -- "'.'" --> 2
  6 -- "'?'" --> 2
  6 -- "'='" --> 2
  6 -- "'&'" --> 2
  6 -- "'#'" --> 2
  6 -- "'@'" --> 2
  6 -- "'%'" --> 2
  7 -- "'http'" --> 2
  7 -- "'https'" --> 2
  7 -- "'ftp'" --> 4
  7 -- "'example'" --> 4
  7 -- "'com'" --> 4
  7 -- "'foo'" --> 4
  7 -- "'80'" --> 4
  7 -- "'12'" --> 4
  7 -- "':'" --> 3
  7 -- "'/'" --> 3
  7 -- "'.'" --> 3
  7 -- "'?'" --> 3
  7 -- "'='" --> 3
  7 -- "'&'" --> 3
  7 -- "'#'" --> 5
  7 -- "'@'" --> 3
  7 -- "'%'" --> 3

URI regexp DFA

flowchart TD
  0((0))
  1((1))
  2((2))
  3(((3)))
  4(((4)))

  0 -- "'http'" --> 1
  0 -- "'https'" --> 1
  0 -- "'ftp'" --> 2
  0 -- "'example'" --> 2
  0 -- "'com'" --> 2
  0 -- "'foo'" --> 2
  0 -- "'80'" --> 2
  0 -- "'12'" --> 2
  0 -- "':'" --> 2
  0 -- "'/'" --> 2
  0 -- "'.'" --> 2
  0 -- "'?'" --> 2
  0 -- "'='" --> 2
  0 -- "'&'" --> 2
  0 -- "'#'" --> 2
  0 -- "'@'" --> 2
  0 -- "'%'" --> 2
  1 -- "'http'" --> 2
  1 -- "'https'" --> 2
  1 -- "'ftp'" --> 2
  1 -- "'example'" --> 2
  1 -- "'com'" --> 2
  1 -- "'foo'" --> 2
  1 -- "'80'" --> 2
  1 -- "'12'" --> 2
  1 -- "':'" --> 3
  1 -- "'/'" --> 2
  1 -- "'.'" --> 2
  1 -- "'?'" --> 2
  1 -- "'='" --> 2
  1 -- "'&'" --> 2
  1 -- "'#'" --> 2
  1 -- "'@'" --> 2
  1 -- "'%'" --> 2
  2 -- "'http'" --> 2
  2 -- "'https'" --> 2
  2 -- "'ftp'" --> 2
  2 -- "'example'" --> 2
  2 -- "'com'" --> 2
  2 -- "'foo'" --> 2
  2 -- "'80'" --> 2
  2 -- "'12'" --> 2
  2 -- "':'" --> 2
  2 -- "'/'" --> 2
  2 -- "'.'" --> 2
  2 -- "'?'" --> 2
  2 -- "'='" --> 2
  2 -- "'&'" --> 2
  2 -- "'#'" --> 2
  2 -- "'@'" --> 2
  2 -- "'%'" --> 2
  3 -- "'http'" --> 3
  3 -- "'https'" --> 3
  3 -- "'ftp'" --> 3
  3 -- "'example'" --> 3
  3 -- "'com'" --> 3
  3 -- "'foo'" --> 3
  3 -- "'80'" --> 3
  3 -- "'12'" --> 3
  3 -- "':'" --> 3
  3 -- "'/'" --> 3
  3 -- "'.'" --> 3
  3 -- "'?'" --> 3
  3 -- "'='" --> 3
  3 -- "'&'" --> 3
  3 -- "'#'" --> 4
  3 -- "'@'" --> 3
  3 -- "'%'" --> 2
  4 -- "'http'" --> 4
  4 -- "'https'" --> 4
  4 -- "'ftp'" --> 4
  4 -- "'example'" --> 4
  4 -- "'com'" --> 4
  4 -- "'foo'" --> 4
  4 -- "'80'" --> 4
  4 -- "'12'" --> 4
  4 -- "':'" --> 4
  4 -- "'/'" --> 4
  4 -- "'.'" --> 4
  4 -- "'?'" --> 4
  4 -- "'='" --> 4
  4 -- "'&'" --> 4
  4 -- "'#'" --> 2
  4 -- "'@'" --> 4
  4 -- "'%'" --> 2
</details>

Next, we use DFA.find_separating_word to check the equivalence of the two automata. This method finds a separating word between two automata, i.e., a word that is accepted by one automaton and rejected by the other. It returns nil if no separating word is found, that is, if the two automata are equivalent.

```ruby
sep_word = Lernen::Automaton::DFA.find_separating_word(alphabet, uri_parse_dfa, uri_regexp_dfa)
sep_word&.join
# => "http:?%"
```

Then, we got "http:?%" as a separating word between the two automata. This means that the two DFAs of URI.parse and the URI regexp obtained by Lernen.learn are not equivalent. Finally, we need to confirm that the separating word also distinguishes the actual implementations, valid_and_http_url? and new_valid_and_http_url?, since the learned DFAs are only approximations of the programs.
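A direct way to perform that check is to feed the separating word to both implementations and compare the results. This is a sketch (definitions repeated so the snippet runs standalone); since URI parsing details vary across Ruby versions, we print the two booleans rather than asserting which side accepts the word.

```ruby
require "uri"

# Definitions repeated from above for a self-contained snippet.
def valid_and_http_url?(string)
  uri = URI.parse(string)
  uri.scheme == "http" || uri.scheme == "https"
rescue URI::Error
  false
end

VALID_AND_HTTP_URL_REGEXP = /\A#{URI::DEFAULT_PARSER.make_regexp(%w[http https])}\z/

def new_valid_and_http_url?(string)
  string.match?(VALID_AND_HTTP_URL_REGEXP)
end

sep_word = "http:?%"
# If the two booleans differ, the word found on the learned DFAs
# also separates the actual implementations.
puts valid_and_http_url?(sep_word)
puts new_valid_and_http_url?(sep_word)
```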
