Parsby
Parser combinator library for Ruby inspired by Haskell's Parsec
- Installation
- Examples
- Introduction
- Some commonly used combinators
- Defining combinators
- Defining parsers as modules
- ExpectationFailed
- Recursive parsers with lazy
- Parsing left-recursive languages with reduce combinator
- Parsby.new
- Parsing from a string, a file, a pipe, a socket, ...
- Comparing with Haskell's Parsec
- Development
Installation
Add this line to your application's Gemfile:
gem 'parsby'
And then execute:
$ bundle
Or install it yourself as:
$ gem install parsby
Examples
If you'd like to jump right into example parsers that use this library, there are a few in this source.
Introduction
This is a library used to define parsers by declaratively describing a syntax using what's commonly referred to as combinators. Parser combinators are functions that take parsers as inputs and/or return parsers as outputs, i.e. they combine parsers into new parsers.
As an example, between is a combinator with 3 parameters: a parser for
what's to the left, one for what's to the right, and lastly one for what's
in-between them, and it returns a parser that, after parsing, returns the
result of the in-between parser:
between(lit("<"), lit(">"), decimal).parse "<100>"
#=> 100
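To make the idea of "functions that take parsers and return parsers" concrete, here is a toy model in plain Ruby. It is not how Parsby is implemented; it only assumes that a parser can be modeled as a lambda taking an input string and a position and returning either a result with the new position, or nil on failure. The names toy_lit, toy_decimal and toy_between are hypothetical and exist only for this sketch.

```ruby
# Toy model, NOT Parsby's implementation. A "parser" here is a lambda that
# takes (input, pos) and returns [result, new_pos], or nil on failure.

# Hypothetical helper: match the literal string `s` at the current position.
def toy_lit(s)
  ->(input, pos) { input[pos, s.length] == s ? [s, pos + s.length] : nil }
end

# Hypothetical helper: match one or more digits and return them as an Integer.
def toy_decimal
  lambda do |input, pos|
    digits = input[pos..-1][/\A\d+/]
    digits && [digits.to_i, pos + digits.length]
  end
end

# The combinator: takes three parsers and returns a new parser that runs
# left, middle and right in sequence, keeping only the middle result.
def toy_between(left, right, middle)
  lambda do |input, pos|
    l = left.call(input, pos)
    return nil unless l
    m = middle.call(input, l[1])
    return nil unless m
    r = right.call(input, m[1])
    return nil unless r
    [m[0], r[1]]
  end
end

angle_number = toy_between(toy_lit("<"), toy_lit(">"), toy_decimal)
angle_number.call("<100>", 0)
#=> [100, 5]
```

The point of the sketch is that toy_between never inspects the input itself. It only wires together the parsers it was given, which is what makes combinators composable.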
Some commonly used combinators
# Parse argument string literally
lit("foo").parse "foo"
#=> "foo"
# Case insensitive lit
ilit("Foo").parse "fOo"
#=> "fOo"
# Make any value into a parser that results in that value without
# consuming input.
pure("foo").parse ""
#=> "foo"
# Parse foo or bar
(lit("foo") | lit("bar")).parse "bar"
#=> "bar"
# Like `|`, parse one of foo or bar. `choice` is better when you have
# many choices to choose from. You can pass it any number of parsers or
# arrays of parsers.
choice(lit("foo"), lit("bar")).parse "bar"
#=> "bar"
# Parse with each argument in succession and group the results in an
# array.
group(lit("foo"), lit("bar")).parse "foobar"
#=> ["foo", "bar"]
# Parse foo and bar, returning bar.
(lit("foo") > lit("bar")).parse "foobar"
#=> "bar"
# Parse foo and bar, returning foo.
(lit("foo") < lit("bar")).parse "foobar"
#=> "foo"
# Make parser optional
group(optional(lit("foo")), lit("bar")).parse "bar"
#=> [nil, "bar"]
# Use parser zero or more times, grouping results in an array. many_1 does
# the same, but requires parsing at least once.
many(lit("foo")).parse "foofoo"
#=> ["foo", "foo"]
# Parse many, but each separated by something. sep_by_1 requires at least
# one element to be parsed.
sep_by(lit(","), lit("foo")).parse "foo,foo"
#=> ["foo", "foo"]
# `whitespace` (alias `ws`) is zero or more whitespace characters.
# `whitespace_1` (alias `ws_1`) is one or more whitespace characters.
# `spaced` allows a parser to be surrounded by optional whitespace.
# `whitespace_1` is the base definition. If you extend it to e.g. add the
# parsing of comments, the other combinators will also recognize that
# change.
(whitespace > lit("foo")).parse " foo"
#=> "foo"
group(lit("foo"), ws_1 > lit("bar")).parse "foo bar"
#=> ["foo", "bar"]
spaced(lit("foo")).parse " foo "
#=> "foo"
# Transform the parse result according to the block.
lit("foo").fmap {|x| x.upcase }.parse "foo"
#=> "FOO"
# join(p) is the same as p.fmap {|xs| xs.join }
join(sep_by(lit(","), lit("foo") | lit("bar"))).parse "foo,bar"
#=> "foobar"
# Parse a character from the choices in a set of strings or ranges
char_in(" \t\r\n").parse "\t"
#=> "\t"
typical_identifier_characters = ['a'..'z', 'A'..'Z', 0..9, "_"]
join(many(char_in("!?", typical_identifier_characters))).parse "foo23? bar"
#=> "foo23?"
# Parse any one character
any_char.parse "foo"
#=> "f"
# Require end of input at end of parse.
(lit("foo") < eof).parse "foobar"
#=> Parsby::ExpectationFailed: line 1:
foobar
| * failure: eof
\-/ *| success: lit("foo")
\|
| * failure: (lit("foo") < eof)
# Parse only when other parser fails.
join(many(any_char.that_fails(whitespace_1))).parse "foo bar"
#=> "foo"
# single(p) is the same as p.fmap {|x| [x] }
single(lit("foo")).parse "foo"
#=> ["foo"]
# p1 + p2 is the same as group(p1, p2).fmap {|(r1, r2)| r1 + r2 }
(lit("foo") + (ws > lit("bar"))).parse "foo bar"
#=> "foobar"
(single(lit("foo")) + many(ws > lit("bar"))).parse "foo bar bar"
#=> ["foo", "bar", "bar"]
Defining combinators
If you look at the examples in this source, you'll notice that almost all
combinators are defined with define_combinator. Strictly speaking, it's
not necessary to use that to define combinators. You can do it with
variable assignment or def syntax. Nevertheless, define_combinator is
preferred because it automates the assignment of a label to the combinator.
Consider this example:
define_combinator :between do |left, right, p|
left > p < right
end
between(lit("<"), lit(">"), lit("foo")).label
#=> 'between(lit("<"), lit(">"), lit("foo"))'
Having labels that resemble the source code is helpful for the error messages.
If we use def instead of define_combinator, then the label would be
that of whatever parser the method body returns. In the following case, it
would be the one assigned by the < operator.
def between(left, right, p)
left > p < right
end
between(lit("<"), lit(">"), lit("foo")).label
#=> '((lit("<") > lit("foo")) < lit(">"))'
If we were to wrap that parser in a new one, then the label would simply be unknown.
def between(left, right, p)
Parsby.new {|c| (left > p < right).parse c }
end
between(lit("<"), lit(">"), lit("foo")).label.to_s
#=> "unknown"
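To illustrate what "automating the assignment of a label" could look like, here is a toy model in plain Ruby. It is only a sketch of the idea and makes no claim about how Parsby's define_combinator is actually written. ToyParser, ToyLabeling and define_labeled are hypothetical names, and the toy parsers carry nothing but a label.

```ruby
# Toy model only; Parsby's real define_combinator is not shown here.
# A toy parser that carries nothing but a label.
ToyParser = Struct.new(:label)

module ToyLabeling
  # Hypothetical stand-in for define_combinator: defines a method that
  # builds the parser via the block, then overwrites its label with text
  # that looks like the call in the source code.
  def define_labeled(name, &block)
    define_method(name) do |*args|
      parser = instance_exec(*args, &block)
      arg_text = args.map { |a| a.respond_to?(:label) ? a.label : a.inspect }
      parser.label = args.empty? ? name.to_s : "#{name}(#{arg_text.join(', ')})"
      parser
    end
  end
end

module ToyCombinators
  extend ToyLabeling
  extend self

  # The blocks return parsers with a placeholder label; define_labeled
  # replaces it.
  define_labeled(:lit) { |_s| ToyParser.new("unknown") }
  define_labeled(:between) { |_left, _right, _p| ToyParser.new("unknown") }
end

t = ToyCombinators
t.between(t.lit("<"), t.lit(">"), t.lit("foo")).label
#=> "between(lit(\"<\"), lit(\">\"), lit(\"foo\"))"
```

Because the wrapper sees both the combinator's name and its arguments, it can produce a label that mirrors the call, which a plain def cannot do without extra work in every method.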
Defining parsers as modules
The typical pattern I use is something like this:
module FoobarParser
include Parsby::Combinators
extend self
# Entrypoint nicety
def parse(s)
foobar.parse s
end
define_combinator :foobar do
foo + bar
end
define_combinator :foo do
lit("foo")
end
define_combinator :bar do
lit("bar")
end
end
From that, you can use it directly as:
FoobarParser.parse "foobar"
#=> "foobar"
FoobarParser.foo.parse "foo"
#=> "foo"
Being able to use subparsers directly is useful for when you want to e.g. parse a JSON array, instead of any JSON value.
Writing the parser as a module like that also makes it easy to make a new parser based on it:
module FoobarbazParser
include FoobarParser
extend self
def parse(s)
foobarbaz.parse s
end
define_combinator :foobarbaz do
foobar + baz
end
define_combinator :baz do
lit("baz")
end
end
You can also define such a module to hold your own project's combinators to use in multiple parsers.
ExpectationFailed
Here's an example of the error raised when parsing fails:
pry(main)> Parsby::Example::LispParser.sexp.parse "(foo `(foo ,bar) 2.3 . . nil)"
Parsby::ExpectationFailed: line 1:
(foo `(foo ,bar) 2.3 . . nil)
| * failure: char_in("([")
| * failure: list
| *| failure: symbol
| *|| failure: nil
| *||| failure: string
| *|||| failure: number
\\\||
| *| failure: atom
| *|| failure: abbrev
\\|
| * failure: sexp
V *| success: lit(".")
\-/ *|| success: sexp
\---------/ *||| success: sexp
\-/ *|||| success: sexp
V *||||| success: char_in("([")
\\\\\|
| * failure: list
| * failure: sexp
As can be seen, Parsby manages a tree structure representing parsers and their subparsers, with the information of where a particular parser began parsing, where it ended, whether it succeeded or failed, and the label of the parser.
It might be worth mentioning that when debugging a parser from an
unexpected ExpectationFailed error, the backtrace isn't really useful.
That's because the backtrace points to the code involved in parsing, not
the code involved in constructing the parsers, which succeeded, but is
where the problem typically lies. The tree-looking exception message above
is meant to somewhat substitute the utility of the backtrace in these
cases.
Relating to that, the right-most text on each line is the label of the
corresponding parser. I find that labels that resemble the source code are
quite useful, just like the code location descriptions that appear
right-most in backtraces. It's because of this that I consider
define_combinator preferable to using def and explicitly assigning
labels.
Cleaning up the parse tree for the trace
If you look at the source of the example lisp parser, you might note that
there are a lot more parsers in between those shown in the tree above.
sexp is not a direct child of list, for example, despite it appearing
as so. There are at least 6 ancestors/descendant parsers between list and
`se
