Parsby

Parser combinator library for Ruby inspired by Haskell's Parsec

Installation

Add this line to your application's Gemfile:

gem 'parsby'

And then execute:

$ bundle

Or install it yourself as:

$ gem install parsby

Examples

If you'd like to jump right into example parsers that use this library, there are a few in this source.

Introduction

This is a library used to define parsers by declaratively describing a syntax using what's commonly referred to as combinators. Parser combinators are functions that take parsers as inputs and/or return parsers as outputs, i.e. they combine parsers into new parsers.

As an example, between is a combinator with 3 parameters: a parser for what's to the left, one for what's to the right, and lastly one for what's in-between them, and it returns a parser that, after parsing, returns the result of the in-between parser:

between(lit("<"), lit(">"), decimal).parse "<100>"
#=> 100
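
To make the idea of "functions that take parsers and return parsers" concrete, here is a minimal sketch in plain Ruby. It is a toy model and not how Parsby is implemented: it assumes a parser is a lambda from an input string and a position to either [result, next_position] or nil, and the names toy_lit, toy_decimal and toy_between are made up for this illustration.

```ruby
# Toy model, not Parsby's implementation: a parser is a lambda that takes
# the input string and a position, and returns [result, next_position] on
# success or nil on failure.

# Hypothetical helper: parser that matches the exact string s.
def toy_lit(s)
  ->(input, pos) { input[pos, s.length] == s ? [s, pos + s.length] : nil }
end

# Hypothetical helper: parser for one or more digits, returned as an Integer.
def toy_decimal
  lambda do |input, pos|
    m = /\G\d+/.match(input, pos) # \G anchors the match at pos
    m && [m[0].to_i, m.end(0)]
  end
end

# A combinator: takes three parsers and returns a new parser that runs
# them in sequence and keeps only the result of the middle one.
def toy_between(left, right, inner)
  lambda do |input, pos|
    opened = left.call(input, pos)
    return nil unless opened
    value = inner.call(input, opened[1])
    return nil unless value
    closed = right.call(input, value[1])
    return nil unless closed
    [value[0], closed[1]]
  end
end

toy_between(toy_lit("<"), toy_lit(">"), toy_decimal).call("<100>", 0)
#=> [100, 5]
```

The real library wraps this kind of function in parser objects that also carry labels and build the trace shown in error messages, which the toy version leaves out.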

Some commonly used combinators

# Parse argument string literally
lit("foo").parse "foo"
#=> "foo"

# Case insensitive lit
ilit("Foo").parse "fOo"
#=> "fOo"

# Make any value into a parser that results in that value without
# consuming input.
pure("foo").parse ""
#=> "foo"

# Parse foo or bar
(lit("foo") | lit("bar")).parse "bar"
#=> "bar"

# Like `|`, parse one of foo or bar. `choice` is better when you have
# many choices to choose from. You can pass it any number of parsers or
# arrays of parsers.
choice(lit("foo"), lit("bar")).parse "bar"
#=> "bar"

# Parse with each argument in succession and group the results in an
# array.
group(lit("foo"), lit("bar")).parse "foobar"
#=> ["foo", "bar"]

# Parse foo and bar, returning bar.
(lit("foo") > lit("bar")).parse "foobar"
#=> "bar"

# Parse foo and bar, returning foo.
(lit("foo") < lit("bar")).parse "foobar"
#=> "foo"

# Make parser optional
group(optional(lit("foo")), lit("bar")).parse "bar"
#=> [nil, "bar"]

# Use parser zero or more times, grouping results in an array. many_1 does
# the same, but requires parsing at least once.
many(lit("foo")).parse "foofoo"
#=> ["foo", "foo"]

# Parse many, but each separated by something. sep_by_1 requires at least
# one element to be parsed.
sep_by(lit(","), lit("foo")).parse "foo,foo"
#=> ["foo", "foo"]

# `whitespace` (alias `ws`) is zero or more whitespace characters.
# `whitespace_1` (alias `ws_1`) is one or more whitespace characters.
# `spaced` allows a parser to be surrounded by optional whitespace.
# `whitespace_1` is the base definition. If you extend it to e.g. add the
# parsing of comments, the other combinators will also recognize that
# change.
(whitespace > lit("foo")).parse "   foo"
#=> "foo"
group(lit("foo"), ws_1 > lit("bar")).parse "foo   bar"
#=> ["foo", "bar"]
spaced(lit("foo")).parse "   foo    "
#=> "foo"

# Transform the parse result according to the block.
lit("foo").fmap {|x| x.upcase }.parse "foo"
#=> "FOO"

# join(p) is the same as p.fmap {|xs| xs.join }
join(sep_by(lit(","), lit("foo") | lit("bar"))).parse "foo,bar"
#=> "foobar"

# Parse a character from the choices in a set of strings or ranges
char_in(" \t\r\n").parse "\t"
#=> "\t"
typical_identifier_characters = ['a'..'z', 'A'..'Z', 0..9, "_"]
join(many(char_in("!?", typical_identifier_characters))).parse "foo23? bar"
#=> "foo23?"

# Parse any one character
any_char.parse "foo"
#=> "f"

# Require end of input at end of parse.
(lit("foo") < eof).parse "foobar"
#=> Parsby::ExpectationFailed: line 1:
  foobar
     |    * failure: eof
  \-/    *| success: lit("foo")
         \|
  |       * failure: (lit("foo") < eof)

# Parse only when other parser fails.
join(many(any_char.that_fails(whitespace_1))).parse "foo bar"
#=> "foo"

# single(p) is the same as p.fmap {|x| [x] }
single(lit("foo")).parse "foo"
#=> ["foo"]

# p1 + p2 is the same as group(p1, p2).fmap {|(r1, r2)| r1 + r2 }
(lit("foo") + (ws > lit("bar"))).parse "foo bar"
#=> "foobar"
(single(lit("foo")) + many(ws > lit("bar"))).parse "foo bar bar"
#=> ["foo", "bar", "bar"]
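
For readers curious how combinators like `|`, many and sep_by can be built out of smaller parsers, the following is a rough plain-Ruby sketch. It does not use or reproduce Parsby's code: it assumes a toy model where a parser is a lambda from (input, position) to [result, next_position] or nil, and every toy_ name is hypothetical.

```ruby
# Toy model, not Parsby's implementation: a parser is a lambda from
# (input, pos) to [result, next_pos], or nil on failure.

# Hypothetical helper: match the exact string s.
def toy_lit(s)
  ->(input, pos) { input[pos, s.length] == s ? [s, pos + s.length] : nil }
end

# Alternation: try a, and fall back to b at the same position.
def toy_or(a, b)
  ->(input, pos) { a.call(input, pos) || b.call(input, pos) }
end

# Repetition: apply parser until it fails, collecting the results.
# Assumes parser consumes input on success; otherwise this would loop forever.
def toy_many(parser)
  lambda do |input, pos|
    results = []
    while (step = parser.call(input, pos))
      results << step[0]
      pos = step[1]
    end
    [results, pos]
  end
end

# Separated repetition: zero or more items with sep between them.
def toy_sep_by(sep, parser)
  lambda do |input, pos|
    first = parser.call(input, pos)
    return [[], pos] unless first
    results = [first[0]]
    pos = first[1]
    loop do
      sep_step = sep.call(input, pos)
      item = sep_step && parser.call(input, sep_step[1])
      break unless item
      results << item[0]
      pos = item[1]
    end
    [results, pos]
  end
end

foo_or_bar = toy_or(toy_lit("foo"), toy_lit("bar"))
toy_many(foo_or_bar).call("foobarfoo", 0)
#=> [["foo", "bar", "foo"], 9]
toy_sep_by(toy_lit(","), foo_or_bar).call("foo,bar", 0)
#=> [["foo", "bar"], 7]
```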

Defining combinators

If you look at the examples in this source, you'll notice that almost all combinators are defined with define_combinator. Strictly speaking, it's not necessary to use that to define combinators. You can do it with variable assignment or def syntax. Nevertheless, define_combinator is preferred because it automates the assignment of a label to the combinator. Consider this example:

define_combinator :between do |left, right, p|
  left > p < right
end

between(lit("<"), lit(">"), lit("foo")).label
#=> 'between(lit("<"), lit(">"), lit("foo"))'

Having labels that resemble the source code is helpful for the error messages.

If we use def instead of define_combinator, then the label would be that of its definition. In the following case, it would be that assigned by <.

def between(left, right, p)
  left > p < right
end

between(lit("<"), lit(">"), lit("foo")).label
#=> '((lit("<") > lit("foo")) < lit(">"))'

If we were to wrap that parser in a new one, then the label would simply be unknown.

def between(left, right, p)
  Parsby.new {|c| (left > p < right).parse c }
end

between(lit("<"), lit(">"), lit("foo")).label.to_s
#=> "unknown"
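
To illustrate what "automating the assignment of a label" can look like, here is a small plain-Ruby sketch. It is only a guess at the general technique, not Parsby's actual code: ToyParser, ToyCombinators, toy_lit and define_toy_combinator are all hypothetical names, and the toy parsers carry a label but do no parsing.

```ruby
# Toy parser object with just a label; actual parsing is omitted.
ToyParser = Struct.new(:label) do
  # Sequencing operators whose results get a label built from the operands.
  def >(other)
    ToyParser.new("(#{label} > #{other.label})")
  end

  def <(other)
    ToyParser.new("(#{label} < #{other.label})")
  end
end

module ToyCombinators
  # Hypothetical stand-in for define_combinator: defines a method that
  # builds the parser from the block, then overwrites its label with
  # "name(labels of the arguments)".
  def self.define_toy_combinator(name, &block)
    define_method(name) do |*args|
      parser = instance_exec(*args, &block)
      arg_labels = args.map { |a| a.respond_to?(:label) ? a.label : a.inspect }
      parser.label = arg_labels.empty? ? name.to_s : "#{name}(#{arg_labels.join(', ')})"
      parser
    end
  end

  def toy_lit(s)
    ToyParser.new("lit(#{s.inspect})")
  end

  define_toy_combinator :between do |left, right, p|
    left > p < right
  end

  extend self
end

include ToyCombinators

between(toy_lit("<"), toy_lit(">"), toy_lit("foo")).label
#=> 'between(lit("<"), lit(">"), lit("foo"))'
(toy_lit("<") > toy_lit("foo") < toy_lit(">")).label
#=> '((lit("<") > lit("foo")) < lit(">"))'
```

The second call shows the label you would see without the wrapper: it describes how the parser was assembled internally rather than the combinator the caller wrote.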

Defining parsers as modules

The typical pattern I use is something like this:

module FoobarParser
  include Parsby::Combinators
  extend self

  # Entrypoint nicety
  def parse(s)
    foobar.parse s
  end

  define_combinator :foobar do
    foo + bar
  end

  define_combinator :foo do
    lit("foo")
  end

  define_combinator :bar do
    lit("bar")
  end
end

From that, you can use it directly as:

FoobarParser.parse "foobar"
#=> "foobar"
FoobarParser.foo.parse "foo"
#=> "foo"

Being able to use subparsers directly is useful for when you want to e.g. parse a JSON array, instead of any JSON value.

Writing the parser as a module like that also makes it easy to make a new parser based on it:

module FoobarbazParser
  include FoobarParser
  extend self

  def parse(s)
    foobarbaz.parse s
  end

  define_combinator :foobarbaz do
    foobar + baz
  end

  define_combinator :baz do
    lit("baz")
  end
end

You can also define such a module to hold your own project's combinators to use in multiple parsers.
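
The reason this pattern composes is ordinary Ruby module behavior rather than anything specific to the library. The following plain-Ruby sketch uses strings in place of parsers to show it; ToyFoobar and ToyFoobarbaz are hypothetical names used only here.

```ruby
# Plain Ruby, no Parsby involved: strings stand in for parsers.
module ToyFoobar
  extend self # instance methods become callable as ToyFoobar.foo

  def foo
    "foo"
  end

  def bar
    "bar"
  end

  def foobar
    foo + bar
  end
end

module ToyFoobarbaz
  include ToyFoobar # reuse ToyFoobar's methods as instance methods
  extend self       # and expose them, plus the new ones, on this module

  def baz
    "baz"
  end

  def foobarbaz
    foobar + baz
  end
end

ToyFoobar.foobar
#=> "foobar"
ToyFoobarbaz.foo
#=> "foo"
ToyFoobarbaz.foobarbaz
#=> "foobarbaz"
```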

ExpectationFailed

Here's an example of the error raised when parsing fails:

pry(main)> Parsby::Example::LispParser.sexp.parse "(foo `(foo ,bar) 2.3 . . nil)"    
Parsby::ExpectationFailed: line 1:
  (foo `(foo ,bar) 2.3 . . nil)
                         |           * failure: char_in("([")
                         |           * failure: list
                         |          *| failure: symbol
                         |         *|| failure: nil
                         |        *||| failure: string
                         |       *|||| failure: number
                                 \\\||
                         |          *| failure: atom
                         |         *|| failure: abbrev
                                   \\|
                         |           * failure: sexp
                       V            *| success: lit(".")
                   \-/             *|| success: sexp
       \---------/                *||| success: sexp
   \-/                           *|||| success: sexp
  V                             *||||| success: char_in("([")
                                \\\\\|
  |                                  * failure: list
  |                                  * failure: sexp

As can be seen, Parsby manages a tree structure representing parsers and their subparsers, with the information of where a particular parser began parsing, where it ended, whether it succeeded or failed, and the label of the parser.
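
As a rough mental model of such a tree, consider the plain-Ruby sketch below. The node type and its field names are assumptions made for illustration and do not reflect Parsby's internal classes; the tree is built by hand for the earlier (lit("foo") < eof) example applied to "foobar".

```ruby
# Hypothetical node type; field names are illustrative assumptions.
ToyTraceNode = Struct.new(:label, :start_pos, :end_pos, :success, :children)

# Render the tree depth-first, one line per parser, indented by depth.
def render_toy_trace(node, depth = 0)
  status = node.success ? "success" : "failure"
  line = "#{'  ' * depth}#{status}: #{node.label} (#{node.start_pos}...#{node.end_pos})"
  [line] + node.children.flat_map { |child| render_toy_trace(child, depth + 1) }
end

# Hand-built tree: lit("foo") succeeds on positions 0...3, then eof fails
# at position 3, so the enclosing parser fails as well.
TOY_TRACE = ToyTraceNode.new('(lit("foo") < eof)', 0, 3, false, [
  ToyTraceNode.new('lit("foo")', 0, 3, true, []),
  ToyTraceNode.new('eof', 3, 3, false, []),
])

puts render_toy_trace(TOY_TRACE)
# failure: (lit("foo") < eof) (0...3)
#   success: lit("foo") (0...3)
#   failure: eof (3...3)
```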

It might be worth mentioning that when debugging a parser from an unexpected ExpectationFailed error, the backtrace isn't really useful. That's because the backtrace points to the code involved in parsing, not the code involved in constructing the parsers, which succeeded, but is where the problem typically lies. The tree-looking exception message above is meant to somewhat substitute the utility of the backtrace in these cases.

Relating to that, the right-most text shows the labels of the corresponding parsers. I find that labels that resemble the source code are quite useful, just like the code location descriptions that appear right-most in backtraces. It's because of this that I consider define_combinator preferable to using def and explicitly assigning labels.

Cleaning up the parse tree for the trace

If you look at the source of the example lisp parser, you might note that there are a lot more parsers in between those shown in the tree above. sexp is not a direct child of list, for example, despite it appearing as so. There are at least 6 ancestors/descendant parsers between list and `se
