SkillAgentSearch skills...

Parsecj

Java monadic parser combinator framework for constructing LL(1) parsers

Install / Use

/learn @jon-hanson/Parsecj
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

ParsecJ

ParsecJ

Introduction

:warning: Note: ParsecJ has been superceded by funcj.parser. The latter uses an applicative framework instead of monads, but is otherwise very similar to ParsecJ.

ParsecJ is a Java monadic parser combinator framework for constructing LL(1) parsers. It is a port of the Haskell Parsec library. The implementation is, where possible, a direct Java port of the Haskell code outlined in the original Parsec paper.

Some notable features include:

  • Composable parser combinators, which provide a DSL for implementing parsers from grammars.
  • Informative error messages in the event of parse failures.
  • Thread-safe due to immutable parsers and inputs.
  • A combinator approach that mirrors that of Parsec, its Haskell counterpart, allowing grammars written for Parsec to be translated into equivalent ParsecJ grammars.
  • Lightweight library (the Jar file size is less than 50Kb) with zero dependencies (aside from JUnit and JMH for the tests).

Parser Combinators

A typical approach to implementing parsers for special-purpose languages is to use a parser generation tool, such as Yacc/Bison or ANTLR. With these tools the language is expressed as a series of production rules, described using a grammar language specific to the tool. The parsing code for the language is then generated from the grammar definition.

An alternative approach is to implement a recursive descent parser, whereby the production rules comprising the grammar are translated by hand into parse functions. The advantage here is that the rules are expressed in the host programming language, obviating the need for a separate grammar language and the consequent code-generation phase. A limitation of this approach is that the extra plumbing required to implement error-handling and backtracking obscures the relationship between the parsing functions and the language rules

Monadic parser combinators are an extension of recursive descent parsing, which use a monad to encapsulate the plumbing. The framework provides the basic building blocks - parsers for constituent language elements such as characters, words and numbers. It also provides combinators that allow more complex parsers to be constructed by composing existing parsers. The framework effectively provides a Domain Specific Language for expressing language grammars, whereby each grammar instance implements an executable parser.

Getting Started

Requirements

ParsecJ requires Java 1.8 (or higher).

Resources

Maven

Add this dependency to your project pom.xml:

<dependency>
    <groupId>org.javafp</groupId>
    <artifactId>parsecj</artifactId>
    <version>0.6</version>
</dependency>

Example

As a quick illustration of implementing a parser using ParsecJ, consider a simple expression language for expressions of the form x+y, where x and y are integers.

The grammar for the language consists of a single production rule:

sum ::= integer '+' integer

This can be translated into the following ParsecJ parser:

import org.javafp.parsecj.*;
import static org.javafp.parsecj.Combinators.*;
import static org.javafp.parsecj.Text.*;

class Test {
   public static void main(String[] args) throws Exception {
        Parser<Character, Integer> sum =
            intr.bind(x ->                  // parse an integer and bind the result to the variable x.
                chr('+').then(              // parse a '+' sign, and throw away the result.
                    intr.bind(y ->          // parse an integer and bind the result to the variable y.
                        retn(x+y))));       // return the sum of x and y.
    }
}

The parser is constructed by taking the intr parser for integers, the chr parser for single characters, and combining them using the bind, then and retn combinators.

The parser can be used as follows:

int i = sum.parse(Input.of("1+2")).getResult();
assert i == 3;

Meanwhile, if we give it invalid input:

int i2 = sum.parse(Input.of("1+z")).getResult();

then it throws an exception with an error message that pinpoints the problem:

Exception in thread "main" java.lang.Exception: Message{position=2, sym=<z>, expected=[integer]}

General Approach

A typical approach to using the library to implement a parser for a language is as follows:

  1. Define a model for the language, i.e. a set of classes that represent the language elements.
  2. Define a grammar for the language - a set of production rules.
  3. Translate the production rules into parsers using the library combinators. The parsers will typically construct values from the model.
  4. Book-end the parser for the top-level element with the eof combinator.
  5. Invoke the parser by passing a Input object, usually constructed from a String, to the parse method.
  6. The resultant Reply result holds either the successfully parsed value or an error message.

Types

There are three principal types to be aware of.

Parser

All parsers implement the Parser (functional) interface, which has an apply method:

@FunctionalInterface
public interface Parser<I, A> {
    ConsumedT<I, A> apply(Input<I> input);

    default Reply<I, A> parse(Input<I> input) {
        return apply(input).getReply();
    }
    // ...
}

I.e. a Parser<I, A> is essentially a function from a Input<I> to a ConsumedT<I, A>, where I is the input stream symbol type (usually Character), and A is the type of the value being parsed. For example, a parser that operates on character input and parses an integer would have type Parser<Character, Integer>.

The apply method contains the main machinery of the parser, and combinators use this method to compose parsers. However, since the ConsumedT type returned by apply is an intermediate type, the parse method is also provided to apply the parser and extract the Reply parse result.

Input

The Input interface is an abstraction representing an immutable input state. It provides several static of methods for constructing Input instances from sequences of symbols:

public interface Input<I> {
    static <I> Input<I> of(I[] symbols) {
        return new ArrayInput<I>(symbols);
    }

    static Input<Character> of(Character[] symbols) {
        return new CharArrayInput(symbols);
    }

    static Input<Character> of(String symbols) {
        return new StringInput(symbols);
    }

    // ...
}

Reply

The ConsumedT object returned by Parser.apply is an intermediate result wrapper, typically only of interest to combinator implementations. The ConsumedT.getReply method returns the parser result wrapper, alternatively the Parser.parse method can be used to bypass ConsumedT entirely.

Reply<T> reply = p.apply(input).getReply();
// is equivalent to:
Reply<T> reply2 = p.parse(input);

assert(reply.equals(reply2));

A Reply can be either a successful parse result (represented by the Ok subtype) or an error (represented by the Error subtype).

public abstract class Reply<I, A> {
    public abstract <B> B match(Function<Ok<I, A>, B> ok, Function<Error<I, A>, B> error);

    public abstract A getResult() throws Exception;

    public abstract boolean isOk();
    
    public abstract boolean isError();
}

The isOk and isError methods can be used to test the type. Alternatively, use the match method to handle both cases, e.g.:

String msg =
    parser.parse(input)
        .match(
            ok -> "Result : " + ok.getResult(),
            error -> "Error : " + error.getMsg()
        );

A third option is to use the getResult method which either returns the successfully parsed result, if the reply is an Ok, or throws an exception if it's an Error.

// May throw.
Parser<Character, MyResult> p = ...
MyResult res = parser.parse(input).getResult();

Defining Parsers

A parser for a language is defined by translating the production rules comprising the language grammar into parsers, using the combinators provided by the library.

Combinators

Combinators create new parsers by composing existing ones. The Combinators package provides the following core combinator parsers:

| Name | Parser Description |-----|------------- retn(value) | Always succeeds. bind(p, f) | First applies the parser p. If it

View on GitHub
GitHub Stars123
CategoryDevelopment
Updated6d ago
Forks6

Languages

Java

Security Score

85/100

Audited on Mar 22, 2026

No findings