Javacc
JavaCC - a parser generator for building parsers from grammars. It can generate code in Java, C++ and C#.
JavaCC
Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications.
A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions and debugging.
All you need to run a JavaCC parser, once generated, is a Java Runtime Environment (JRE).
This README is meant as a brief overview of the core features and of how to set things up to get started with JavaCC. For fully detailed documentation, please see https://javacc.github.io/javacc/.
Introduction
Features
-
JavaCC generates top-down (recursive descent) parsers as opposed to bottom-up parsers generated by YACC-like tools. This allows the use of more general grammars, although left-recursion is disallowed. Top-down parsers have a number of other advantages (besides more general grammars) such as being easier to debug, having the ability to parse to any non-terminal in the grammar, and also having the ability to pass values (attributes) both up and down the parse tree during parsing.
-
By default, JavaCC generates an LL(1) parser. However, there may be portions of grammar that are not LL(1). JavaCC offers the capabilities of syntactic and semantic lookahead to resolve shift-shift ambiguities locally at these points. For example, the parser is LL(k) only at such points, but remains LL(1) everywhere else for better performance. Shift-reduce and reduce-reduce conflicts are not an issue for top-down parsers.
-
JavaCC generates parsers that are 100% pure Java, so there is no runtime dependency on JavaCC and no special porting effort required to run on different machine platforms.
-
JavaCC allows extended BNF specifications - such as (A)*, (A)+ etc - within the lexical and the grammar specifications. Extended BNF relieves the need for left-recursion to some extent. In fact, extended BNF is often easier to read, as in A ::= y(x)* versus A ::= Ax|y.
-
The lexical specifications (such as regular expressions, strings) and the grammar specifications (the BNF) are both written together in the same file. It makes grammars easier to read since it is possible to use regular expressions inline in the grammar specification, and also easier to maintain.
-
The lexical analyzer of JavaCC can handle full Unicode input, and lexical specifications may also include any Unicode character. This facilitates descriptions of language elements such as Java identifiers that allow certain Unicode characters (that are not ASCII), but not others.
-
JavaCC offers Lex-like lexical state and lexical action capabilities. Specific aspects in JavaCC that are superior to other tools are the first class status it offers concepts such as TOKEN, MORE, SKIP and state changes. This allows cleaner specifications as well as better error and warning messages from JavaCC.
-
Tokens that are defined as special tokens in the lexical specification are ignored during parsing, but these tokens are available for processing by the tools. A useful application of this is in the processing of comments.
-
Lexical specifications can define tokens not to be case-sensitive either at the global level for the entire lexical specification, or on an individual lexical specification basis.
-
JavaCC comes with JJTree, an extremely powerful tree building pre-processor.
-
JavaCC also includes JJDoc, a tool that converts grammar files to documentation files, optionally in HTML.
-
JavaCC offers many options to customize its behavior and the behavior of the generated parsers. Examples of such options are the kinds of Unicode processing to perform on the input stream, the number of tokens of ambiguity checking to perform etc.
-
JavaCC error reporting is among the best in parser generators. JavaCC generated parsers are able to clearly point out the location of parse errors with complete diagnostic information.
-
Using options DEBUG_PARSER, DEBUG_LOOKAHEAD, and DEBUG_TOKEN_MANAGER, users can get in-depth analysis of the parsing and the token processing steps.
-
The JavaCC release includes a wide range of examples including Java and HTML grammars. The examples, along with their documentation, are a great way to get acquainted with JavaCC.
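The extended BNF point in the feature list can be illustrated with a small hand-written recognizer. This is only a sketch of the general recursive-descent technique, not code produced by JavaCC; the class name `EbnfLoopSketch` and the method `matchesA` are hypothetical, and the terminals `x` and `y` are taken to be single characters. It shows why `A ::= y(x)*` suits a top-down parser: the repetition becomes a loop, whereas the left-recursive form `A ::= Ax|y` would have to call itself before consuming any input.

```java
/** Hand-written sketch (not JavaCC output) of A ::= "y" ("x")* as a loop. */
final class EbnfLoopSketch {

    /** Returns true if the whole input is "y" followed by zero or more "x". */
    static boolean matchesA(String input) {
        int pos = 0;
        // A must start with the terminal y.
        if (pos >= input.length() || input.charAt(pos) != 'y') {
            return false;
        }
        pos++;
        // (x)* becomes a plain loop. The left-recursive form A ::= A x | y
        // would instead call matchesA first, without consuming any input.
        while (pos < input.length() && input.charAt(pos) == 'x') {
            pos++;
        }
        // Accept only if nothing is left over.
        return pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesA("y"));    // true
        System.out.println(matchesA("yxxx")); // true
        System.out.println(matchesA("xy"));   // false
    }
}
```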
An example
The following JavaCC grammar example recognizes matching braces followed by zero or more line terminators and then an end of file.
Examples of legal strings in this grammar are:
{}, {{{{{}}}}} // ... etc
Examples of illegal strings are:
{}{}, }{}}, { }, {x} // ... etc
Its grammar
PARSER_BEGIN(Example)

/** Simple brace matcher. */
public class Example {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Example parser = new Example(System.in);
    parser.Input();
  }

}

PARSER_END(Example)

/** Root production. */
void Input() :
{}
{
  MatchedBraces() ("\n"|"\r")* <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}
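To make the shape of a top-down parser for this grammar more concrete, here is a hand-written Java sketch with one method per production. It is not the code JavaCC generates: the class `BraceMatcherSketch` and its methods are hypothetical, it works on a `String` instead of `System.in`, and it reports every failure as an `IllegalStateException` rather than distinguishing lexical errors from parse errors as the generated parser does.

```java
/**
 * Hand-written sketch of a recursive-descent recognizer for the grammar above.
 * This is NOT JavaCC output; all names here are hypothetical.
 */
final class BraceMatcherSketch {
    private final String text;
    private int pos = 0;

    private BraceMatcherSketch(String text) {
        this.text = text;
    }

    /** Returns true if the text is accepted by the Input production. */
    static boolean accepts(String text) {
        try {
            new BraceMatcherSketch(text).input();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Input ::= MatchedBraces ("\n" | "\r")* EOF */
    private void input() {
        matchedBraces();
        while (pos < text.length()
                && (text.charAt(pos) == '\n' || text.charAt(pos) == '\r')) {
            pos++;
        }
        if (pos != text.length()) {
            throw new IllegalStateException("Expected end of input at offset " + pos);
        }
    }

    /** MatchedBraces ::= "{" [ MatchedBraces ] "}" */
    private void matchedBraces() {
        expect('{');
        // One character of lookahead decides whether to take the optional branch.
        if (pos < text.length() && text.charAt(pos) == '{') {
            matchedBraces();
        }
        expect('}');
    }

    /** Consumes the expected character or reports where matching failed. */
    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalStateException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts("{{}}\n")); // true
        System.out.println(accepts("{}{}"));   // false
        System.out.println(accepts("{x}"));    // false
    }
}
```

The optional `[ MatchedBraces() ]` maps to a single `if` on the next character, which is the LL(1) decision the grammar needs at that point.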
Some executions and outputs
{{}} gives no error
$ java Example
{{}}<return>
{x gives a Lexical error
$ java Example
{x<return>
Lexical error at line 1, column 2. Encountered: "x"
TokenMgrError: Lexical error at line 1, column 2. Encountered: "x" (120), after : ""
at ExampleTokenManager.getNextToken(ExampleTokenManager.java:146)
at Example.getToken(Example.java:140)
at Example.MatchedBraces(Example.java:51)
at Example.Input(Example.java:10)
at Example.main(Example.java:6)
{}} gives a ParseException
$ java Example
{}}<return>
ParseException: Encountered "}" at line 1, column 3.
Was expecting one of:
<EOF>
"\n" ...
"\r" ...
at Example.generateParseException(Example.java:184)
at Example.jj_consume_token(Example.java:126)
at Example.Input(Example.java:32)
at Example.main(Example.java:6)
Versions
The RECOMMENDED version is version 8: it separates the parser (the core) from the generators (for the different languages), and development and maintenance effort will focus mainly on this version.
This version is spread across several Git repositories / Java & Maven projects / jars:
The previous versions (4, 5, 6, 7) are widely used; migrating to version 8 should take minimal effort.
The last of these versions lives in a single Git repository / Java & Maven project / jar:
Differences between v8 and v7 are very small at the grammar level and more significant at the level of the generated sources:
- the javacc/jjtree grammar part is the same
- most of the javacc/jjtree options should be the same, but some may be removed and others may appear in v8
- the Java grammar part should be nearly the same (some Java 7 & Java 8 features may appear in v8 and not in v7; in the future, Java 11..17..21.. features would appear only in v8)
- the C++ / C# grammar parts may be somewhat different
- some generated files are not much different, others are
If you are reading this README.md, you should be looking at the v7 code.
Getting Started
You can use JavaCC either from the command line or through an IDE.
Use JavaCC from the command line
Download
Download the latest stable release (at least the binaries and the sources) into a so-called download directory:
Version 8
Download the core and the generator(s) you are going to use:
-
JavaCC Core 8.0.1 - Binaries, Source (zip), Source (tar.gz), Javadocs
-
JavaCC C++ 8.0.1 - Binaries, Source (zip), Source (tar.gz), [Javadocs](https://repo1.maven.org/maven2/org/j
