Mpc
A Parser Combinator library for C
Micro Parser Combinators
Version 0.9.0
About
mpc is a lightweight and powerful Parser Combinator library for C.
Using mpc might be of interest to you if you are...
- Building a new programming language
- Building a new data format
- Parsing an existing programming language
- Parsing an existing data format
- Embedding a Domain Specific Language
- Implementing Greenspun's Tenth Rule
Features
- Type-Generic
- Predictive, Recursive Descent
- Easy to Integrate (One Source File in ANSI C)
- Automatic Error Message Generation
- Regular Expression Parser Generator
- Language/Grammar Parser Generator
Alternatives
The current main alternative for a C based parser combinator library is a branch of Cesium3.
mpc provides a number of features that this project does not offer, and also overcomes a number of potential downsides:
- mpc Works for Generic Types
- mpc Doesn't rely on Boehm-Demers-Weiser Garbage Collection
- mpc Doesn't use setjmp and longjmp for errors
- mpc Doesn't pollute the namespace
Quickstart
Here is how one would use mpc to create a parser for a basic mathematical expression language.
mpc_parser_t *Expr  = mpc_new("expression");
mpc_parser_t *Prod  = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");

mpca_lang(MPCA_LANG_DEFAULT,
  " expression : <product> (('+' | '-') <product>)*; "
  " product    : <value>   (('*' | '/')   <value>)*; "
  " value      : /[0-9]+/ | '(' <expression> ')';    "
  " maths      : /^/ <expression> /$/;               ",
  Expr, Prod, Value, Maths, NULL);

mpc_result_t r;

if (mpc_parse("input", input, Maths, &r)) {
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}

mpc_cleanup(4, Expr, Prod, Value, Maths);
If you were to set input to the string (4 * 2 * 11 + 2) - 5, the printed output would look like this.
>
  regex
  expression|>
    value|>
      char:1:1 '('
      expression|>
        product|>
          value|regex:1:2 '4'
          char:1:4 '*'
          value|regex:1:6 '2'
          char:1:8 '*'
          value|regex:1:10 '11'
        char:1:13 '+'
        product|value|regex:1:15 '2'
      char:1:16 ')'
    char:1:18 '-'
    product|value|regex:1:20 '5'
  regex
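To make the meaning of the four grammar rules concrete, here is a hand-written recursive descent evaluator for the same language. This is a sketch only and is not mpc code: eval_maths and the parse_* helpers are hypothetical names, it computes a value instead of building an AST, and it assumes whitespace between tokens is skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hand-written equivalent of the Quickstart grammar. NOT part of mpc;
   every name in this block is hypothetical. */

static const char *cur;  /* current read position in the input */

static void skip_ws(void) {
  while (isspace((unsigned char)*cur)) { cur++; }
}

static long parse_expression(int *ok);

/* value : /[0-9]+/ | '(' <expression> ')' */
static long parse_value(int *ok) {
  long v = 0;
  skip_ws();
  if (*cur == '(') {
    cur++;
    v = parse_expression(ok);
    skip_ws();
    if (*cur == ')') { cur++; } else { *ok = 0; }
    return v;
  }
  if (!isdigit((unsigned char)*cur)) { *ok = 0; return 0; }
  while (isdigit((unsigned char)*cur)) {
    v = v * 10 + (*cur - '0');
    cur++;
  }
  return v;
}

/* product : <value> (('*' | '/') <value>)* */
static long parse_product(int *ok) {
  long lhs = parse_value(ok);
  for (;;) {
    char op;
    long rhs;
    skip_ws();
    op = *cur;
    if (!*ok || (op != '*' && op != '/')) { return lhs; }
    cur++;
    rhs = parse_value(ok);
    if (!*ok) { return 0; }
    if (op == '*') { lhs *= rhs; }
    else if (rhs != 0) { lhs /= rhs; }
    else { *ok = 0; return 0; }  /* division by zero is a failure */
  }
}

/* expression : <product> (('+' | '-') <product>)* */
static long parse_expression(int *ok) {
  long lhs = parse_product(ok);
  for (;;) {
    char op;
    long rhs;
    skip_ws();
    op = *cur;
    if (!*ok || (op != '+' && op != '-')) { return lhs; }
    cur++;
    rhs = parse_product(ok);
    if (!*ok) { return 0; }
    lhs = (op == '+') ? lhs + rhs : lhs - rhs;
  }
}

/* maths : /^/ <expression> /$/
   Returns 1 on success and 0 on failure, mirroring mpc_parse. */
int eval_maths(const char *input, long *out) {
  int ok = 1;
  assert(input != NULL && out != NULL);
  cur = input;
  *out = parse_expression(&ok);
  skip_ws();
  if (*cur != '\0') { ok = 0; }  /* trailing input fails, like /$/ */
  return ok;
}
```

Calling eval_maths("(4 * 2 * 11 + 2) - 5", &v) returns 1 and stores 85 in v. Each function corresponds to one rule of the grammar string, which is the correspondence mpc derives for you.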
Getting Started
Introduction
Parser Combinators are structures that encode how to parse particular languages. They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed quickly and easily.
The trick behind Parser Combinators is the observation that by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore instead of describing how to parse a language, a user must only specify the language itself, and the library will work out how to parse it ... as if by magic!
mpc can be used in this mode, or, as shown in the above example, you can specify the grammar directly as a string or in a file.
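The idea of building parsers out of smaller parsers can be sketched in a few lines of plain C. The following toy is not how mpc is implemented and none of the toy_* names exist in mpc; it only recognises input (it returns how many characters were consumed) and does not produce values or error messages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy parser combinators. NOT mpc's implementation; all names here
   are hypothetical and exist only for illustration. */

typedef struct toy_parser toy_parser;

struct toy_parser {
  /* Returns the number of characters consumed, or -1 on failure. */
  int (*run)(const toy_parser *self, const char *in);
  char lo, hi;              /* bounds used by toy_range */
  const toy_parser *a, *b;  /* sub-parsers used by the combinators */
};

static int run_range(const toy_parser *p, const char *in) {
  return (*in != '\0' && *in >= p->lo && *in <= p->hi) ? 1 : -1;
}

static int run_or(const toy_parser *p, const char *in) {
  int n = p->a->run(p->a, in);
  return (n >= 0) ? n : p->b->run(p->b, in);
}

static int run_and(const toy_parser *p, const char *in) {
  int n = p->a->run(p->a, in);
  int m;
  if (n < 0) { return -1; }
  m = p->b->run(p->b, in + n);
  return (m < 0) ? -1 : n + m;
}

static int run_many(const toy_parser *p, const char *in) {
  int total = 0;
  int n;
  while ((n = p->a->run(p->a, in + total)) > 0) { total += n; }
  return total;  /* zero repetitions still counts as success */
}

/* Basic parser: one character in the range lo..hi (inclusive). */
toy_parser toy_range(char lo, char hi) {
  toy_parser p = { run_range, 0, 0, NULL, NULL };
  p.lo = lo; p.hi = hi;
  return p;
}

/* Combinator: try a, and if it fails try b. */
toy_parser toy_or(const toy_parser *a, const toy_parser *b) {
  toy_parser p = { run_or, 0, 0, NULL, NULL };
  p.a = a; p.b = b;
  return p;
}

/* Combinator: a followed by b. */
toy_parser toy_and(const toy_parser *a, const toy_parser *b) {
  toy_parser p = { run_and, 0, 0, NULL, NULL };
  p.a = a; p.b = b;
  return p;
}

/* Combinator: zero or more repetitions of a. */
toy_parser toy_many(const toy_parser *a) {
  toy_parser p = { run_many, 0, 0, NULL, NULL };
  p.a = a;
  return p;
}

/* Reads like the grammar  ident : lower (lower | digit)*  */
int toy_is_identifier(const char *s) {
  toy_parser lower = toy_range('a', 'z');
  toy_parser digit = toy_range('0', '9');
  toy_parser alnum = toy_or(&lower, &digit);
  toy_parser rest  = toy_many(&alnum);
  toy_parser ident = toy_and(&lower, &rest);
  int n;
  assert(s != NULL);
  n = ident.run(&ident, s);
  return n >= 0 && s[n] == '\0';
}
```

The body of toy_is_identifier mirrors the grammar rule in its comment line by line, which is the property the paragraph above describes.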
Basic Parsers
String Parsers
All the following functions construct new basic parsers of the type mpc_parser_t *. Each of these parsers returns a newly allocated char * containing the character(s) it managed to match. If unsuccessful, it returns an error. They have the following functionality.
mpc_parser_t *mpc_any(void);
Matches any individual character
mpc_parser_t *mpc_char(char c);
Matches a single given character c
mpc_parser_t *mpc_range(char s, char e);
Matches any single given character in the range s to e (inclusive)
mpc_parser_t *mpc_oneof(const char *s);
Matches any single given character in the string s
mpc_parser_t *mpc_noneof(const char *s);
Matches any single given character not in the string s
mpc_parser_t *mpc_satisfy(int(*f)(char));
Matches any single given character satisfying function f
mpc_parser_t *mpc_string(const char *s);
Matches exactly the string s
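As an illustration of mpc_satisfy, the predicate below has the int(*)(char) shape given in the signature above. The name is_hex_digit is hypothetical, and the block compiles without mpc.h so the predicate can be tried on its own; the mpc call is shown in a comment.

```c
#include <assert.h>

/* Predicate for mpc_satisfy: non-zero when c is a hexadecimal digit.
   is_hex_digit is a hypothetical name, not part of mpc. */
int is_hex_digit(char c) {
  return (c >= '0' && c <= '9')
      || (c >= 'a' && c <= 'f')
      || (c >= 'A' && c <= 'F');
}

/* With mpc.h included, a parser matching one hex digit would be:

     mpc_parser_t *hex = mpc_satisfy(is_hex_digit);
*/

/* Small usage check of the predicate itself. */
void demo_is_hex_digit(void) {
  assert(is_hex_digit('7'));
  assert(is_hex_digit('f'));
  assert(!is_hex_digit('g'));
}
```

The same set of characters could also be expressed with mpc_oneof("0123456789abcdefABCDEF"); a predicate is more useful when the condition is awkward to list character by character.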
Other Parsers
Several other functions exist that construct parsers with some other special functionality.
mpc_parser_t *mpc_pass(void);
Consumes no input, always successful, returns NULL
mpc_parser_t *mpc_fail(const char *m);
mpc_parser_t *mpc_failf(const char *fmt, ...);
Consumes no input, always fails with message m or formatted string fmt.
mpc_parser_t *mpc_lift(mpc_ctor_t f);
Consumes no input, always successful, returns the result of function f
mpc_parser_t *mpc_lift_val(mpc_val_t *x);
Consumes no input, always successful, returns x
mpc_parser_t *mpc_state(void);
Consumes no input, always successful, returns a copy of the parser state as an mpc_state_t *. This state is newly allocated, so it must be released with free once you are finished with it.
mpc_parser_t *mpc_anchor(int(*f)(char,char));
Consumes no input. Successful when function f returns true. Always returns NULL.
Function f is an anchor function. It takes as input the last character parsed and the next character in the input, and returns success or failure. This function can be set by the user to ensure some condition is met, for example to test that the input is at a boundary between words and non-words.
At the start of the input the first argument is set to '\0'. At the end of the input the second argument is set to '\0'.
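The word-boundary case mentioned above might be written as follows. This is a sketch: at_word_boundary and is_word_char are hypothetical names, and treating letters, digits, and underscore as word characters is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>

/* Returns 1 for word characters, 0 otherwise. '\0' (used by mpc to mark
   the start or end of input) is never a word character. */
static int is_word_char(char c) {
  return c != '\0' && (isalnum((unsigned char)c) || c == '_');
}

/* Anchor function with the int(*)(char, char) shape mpc_anchor takes.
   Succeeds exactly where a word character meets a non-word character. */
int at_word_boundary(char prev, char next) {
  return is_word_char(prev) != is_word_char(next);
}

/* With mpc.h included, the anchor parser would be:

     mpc_parser_t *boundary = mpc_anchor(at_word_boundary);
*/

/* Small usage check of the anchor function itself. */
void demo_at_word_boundary(void) {
  assert(at_word_boundary('\0', 'a'));  /* start of input before a word */
  assert(at_word_boundary('a', ' '));   /* word followed by a space */
  assert(!at_word_boundary('a', 'b'));  /* inside a word */
  assert(!at_word_boundary(' ', '\0')); /* space at end of input */
}
```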
Parsing
Once you've built a parser, you can run it on some input using one of the following functions. These functions return 1 on success and 0 on failure. They output either the result or an error to an mpc_result_t variable. This type is defined as follows.
typedef union {
mpc_err_t *error;
mpc_val_t *output;
} mpc_result_t;
where mpc_val_t * is synonymous with void * and simply represents some pointer to data, the exact type of which is dependent on the parser.
int mpc_parse(const char *filename, const char *string, mpc_parser_t *p, mpc_result_t *r);
Run a parser on some string.
int mpc_parse_file(const char *filename, FILE *file, mpc_parser_t *p, mpc_result_t *r);
Run a parser on some file.
int mpc_parse_pipe(const char *filename, FILE *pipe, mpc_parser_t *p, mpc_result_t *r);
Run a parser on some pipe (such as stdin).
int mpc_parse_contents(const char *filename, mpc_parser_t *p, mpc_result_t *r);
Run a parser on the contents of some file.
Combinators
Combinators are functions that take one or more parsers and return a new parser of some given functionality.
These combinators work independently of exactly what data type the parser(s) supplied as input return. In languages such as Haskell, the compiler ensures you don't feed one type of data into a parser that requires a different type. In C we don't have that luxury, so it is up to the programmer to handle the outputs of different parser types correctly.
A second annoyance in C is that of manual memory management. Some parsers might get half-way and then fail. This means they need to clean up any partial result that has been collected in the parse. In Haskell this is handled by the Garbage Collector, but in C these combinators need to take destructor functions as input, which say how to clean up any partial data that has been collected.
Here are the main combinators and how to use them.
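The need for destructors can be seen in a small stand-alone sketch. It is not mpc code: toy_sequence, toy_step, and toy_dtor are hypothetical names, with toy_dtor playing the role that mpc_dtor_t plays in the signatures of this section. When a later step fails, the results already collected are released through the destructor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-alone illustration, NOT mpc code. All names are hypothetical. */

typedef void (*toy_dtor)(void *);            /* role of mpc_dtor_t */
typedef void *(*toy_step)(const char **in);  /* returns NULL on failure */

int toy_live_results = 0;  /* results allocated but not yet destroyed */

/* Step: consume one digit and return it as a newly allocated string. */
void *take_digit(const char **in) {
  char *out;
  if (**in < '0' || **in > '9') { return NULL; }
  out = malloc(2);
  if (out == NULL) { return NULL; }
  out[0] = **in;
  out[1] = '\0';
  (*in)++;
  toy_live_results++;
  return out;
}

/* Destructor matching take_digit. */
void free_digit(void *x) {
  free(x);
  toy_live_results--;
}

/* Runs step n times, storing each result in results[0..n-1].
   On failure, destroys the partial results with dtor and returns 0. */
int toy_sequence(const char *in, int n, toy_step step, toy_dtor dtor,
                 void **results) {
  int i, j;
  assert(in != NULL && results != NULL);
  for (i = 0; i < n; i++) {
    results[i] = step(&in);
    if (results[i] == NULL) {
      for (j = 0; j < i; j++) {
        dtor(results[j]);
        results[j] = NULL;
      }
      return 0;
    }
  }
  return 1;
}
```

Running toy_sequence on "12x" with n set to 3 fails at the third step and leaves toy_live_results at 0, because the two digits already collected were passed to free_digit. Without the destructor argument the sequence would have no way to know how to release them.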
mpc_parser_t *mpc_expect(mpc_parser_t *a, const char *e);
mpc_parser_t *mpc_expectf(mpc_parser_t *a, const char *fmt, ...);
Returns a parser that runs a, and on success returns the result of a, while on failure reports that e was expected.
mpc_parser_t *mpc_apply(mpc_parser_t *a, mpc_apply_t f);
mpc_parser_t *mpc_apply_to(mpc_parser_t *a, mpc_apply_to_t f, void *x);
Returns a parser that applies function f (optionally taking extra input x) to the result of parser a.
mpc_parser_t *mpc_check(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *e);
mpc_parser_t *mpc_check_with(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *e);
mpc_parser_t *mpc_checkf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *fmt, ...);
mpc_parser_t *mpc_check_withf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *fmt, ...);
Returns a parser that applies function f (optionally taking extra input x) to the result of parser a. If f returns non-zero, then the parser succeeds and returns the value of a (possibly modified by f). If f returns zero, then the parser fails with message e, and the result of a is destroyed with the destructor da.
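A check function for mpc_check might look like the sketch below. It assumes that mpc_check_t has the shape int(*)(mpc_val_t **), which is consistent with f being able to modify the value but is not stated in the text above, so verify it against mpc.h. The name is_byte_value is hypothetical, and void ** is used in place of mpc_val_t ** so the block compiles without mpc.h.

```c
#include <assert.h>
#include <stddef.h>

/* Check function: accepts the value only if *x is a string of decimal
   digits whose value is at most 255. Returns non-zero to accept and
   zero to reject. ASSUMPTION: mpc_check_t is int(*)(mpc_val_t **);
   void ** stands in for mpc_val_t ** here. */
int is_byte_value(void **x) {
  const char *s;
  long v = 0;
  assert(x != NULL);
  s = (const char *)*x;
  if (s == NULL || *s == '\0') { return 0; }
  for (; *s != '\0'; s++) {
    if (*s < '0' || *s > '9') { return 0; }
    v = v * 10 + (*s - '0');
    if (v > 255) { return 0; }  /* also keeps v from overflowing */
  }
  return 1;
}

/* With mpc.h included, and digits being some parser that returns a
   malloc-allocated digit string (a hypothetical parser for this sketch):

     mpc_parser_t *byte = mpc_check(digits, free, is_byte_value,
                                    "expected a value between 0 and 255");

   free is passed as the destructor da, so a rejected string is released. */
```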
mpc_parser_t *mpc_not(mpc_parser_t *a, mpc_dtor_t da);
mpc_parser_t *mpc_not_lift(mpc_parser_t *a, mpc_dtor_t da, mpc_ctor_t lf);
Returns a parser with the following behaviour. If parser a succeeds, then it fails and consumes no input. If parser a fails, then it succeeds, consumes no input and returns NULL (or the result of lift function lf). Destructor da is used to destroy the result of a on success.
mpc_parser_t *mpc_maybe(mpc_parser_t *a);
mp