
Mpc

A Parser Combinator library for C

Install / Use

/learn @orangeduck/Mpc
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Micro Parser Combinators

Version 0.9.0

About

mpc is a lightweight and powerful Parser Combinator library for C.

Using mpc might be of interest to you if you are...

  • Building a new programming language
  • Building a new data format
  • Parsing an existing programming language
  • Parsing an existing data format
  • Embedding a Domain Specific Language
  • Implementing Greenspun's Tenth Rule

Features

  • Type-Generic
  • Predictive, Recursive Descent
  • Easy to Integrate (One Source File in ANSI C)
  • Automatic Error Message Generation
  • Regular Expression Parser Generator
  • Language/Grammar Parser Generator

Alternatives

The current main alternative for a C-based parser combinator library is a branch of Cesium3.

mpc provides a number of features that this project does not offer, and also overcomes a number of potential downsides:

  • mpc Works for Generic Types
  • mpc Doesn't rely on Boehm-Demers-Weiser Garbage Collection
  • mpc Doesn't use setjmp and longjmp for errors
  • mpc Doesn't pollute the namespace

Quickstart

Here is how one would use mpc to create a parser for a basic mathematical expression language.

mpc_parser_t *Expr  = mpc_new("expression");
mpc_parser_t *Prod  = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");

mpca_lang(MPCA_LANG_DEFAULT,
  " expression : <product> (('+' | '-') <product>)*; "
  " product    : <value>   (('*' | '/')   <value>)*; "
  " value      : /[0-9]+/ | '(' <expression> ')';    "
  " maths      : /^/ <expression> /$/;               ",
  Expr, Prod, Value, Maths, NULL);

mpc_result_t r;

if (mpc_parse("input", input, Maths, &r)) {
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}

mpc_cleanup(4, Expr, Prod, Value, Maths);

If you were to set input to the string (4 * 2 * 11 + 2) - 5, the printed output would look like this.

>
  regex
  expression|>
    value|>
      char:1:1 '('
      expression|>
        product|>
          value|regex:1:2 '4'
          char:1:4 '*'
          value|regex:1:6 '2'
          char:1:8 '*'
          value|regex:1:10 '11'
        char:1:13 '+'
        product|value|regex:1:15 '2'
      char:1:16 ')'
    char:1:18 '-'
    product|value|regex:1:20 '5'
  regex
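The grammar's shape is what gives `*` and `/` higher precedence than `+` and `-`: a whole `product` is parsed before an `expression` looks for the next `+` or `-`. To make that concrete, here is a standalone sketch in plain C (not mpc, and not how mpc builds its AST) of a hand-written recursive-descent evaluator with one function per grammar rule. The function names are hypothetical, and unlike the mpc version it skips blanks between tokens itself.

```c
#include <ctype.h>

/* Hypothetical hand-written evaluator (not mpc) with one function per rule:
 *   expression : <product> (('+' | '-') <product>)*
 *   product    : <value>   (('*' | '/') <value>)*
 *   value      : digits | '(' <expression> ')'
 * *ok is cleared on a syntax error or a division by zero. */

static long eval_expression(const char **s, int *ok);

static void skip_spaces(const char **s) {
  while (**s == ' ') { (*s)++; }
}

/* value : digits | '(' <expression> ')' */
static long eval_value(const char **s, int *ok) {
  long v = 0;
  skip_spaces(s);
  if (**s == '(') {
    (*s)++;
    v = eval_expression(s, ok);
    skip_spaces(s);
    if (!*ok || **s != ')') { *ok = 0; return 0; }
    (*s)++;
    return v;
  }
  if (!isdigit((unsigned char)**s)) { *ok = 0; return 0; }
  while (isdigit((unsigned char)**s)) {
    v = v * 10 + (**s - '0');
    (*s)++;
  }
  return v;
}

/* product : a value followed by zero or more (times|divide) value pairs */
static long eval_product(const char **s, int *ok) {
  long v = eval_value(s, ok);
  while (*ok) {
    char op;
    long rhs;
    skip_spaces(s);
    op = **s;
    if (op != '*' && op != '/') { break; }
    (*s)++;
    rhs = eval_value(s, ok);
    if (!*ok) { break; }
    if (op == '*') { v *= rhs; }
    else if (rhs == 0) { *ok = 0; }
    else { v /= rhs; }
  }
  return v;
}

/* expression : a product followed by zero or more (plus|minus) product pairs */
static long eval_expression(const char **s, int *ok) {
  long v = eval_product(s, ok);
  while (*ok) {
    char op;
    long rhs;
    skip_spaces(s);
    op = **s;
    if (op != '+' && op != '-') { break; }
    (*s)++;
    rhs = eval_product(s, ok);
    if (!*ok) { break; }
    v = (op == '+') ? v + rhs : v - rhs;
  }
  return v;
}

/* maths : start-of-input, expression, end-of-input (all input consumed) */
int eval_maths(const char *input, long *out) {
  int ok = 1;
  const char *s = input;
  long v = eval_expression(&s, &ok);
  skip_spaces(&s);
  if (!ok || *s != '\0') { return 0; }
  *out = v;
  return 1;
}
```

For the example input, `eval_maths("(4 * 2 * 11 + 2) - 5", &v)` returns 1 and sets `v` to 85.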

Getting Started

Introduction

Parser Combinators are structures that encode how to parse particular languages. They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed quickly, efficiently, and easily.

The trick behind Parser Combinators is the observation that, by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore, instead of describing how to parse a language, a user need only specify the language itself, and the library will work out how to parse it ... as if by magic!

mpc can be used in this mode, or, as shown in the above example, you can specify the grammar directly as a string or in a file.

Basic Parsers

String Parsers

All the following functions construct new basic parsers of type mpc_parser_t *. On success, each of these parsers returns a newly allocated char * containing the character(s) it matched; on failure, it returns an error. They have the following functionality.


mpc_parser_t *mpc_any(void);

Matches any individual character


mpc_parser_t *mpc_char(char c);

Matches a single given character c


mpc_parser_t *mpc_range(char s, char e);

Matches any single given character in the range s to e (inclusive)


mpc_parser_t *mpc_oneof(const char *s);

Matches any single given character in the string s


mpc_parser_t *mpc_noneof(const char *s);

Matches any single given character not in the string s


mpc_parser_t *mpc_satisfy(int(*f)(char));

Matches any single given character satisfying function f
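mpc_satisfy takes a plain predicate of type int(*)(char). A minimal sketch of one such predicate, a hexadecimal-digit test, follows; the name is_hex_digit is hypothetical. Passed as mpc_satisfy(is_hex_digit), it would match a single hex digit.

```c
#include <ctype.h>

/* Hypothetical predicate with the int(*)(char) shape mpc_satisfy expects:
 * returns non-zero for 0-9, a-f and A-F. The cast to unsigned char keeps
 * isxdigit well-defined when char is signed and c is negative. */
int is_hex_digit(char c) {
  return isxdigit((unsigned char)c) != 0;
}
```

For a fixed set like this one, mpc_oneof("0123456789abcdefABCDEF") expresses the same match; mpc_satisfy is the better fit when a test is easier to compute than to enumerate.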


mpc_parser_t *mpc_string(const char *s);

Matches exactly the string s
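To pin down which input each of these parsers accepts, here is a standalone model of the matching rules only. It is not mpc's implementation: it allocates nothing and reports no errors, and the model_* names are hypothetical. It assumes that the single-character parsers fail at end of input, which this model represents as a '\0' byte.

```c
#include <string.h>

/* Standalone model of the matching rules only: each function inspects the
 * input at `in` and returns the number of characters it would consume, or
 * -1 on failure. '\0' is treated as end of input, where every
 * single-character matcher fails. */

int model_any(const char *in) {
  return *in != '\0' ? 1 : -1;
}

int model_char(const char *in, char c) {
  return (*in != '\0' && *in == c) ? 1 : -1;
}

int model_range(const char *in, char s, char e) {
  return (*in != '\0' && *in >= s && *in <= e) ? 1 : -1;  /* both ends inclusive */
}

/* The '\0' guard matters: strchr(set, '\0') would "find" the terminator. */
int model_oneof(const char *in, const char *set) {
  return (*in != '\0' && strchr(set, *in) != NULL) ? 1 : -1;
}

int model_noneof(const char *in, const char *set) {
  return (*in != '\0' && strchr(set, *in) == NULL) ? 1 : -1;
}

/* Succeeds only if all of s appears at the current position. */
int model_string(const char *in, const char *s) {
  size_t n = strlen(s);
  return strncmp(in, s, n) == 0 ? (int)n : -1;
}
```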

Other Parsers

Several other functions exist that construct parsers with some other special functionality.


mpc_parser_t *mpc_pass(void);

Consumes no input, always successful, returns NULL


mpc_parser_t *mpc_fail(const char *m);
mpc_parser_t *mpc_failf(const char *fmt, ...);

Consumes no input, always fails with message m or formatted string fmt.


mpc_parser_t *mpc_lift(mpc_ctor_t f);

Consumes no input, always successful, returns the result of function f
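mpc_lift takes an mpc_ctor_t, a function with no arguments that returns an mpc_val_t *. A minimal sketch of one that supplies a default value follows; the name new_zero is hypothetical. The local typedef follows the statement later in this README that mpc_val_t * is synonymous with void *, and exists only so the sketch compiles without mpc.h.

```c
#include <stdlib.h>

typedef void mpc_val_t;  /* stand-in so this compiles alone; drop it when including mpc.h */

/* Hypothetical constructor with the mpc_ctor_t shape, mpc_val_t *(*)(void).
 * Each call returns a new heap-allocated int holding 0 (or NULL if malloc
 * fails), so every successful parse gets its own value to free. */
mpc_val_t *new_zero(void) {
  int *x = malloc(sizeof *x);
  if (x != NULL) { *x = 0; }
  return x;
}
```

It would be used as mpc_lift(new_zero). Because mpc_lift_val, as described next, returns x itself, a constructor that allocates on every call is the safer choice when results are freed after each parse.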


mpc_parser_t *mpc_lift_val(mpc_val_t *x);

Consumes no input, always successful, returns x


mpc_parser_t *mpc_state(void);

Consumes no input, always successful, returns a copy of the parser state as a mpc_state_t *. This state is newly allocated and so needs to be released with free when finished with.


mpc_parser_t *mpc_anchor(int(*f)(char,char));

Consumes no input. Successful when function f returns true. Always returns NULL.

Function f is an anchor function. It takes as input the last character parsed and the next character in the input, and returns success or failure. The user can supply this function to ensure some condition is met; for example, to test that the input is at a boundary between words and non-words.

At the start of the input the first argument is set to '\0'. At the end of the input the second argument is set to '\0'.
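A sketch of the word-boundary test mentioned above, written as a plain function of the int(*)(char,char) type that mpc_anchor expects. The names are hypothetical, and treating letters, digits, and underscore as "word" characters is an assumption of this sketch.

```c
#include <ctype.h>

/* Hypothetical anchor function for mpc_anchor. `prev` is the last character
 * parsed ('\0' at the start of input); `next` is the upcoming character
 * ('\0' at the end of input). '\0' counts as a non-word character. */
static int is_word_char(char c) {
  return c != '\0' && (isalnum((unsigned char)c) || c == '_');
}

/* True exactly when one side is a word character and the other is not. */
int at_word_boundary(char prev, char next) {
  return is_word_char(prev) != is_word_char(next);
}
```

mpc_anchor(at_word_boundary) then succeeds, without consuming input, only at the start or end of a word.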

Parsing

Once you've built a parser, you can run it on some input using one of the following functions. These functions return 1 on success and 0 on failure. They output either the result or an error to an mpc_result_t variable. This type is defined as follows.

typedef union {
  mpc_err_t *error;
  mpc_val_t *output;
} mpc_result_t;

where mpc_val_t * is synonymous with void * and simply represents some pointer to data, the exact type of which is dependent on the parser.
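Because the result is a union, only one member is meaningful after a call, and the return value says which one. The sketch below models that convention with stand-in types. demo_result_t, demo_err_t and demo_parse_digit are hypothetical and are not mpc's definitions.

```c
#include <stdlib.h>

/* Stand-in types, not mpc's: they model the convention described above,
 * where the int return value says which union member was written. */
typedef struct { const char *message; } demo_err_t;

typedef union {
  demo_err_t *error;  /* meaningful only after a return value of 0 */
  void *output;       /* meaningful only after a return value of 1 */
} demo_result_t;

static demo_err_t err_expected_digit = { "expected digit" };
static demo_err_t err_no_memory = { "out of memory" };

/* Hypothetical parser for one decimal digit. On success the output is a
 * newly allocated int owned by the caller. The errors here are static for
 * brevity, whereas mpc errors are released with mpc_err_delete. */
int demo_parse_digit(const char *input, demo_result_t *r) {
  int *v;
  if (input[0] < '0' || input[0] > '9') {
    r->error = &err_expected_digit;
    return 0;
  }
  v = malloc(sizeof *v);
  if (v == NULL) {
    r->error = &err_no_memory;
    return 0;
  }
  *v = input[0] - '0';
  r->output = v;
  return 1;
}
```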


int mpc_parse(const char *filename, const char *string, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some string.


int mpc_parse_file(const char *filename, FILE *file, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some file.


int mpc_parse_pipe(const char *filename, FILE *pipe, mpc_parser_t *p, mpc_result_t *r);

Run a parser on some pipe (such as stdin).


int mpc_parse_contents(const char *filename, mpc_parser_t *p, mpc_result_t *r);

Run a parser on the contents of some file.

Combinators

Combinators are functions that take one or more parsers and return a new parser of some given functionality.

These combinators work independently of exactly what data type the supplied parser(s) return. In languages such as Haskell, the compiler ensures that you don't feed one type of data into a parser expecting a different type. C offers no such luxury, so it is up to the programmer to handle the outputs of different parser types correctly.

A second annoyance in C is manual memory management. Some parsers might get half-way and then fail, which means they need to clean up any partial result collected so far. In Haskell this is handled by the garbage collector, but in C these combinators need to take destructor functions as input, which say how to clean up any partial data that has been collected.
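The need for destructors can be seen in a minimal model of a two-step sequence. The sketch is plain C, not mpc's implementation, and every name in it is hypothetical. If the second step fails, the first step's heap result would leak, so the sequence hands it to the supplied destructor.

```c
#include <stdlib.h>

/* Standalone model, not mpc's code. A "parser" returns 1 and stores a
 * heap-allocated result on success, or returns 0 on failure. */
typedef int (*demo_parser_fn)(const char **in, void **out);
typedef void (*demo_dtor_fn)(void *);

/* Run a, then b. If b fails after a succeeded, a's partial result is passed
 * to the destructor da instead of leaking. */
int demo_seq2(demo_parser_fn a, demo_parser_fn b, demo_dtor_fn da,
              const char **in, void **out_a, void **out_b) {
  if (!a(in, out_a)) { return 0; }
  if (!b(in, out_b)) {
    da(*out_a);
    *out_a = NULL;
    return 0;
  }
  return 1;
}

/* Example parsers, plus a destructor that counts how often it runs. */
static int dtor_calls = 0;

static void counting_free(void *x) {
  dtor_calls++;
  free(x);
}

static int parse_in_range(const char **in, void **out, char lo, char hi) {
  char *c;
  if (**in < lo || **in > hi) { return 0; }
  c = malloc(1);
  if (c == NULL) { return 0; }
  *c = **in;
  (*in)++;
  *out = c;
  return 1;
}

static int parse_letter(const char **in, void **out) {
  return parse_in_range(in, out, 'a', 'z');
}

static int parse_digit(const char **in, void **out) {
  return parse_in_range(in, out, '0', '9');
}
```

On input "ab", parse_letter succeeds and parse_digit fails, so the allocated 'a' is released through counting_free rather than leaked.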

Here are the main combinators and how to use them.


mpc_parser_t *mpc_expect(mpc_parser_t *a, const char *e);
mpc_parser_t *mpc_expectf(mpc_parser_t *a, const char *fmt, ...);

Returns a parser that runs a, and on success returns the result of a, while on failure reports that e was expected.


mpc_parser_t *mpc_apply(mpc_parser_t *a, mpc_apply_t f);
mpc_parser_t *mpc_apply_to(mpc_parser_t *a, mpc_apply_to_t f, void *x);

Returns a parser that applies function f (optionally taking extra input x) to the result of parser a.
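Two sketches of functions you might pass here follow: one with the mpc_apply_t shape, mpc_val_t *(*)(mpc_val_t *), and one with the mpc_apply_to_t shape, which adds a void * argument. They convert the matched string into a heap-allocated int. The names are hypothetical, and both sketches assume the applied function takes ownership of a's result and frees it. The local typedef exists only so the block compiles without mpc.h.

```c
#include <stdlib.h>

typedef void mpc_val_t;  /* stand-in so this compiles alone; drop it when including mpc.h */

/* Hypothetical mpc_apply_t-shaped function: converts the matched string to
 * a newly allocated int, then frees the string it was given. */
mpc_val_t *str_to_int(mpc_val_t *x) {
  int *y = malloc(sizeof *y);
  if (y != NULL) { *y = (int)strtol((const char *)x, NULL, 10); }
  free(x);
  return y;
}

/* Hypothetical mpc_apply_to_t-shaped function: the extra argument points
 * to an int holding the numeric base passed to strtol. */
mpc_val_t *str_to_int_base(mpc_val_t *x, void *base) {
  int *y = malloc(sizeof *y);
  if (y != NULL) { *y = (int)strtol((const char *)x, NULL, *(int *)base); }
  free(x);
  return y;
}
```

With a hypothetical digit-string parser `digits`, these would be used as mpc_apply(digits, str_to_int) or mpc_apply_to(digits, str_to_int_base, &base). Since x is an untyped pointer, the parser can only keep the pointer and not a copy, so `base` should stay valid for as long as the parser is used.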


mpc_parser_t *mpc_check(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *e);
mpc_parser_t *mpc_check_with(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *e);
mpc_parser_t *mpc_checkf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *fmt, ...);
mpc_parser_t *mpc_check_withf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *fmt, ...);

Returns a parser that applies function f (optionally taking extra input x) to the result of parser a. If f returns non-zero, then the parser succeeds and returns the value of a (possibly modified by f). If f returns zero, then the parser fails with message e, and the result of a is destroyed with the destructor da.
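As an illustration, here are check functions with the mpc_check_t shape, int (*)(mpc_val_t **), and the mpc_check_with_t shape, which adds a void * argument. Both assume the checked parser produces an int *. The names are hypothetical, and the local typedef exists only so the block compiles without mpc.h.

```c
typedef void mpc_val_t;  /* stand-in so this compiles alone; drop it when including mpc.h */

/* Hypothetical mpc_check_t-shaped function: accepts the int result only if
 * it fits in a byte. Returning 0 makes the checked parser fail. */
int fits_in_byte(mpc_val_t **x) {
  int v = *(int *)*x;
  return v >= 0 && v <= 255;
}

/* Hypothetical mpc_check_with_t-shaped function: the upper bound arrives
 * through the extra pointer. The mpc_val_t ** parameter would also let a
 * check replace the value; this one only reads it. */
int at_most(mpc_val_t **x, void *limit) {
  return *(int *)*x <= *(int *)limit;
}
```

With a hypothetical integer parser `number` returning int *, mpc_check(number, free, fits_in_byte, "a value from 0 to 255") would reject 300 and free the rejected value.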


mpc_parser_t *mpc_not(mpc_parser_t *a, mpc_dtor_t da);
mpc_parser_t *mpc_not_lift(mpc_parser_t *a, mpc_dtor_t da, mpc_ctor_t lf);

Returns a parser with the following behaviour. If parser a succeeds, then it fails and consumes no input. If parser a fails, then it succeeds, consumes no input and returns NULL (or the result of lift function lf). Destructor da is used to destroy the result of a on success.
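A typical use is rejecting a keyword where an identifier is expected, for example mpc_not(mpc_string("if"), free), where free releases the string that mpc_string returns. The standalone sketch below models only that contract: the probe never advances the input, and the result is success exactly when the probe fails. It is not mpc's code, and the names are hypothetical.

```c
#include <string.h>

/* Probe: does the input start with keyword kw? */
static int starts_with(const char *in, const char *kw) {
  return strncmp(in, kw, strlen(kw)) == 0;
}

/* Model of mpc_not over that probe: `in` is only read, never consumed. */
int demo_not_keyword(const char *in, const char *kw) {
  return !starts_with(in, kw);
}
```

A plain prefix probe also rejects identifiers such as iffy that merely begin with the keyword. Pairing the probe with a word-boundary test avoids this.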


mpc_parser_t *mpc_maybe(mpc_parser_t *a);
mp

GitHub Stars: 2.8k
Category: Development
Updated: 3d ago
Forks: 302

Languages

C

Security Score

80/100

Audited on Apr 7, 2026

No findings