Datatype99

Algebraic data types for C99

<div align="center"> <a href="examples/binary_tree.c"><img src="images/preview.png" width="600px" /></a> <h1>Datatype99</h1> <a href="https://github.com/hirrolot/datatype99/actions"> <img src="https://github.com/hirrolot/datatype99/workflows/C/C++%20CI/badge.svg"> </a>

Safe, intuitive algebraic data types with exhaustive pattern matching & compile-time introspection facilities. No external tools required, pure C99.

</div>

Highlights

  • Type-safe. Errors such as improperly typed variants, non-exhaustive pattern matching, and invalid field access are caught at compile time.

  • Portable. All you need is a standard-conforming C99 compiler; neither the standard library, nor compiler- or platform-specific functionality, nor VLAs are required.

  • Predictable. Datatype99 comes with formal code generation semantics (see Syntax and semantics below), meaning that the generated data layout is well-defined and always the same.

  • Comprehensible errors. Datatype99 is resilient to bad code.

  • Battle-tested. Datatype99 is used at OpenIPC to develop real-time streaming software for IP cameras; this includes an RTSP 1.0 implementation along with ~50k lines of private code.

Installation

Datatype99 consists of a single header file, datatype99.h, and has one dependency, Metalang99. To use it in your project, you need to:

  1. Add datatype99 and metalang99/include to your include directories.
  2. Specify -ftrack-macro-expansion=0 (GCC) or -fmacro-backtrace-limit=1 (Clang) to suppress long, unhelpful macro expansion backtraces in error messages.

If you use CMake, the recommended way is FetchContent:

```cmake
include(FetchContent)

FetchContent_Declare(
    datatype99
    URL https://github.com/hirrolot/datatype99/archive/refs/tags/vx.y.z.tar.gz # vx.y.z
)

FetchContent_MakeAvailable(datatype99)

target_link_libraries(MyProject datatype99)

# Disable full macro expansion backtraces for Metalang99.
if(CMAKE_C_COMPILER_ID STREQUAL "Clang")
  target_compile_options(MyProject PRIVATE -fmacro-backtrace-limit=1)
elseif(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  target_compile_options(MyProject PRIVATE -ftrack-macro-expansion=0)
endif()
```

(By default, datatype99/CMakeLists.txt downloads Metalang99 v1.13.5 from the GitHub releases; to override this, call FetchContent_Declare for Metalang99 yourself before datatype99 is made available.)

Optionally, you can precompile headers in your project that rely on Datatype99. This will decrease compilation time, because the headers will not be compiled each time they are included.

Happy hacking!

Usage

Put simply, Datatype99 is just syntactic sugar over tagged unions; the only difference is that it is safer and more concise. For example, to represent a binary tree, you would normally write something like this:

```c
struct BinaryTree; // Forward declaration: a node points to other trees.

typedef struct {
    struct BinaryTree *lhs;
    int x;
    struct BinaryTree *rhs;
} BinaryTreeNode;

typedef struct BinaryTree {
    enum { Leaf, Node } tag;
    union {
        int leaf;
        BinaryTreeNode node;
    } data;
} BinaryTree;
```

To avoid this boilerplate, you can use Datatype99:

```c
datatype(
    BinaryTree,
    (Leaf, int),
    (Node, BinaryTree *, int, BinaryTree *)
);
```

Say you want to sum the values of all nodes and leaves in your binary tree. Then you might write something like this:

```c
int sum(const BinaryTree *tree) {
    switch (tree->tag) {
    case Leaf:
        return tree->data.leaf;
    case Node:
        return sum(tree->data.node.lhs) + tree->data.node.x + sum(tree->data.node.rhs);
    }

    // Invalid input (no such variant).
    return -1;
}
```

... but what if you accidentally access tree->data.node after case Leaf:? Your compiler would not warn you, and you would end up with a logic bug.
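Assembled into one self-contained C99 file, the hand-written approach looks roughly like the sketch below. The helpers make_leaf and make_node are hypothetical (they are not part of Datatype99); they show the tag-and-member bookkeeping that every hand-written constructor has to get right.

```c
#include <assert.h>

struct BinaryTree; // Forward declaration: a node points to other trees.

typedef struct {
    struct BinaryTree *lhs;
    int x;
    struct BinaryTree *rhs;
} BinaryTreeNode;

typedef struct BinaryTree {
    enum { Leaf, Node } tag; // Says which union member is currently valid.
    union {
        int leaf;
        BinaryTreeNode node;
    } data;
} BinaryTree;

// Hypothetical helper: must set the tag AND the matching member by hand.
BinaryTree make_leaf(int value) {
    BinaryTree t = { .tag = Leaf, .data.leaf = value };
    return t;
}

// Hypothetical helper: nothing checks that the tag matches the member set.
BinaryTree make_node(BinaryTree *lhs, int x, BinaryTree *rhs) {
    BinaryTree t = { .tag = Node, .data.node = { lhs, x, rhs } };
    return t;
}

int sum(const BinaryTree *tree) {
    switch (tree->tag) {
    case Leaf:
        return tree->data.leaf;
    case Node:
        return sum(tree->data.node.lhs) + tree->data.node.x + sum(tree->data.node.rhs);
    }

    assert(!"invalid tag"); // Unreachable for well-formed trees.
    return -1;
}
```

Nothing in these types ties tag to the union member that is currently valid, so reading tree->data.node on a leaf compiles without complaint.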

With Datatype99, you can rewrite sum as follows, using a technique called pattern matching:

```c
int sum(const BinaryTree *tree) {
    match(*tree) {
        of(Leaf, x) return *x;
        of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
    }

    // Invalid input (no such variant).
    return -1;
}
```

of gives you variables called bindings: x, lhs, and rhs. This design has a few neat aspects:

  • Compile-time safety. The bindings of Node are not in scope inside of(Leaf, x), and vice versa, so accessing them in the wrong branch fails to compile.
  • Flexibility. Bindings are pointers to the variant's fields, so writing through them mutates the tree in place; to read a value, dereference the binding, as in return *x;.
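To illustrate both points without the library, here is a plain C99 approximation that mimics bindings by hand, with the hand-written BinaryTree definitions repeated so the block compiles on its own. It is not the code Datatype99 generates, and double_values is a hypothetical function. Each case opens its own block, so its "bindings" are out of scope in the other case, and each binding is a pointer into the tree, so writing through it changes the tree.

```c
#include <assert.h>

struct BinaryTree; // Forward declaration: a node points to other trees.

typedef struct {
    struct BinaryTree *lhs;
    int x;
    struct BinaryTree *rhs;
} BinaryTreeNode;

typedef struct BinaryTree {
    enum { Leaf, Node } tag;
    union {
        int leaf;
        BinaryTreeNode node;
    } data;
} BinaryTree;

// Hypothetical example: doubles every value in the tree, in place.
void double_values(BinaryTree *tree) {
    assert(tree);
    switch (tree->tag) {
    case Leaf: {
        // Only this branch's "binding" exists here; lhs and rhs are not
        // declared in this block, so referring to them would not compile.
        int *x = &tree->data.leaf;
        *x *= 2;
        break;
    }
    case Node: {
        // Pointer "bindings" into the node: writes through them mutate the tree.
        // As with Datatype99's lhs binding, *lhs is the BinaryTree * field itself.
        BinaryTree **lhs = &tree->data.node.lhs;
        int *x = &tree->data.node.x;
        BinaryTree **rhs = &tree->data.node.rhs;
        double_values(*lhs);
        *x *= 2;
        double_values(*rhs);
        break;
    }
    }
}
```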

The last thing left to explain is how to construct variants. For each variant, Datatype99 generates an inline static function called a value constructor; you can use these as follows:

```c
BinaryTree leaf5 = Leaf(5);
BinaryTree leaf7 = Leaf(7);
BinaryTree node = Node(&leaf5, 123, &leaf7);
```

Finally, just a few brief notes about pattern matching:

  • To match the default case, write otherwise { ... } at the end of match.
  • To ignore a binding, write _: of(Foo, a, b, _, d).
  • Do not use top-level break/continue inside statements passed to of and ifLet; use goto labels instead.

Congratulations, this is all you need to know to write most code! If you feel fancy, you can also introspect your types at compile time; see examples/derive/ for examples.

Syntax and semantics

Because the macros have well-defined semantics, you can rely on the generated data layout, for example when writing an FFI, which is quite common in C.

EBNF syntax

```ebnf
<datatype>      ::= "datatype(" [ <derive-clause> "," ] <datatype-name> { "," <variant> }+ ")" ;
<record>        ::= "record("   [ <derive-clause> "," ] <record-name>   { "," <field>   }* ")" ;
<datatype-name> ::= <ident> ;
<record-name>   ::= <ident> ;

<variant>       ::= "(" <variant-name> { "," <type> }* ")" ;
<field>         ::= "(" <type> "," <field-name> ")" ;
<variant-name>  ::= <ident> ;
<field-name>    ::= <ident> ;

<derive-clause> ::= "derive(" <deriver-name> { "," <deriver-name> }* ")" ;
<deriver-name>  ::= <ident> ;

<match>         ::= "match(" <lvalue> ") {" { <of> }* [ <otherwise> ] "}" ;
<matches>       ::= "MATCHES(" <expr> "," <ident> ")" ;
<if-let>        ::= "ifLet(" <lvalue> "," <variant-name> "," <ident> { "," <ident> }* ")" <stmt> ;
<of>            ::= "of(" <variant-name> { "," <ident> }* ")" <stmt> ;
<otherwise>     ::= "otherwise" <stmt> ;
```
<details> <summary>Note: shortened vs. postfixed versions</summary>

Each identifier listed in the grammar above corresponds to a macro that is defined by default; these are called shortened versions. There are also postfixed versions (match99, of99, derive99, etc.), which are always defined. To avoid name clashes caused by the shortened versions, define DATATYPE99_NO_ALIASES before including datatype99.h. Library headers are strongly advised to use the postfixed macros, but without resorting to DATATYPE99_NO_ALIASES.

</details>

Semantics

(It might be helpful to look at the generated data layout of examples/binary_tree.c.)

datatype

  1. First, the following type definition is generated:
```c
typedef struct <datatype-name> <datatype-name>;
```
  2. For each non-empty variant, the following type definition is generated (the metavariable <type> ranges over the corresponding variant's types):
```c
typedef struct <datatype-name><variant-name> {
    <type>0 _0;
    ...
    <type>N _N;
} <datatype-name><variant-name>;
```
  3. For each non-empty variant, a type alias is generated for the type of each field of <datatype-name><variant-name>:
```c
typedef <type>0 <variant-name>_0;
...
typedef <type>N <variant-name>_N;
```
  4. For each variant, a type alias for the corresponding sum type is generated:
```c
typedef struct <datatype-name> <variant-name>SumT;
```
  5. For each sum type, the following tagged union is generated (inside the union, fields are generated only for the structures of non-empty variants):
```c
typedef enum <datatype-name>Tag {
    <variant-name>0Tag, ..., <variant-name>NTag
} <datatype-name>Tag;

typedef union <datatype-name>Variants {
    char dummy;

    <datatype-name><variant-name>0 <variant-name>0;
    ...
    <datatype-name><variant-name>N <variant-name>N;
} <datatype-name>Variants;

struct <datatype-name> {
    <datatype-name>Tag tag;
    <datatype-name>Variants data;
};
```
<details> <summary>Note on char dummy;</summary>

The C standard requires a union to have at least one member, so char dummy; keeps the union well-formed even if all variants are empty. Such a datatype still enforces stricter type checking than a plain C enum.

</details>
  6. For each variant, the following function, called a value constructor, is generated:
```c
inline static <datatype-name> <variant-name>(/* ... */) { /* ... */ }
```

If the variant has no parameters, this function will take void and initialise .data.dummy to '\0'; otherwise, it will take the corresponding variant parameters and init
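Applying the generation rules above by hand to the BinaryTree datatype from the Usage section gives roughly the following plain C99 code. It is a sketch, not the library's literal output: the constructor bodies and parameter names (arg0, arg1, arg2) are assumptions, static inline is written in place of the equivalent inline static, and sum is a hypothetical helper that reads the layout directly.

```c
#include <assert.h>

// Forward typedef of the sum type.
typedef struct BinaryTree BinaryTree;

// One struct per non-empty variant; fields are named _0 ... _N.
typedef struct BinaryTreeLeaf {
    int _0;
} BinaryTreeLeaf;

typedef struct BinaryTreeNode {
    BinaryTree *_0;
    int _1;
    BinaryTree *_2;
} BinaryTreeNode;

// Aliases for the type of each field.
typedef int Leaf_0;
typedef BinaryTree *Node_0;
typedef int Node_1;
typedef BinaryTree *Node_2;

// Aliases from each variant to its sum type.
typedef struct BinaryTree LeafSumT;
typedef struct BinaryTree NodeSumT;

// The tagged union itself.
typedef enum BinaryTreeTag { LeafTag, NodeTag } BinaryTreeTag;

typedef union BinaryTreeVariants {
    char dummy;
    BinaryTreeLeaf Leaf;
    BinaryTreeNode Node;
} BinaryTreeVariants;

struct BinaryTree {
    BinaryTreeTag tag;
    BinaryTreeVariants data;
};

// Value constructors (bodies assumed for this sketch): set the tag and
// the matching union member in one place.
static inline BinaryTree Leaf(Leaf_0 arg0) {
    BinaryTree self = { .tag = LeafTag, .data.Leaf = { arg0 } };
    return self;
}

static inline BinaryTree Node(Node_0 arg0, Node_1 arg1, Node_2 arg2) {
    BinaryTree self = { .tag = NodeTag, .data.Node = { arg0, arg1, arg2 } };
    return self;
}

// Hypothetical helper: the tag says which member of data is valid.
int sum(const BinaryTree *tree) {
    switch (tree->tag) {
    case LeafTag:
        return tree->data.Leaf._0;
    case NodeTag:
        return sum(tree->data.Node._0) + tree->data.Node._1 + sum(tree->data.Node._2);
    }

    assert(!"invalid tag"); // Unreachable for well-formed trees.
    return -1;
}
```

Because everything here is ordinary structs, enums, and unions, the layout can be mirrored field by field on the other side of an FFI.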
