Edn.c
A fast, zero-copy EDN (Extensible Data Notation) reader written in C11 with SIMD acceleration.
TL;DR - What is EDN?
EDN (Extensible Data Notation) is a data format similar to JSON, but richer and more extensible. Think of it as "JSON with superpowers":
- JSON-like foundation: Maps {:key value}, vectors [1 2 3], strings, numbers, booleans, null (nil)
- Additional built-in types: Sets #{:a :b}, keywords :keyword, symbols my-symbol, characters \newline, lists (1 2 3)
- Extensible via tagged literals: #inst "2024-01-01", #uuid "..."; transform data at parse time with custom readers
- Human-friendly: Comments, flexible whitespace, designed to be readable and writable by both humans and programs
- Language-agnostic: Originally from Clojure, but useful anywhere you need rich, extensible data interchange
Why EDN over JSON? More expressive types (keywords, symbols, sets), native extensibility through tags (no more {"__type": "Date", "value": "..."} hacks), and better support for configuration files and data interchange in functional programming environments.
Learn more: Official EDN specification
Features
- 🚀 Fast: SIMD-accelerated parsing with NEON (ARM64), SSE4.2 (x86_64) and SIMD128 (WebAssembly) support
- 🌐 WebAssembly: Full WASM SIMD128 support for high-performance parsing in browsers and Node.js
- 💾 Zero-copy: Minimal allocations, references input data where possible
- 🎯 Simple API: Easy-to-use interface with comprehensive type support
- 🧹 Memory-safe: Arena allocator for efficient cleanup with a single edn_free() call
- 🔧 Zero Dependencies: Pure C11 with standard library only
- ✅ Fully Tested: 340+ tests across 24 test suites
- 📖 UTF-8 Native: All string inputs and outputs are UTF-8 encoded
- 🏷️ Tagged Literals: Extensible data types with custom reader support
- 🗺️ Map Namespace Syntax: Clojure-compatible #:ns{...} syntax (optional, disabled by default)
- 🔤 Extended Characters: \formfeed, \backspace, and octal \oNNN literals (optional, disabled by default)
- 📝 Metadata: Clojure-style metadata ^{...} syntax (optional, disabled by default)
- 📄 Text Blocks: Java-style multi-line text blocks """\n...\n""" (experimental, disabled by default)
- 🔢 Ratio Numbers: Clojure-compatible ratio literals such as 22/7 (optional, disabled by default)
- 🔣 Extended Integers: Hex (0xFF), octal (0777), binary (2r1010), and arbitrary radix (36rZZ) formats (optional, disabled by default)
- 🔢 Underscore in Numeric Literals: Visual grouping with underscores, e.g. 1_000_000, 3.14_15_92, 0xDE_AD_BE_EF (optional, disabled by default)
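The Extended Integers item uses Clojure's NrDIGITS radix notation: the decimal prefix N (2 to 36) is the base, and the digits after r are read in that base (0-9, then A-Z, case-insensitive). So 2r1010 is 10 and 36rZZ is 35 * 36 + 35 = 1295. The sketch below only illustrates that rule. parse_radix_literal is a hypothetical standalone helper, not EDN.C's parser, and it deliberately ignores signs, hex/octal prefixes, and underscores.

```c
#include <limits.h>
#include <stdbool.h>

/* Value of one digit in bases up to 36, or -1 if c is not alphanumeric. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse an unsigned "<radix>r<digits>" literal such as
   "2r1010" or "36rZZ". Returns false on malformed input or overflow. */
static bool parse_radix_literal(const char *s, unsigned long long *out) {
    int radix = 0;
    const char *p = s;
    while (*p >= '0' && *p <= '9') {      /* decimal radix prefix */
        radix = radix * 10 + (*p - '0');
        if (radix > 36) return false;
        p++;
    }
    if (p == s || radix < 2 || (*p != 'r' && *p != 'R')) return false;
    p++;                                  /* skip the 'r' separator */
    if (*p == '\0') return false;         /* at least one digit required */

    unsigned long long value = 0;
    for (; *p != '\0'; p++) {
        int d = digit_value(*p);
        if (d < 0 || d >= radix) return false;  /* digit not valid in this base */
        if (value > (ULLONG_MAX - (unsigned)d) / (unsigned)radix) {
            return false;                        /* would overflow */
        }
        value = value * (unsigned)radix + (unsigned)d;
    }
    *out = value;
    return true;
}
```

For example, parse_radix_literal("2r102", &v) fails because 2 is not a binary digit.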
Table of Contents
- Installation
- Quick Start
- Whitespace and Control Characters
- API Reference
- Examples
- Building
- Performance
- Contributing
- License
Installation
Requirements
- C11 compatible compiler (GCC 4.9+, Clang 3.1+, MSVC 2015+)
- Make (Unix/macOS) or CMake (Windows/cross-platform)
- Supported platforms:
- macOS (Apple Silicon M1/M2/M3, Intel) - NEON/SSE4.2 SIMD
- Linux (ARM64, x86_64) - NEON/SSE4.2 SIMD
- Windows (x86_64, ARM64) - NEON/SSE4.2 SIMD via MSVC/MinGW/Clang
- WebAssembly - SIMD128 support for browsers and Node.js
Build Library
Unix/macOS/Linux:
# Clone the repository
git clone https://github.com/DotFox/edn.c.git
cd edn.c
# Build static library (libedn.a)
make
# Run tests to verify build
make test
Windows:
# Clone the repository
git clone https://github.com/DotFox/edn.c.git
cd edn.c
# Build with CMake (works with MSVC, MinGW, Clang)
.\build.bat
# Or use PowerShell script
.\build.ps1 -Test
See docs/WINDOWS.md for detailed Windows build instructions.
Integrate Into Your Project
Option 1: Link static library
# Compile your code
gcc -o myapp myapp.c -I/path/to/edn.c/include -L/path/to/edn.c -ledn
# Or add to your Makefile
CFLAGS += -I/path/to/edn.c/include
LDFLAGS += -L/path/to/edn.c
LDLIBS += -ledn
Option 2: Include source directly
Copy include/edn.h and all files from src/ into your project and compile them together.
Quick Start
#include "edn.h"
#include <stdio.h>
int main(void) {
    const char *input = "{:name \"Alice\" :age 30 :languages [:clojure :rust]}";
    // Read EDN string
    edn_result_t result = edn_read(input, 0);
    if (result.error != EDN_OK) {
        fprintf(stderr, "Parse error at line %zu, column %zu: %s\n",
                result.error_start.line, result.error_start.column, result.error_message);
        return 1;
    }
    // Access the parsed map
    edn_value_t *map = result.value;
    printf("Parsed map with %zu entries\n", edn_map_count(map));
    // Look up a value by key
    edn_result_t key_result = edn_read(":name", 0);
    edn_value_t *name_value = edn_map_lookup(map, key_result.value);
    if (name_value != NULL && edn_type(name_value) == EDN_TYPE_STRING) {
        size_t len;
        const char *name = edn_string_get(name_value, &len);
        printf("Name: %.*s\n", (int)len, name);
    }
    // Clean up - frees all allocated memory
    edn_free(key_result.value);
    edn_free(map);
    return 0;
}
Output:
Parsed map with 3 entries
Name: Alice
Whitespace and Control Characters
EDN.C follows Clojure's exact behavior for whitespace and control character handling:
Whitespace Characters
The following characters act as whitespace delimiters (separate tokens):
| Character | Hex | Name | Common Use |
|-----------|------|----------------------|---------------------|
| | 0x20 | Space | Standard spacing |
| \t | 0x09 | Tab | Indentation |
| \n | 0x0A | Line Feed (LF) | Unix line ending |
| \r | 0x0D | Carriage Return (CR) | Windows line ending |
| \f | 0x0C | Form Feed | Page break |
| \v | 0x0B | Vertical Tab | Vertical spacing |
| , | 0x2C | Comma | Optional separator |
| FS | 0x1C | File Separator | Data separation |
| GS | 0x1D | Group Separator | Data separation |
| RS | 0x1E | Record Separator | Data separation |
| US | 0x1F | Unit Separator | Data separation |
Examples:
// All of these parse as vectors with 3 elements:
edn_read("[1 2 3]", 0); // spaces
edn_read("[1,2,3]", 0); // commas
edn_read("[1\t2\n3]", 0); // tabs and newlines
edn_read("[1\f2\x1C" "3]", 0); // formfeed and file separator (literal split so \x1C is not read as the hex escape \x1C3)
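The table reduces to a simple per-byte test. Below is a minimal scalar sketch of it; is_edn_whitespace is a hypothetical name, and since the README describes the reader as SIMD-accelerated, EDN.C's actual implementation will look different.

```c
#include <stdbool.h>

/* Hypothetical scalar predicate mirroring the whitespace table above. */
static bool is_edn_whitespace(unsigned char c) {
    switch (c) {
    case ' ':   /* 0x20 space */
    case '\t':  /* 0x09 tab */
    case '\n':  /* 0x0A line feed */
    case '\v':  /* 0x0B vertical tab */
    case '\f':  /* 0x0C form feed */
    case '\r':  /* 0x0D carriage return */
    case ',':   /* 0x2C comma */
    case 0x1C:  /* FS file separator */
    case 0x1D:  /* GS group separator */
    case 0x1E:  /* RS record separator */
    case 0x1F:  /* US unit separator */
        return true;
    default:
        return false;
    }
}
```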
Control Characters in Identifiers
Control characters 0x00-0x1F (except whitespace delimiters) are valid in identifiers (symbols and keywords):
Valid identifier characters:
- 0x00-0x08: NUL, SOH, STX, ETX, EOT, ENQ, ACK, BEL, Backspace
- 0x0E-0x1B: Shift Out through Escape
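Put differently, every byte in 0x00-0x1F falls into exactly one of two classes: whitespace (0x09-0x0D and 0x1C-0x1F) or identifier character (the two ranges above). A minimal sketch of the identifier side, using a hypothetical helper name that is not part of edn.h:

```c
#include <stdbool.h>

/* Hypothetical helper: true for the control bytes that may appear inside a
   symbol or keyword. 0x09-0x0D and 0x1C-0x1F are excluded because they act
   as whitespace delimiters instead. */
static bool is_identifier_control_char(unsigned char c) {
    return c <= 0x08 || (c >= 0x0E && c <= 0x1B);
}
```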
Examples:
// Backspace in symbol - valid!
edn_result_t r = edn_read("[\bfoo]", 0); // 1-element vector
edn_vector_count(r.value); // Returns 1
edn_free(r.value);
// Control characters in middle of identifier
const char input[] = {'[', 'f', 'o', 'o', 0x08, 'b', 'a', 'r', ']', 0};
r = edn_read(input, sizeof(input) - 1);
edn_vector_count(r.value); // Returns 1 (symbol: "foo\bbar")
edn_free(r.value);
// Versus whitespace - separates into 2 elements
edn_result_t r2 = edn_read("[foo\tbar]", 0); // Tab is whitespace
edn_vector_count(r2.value); // Returns 2 (symbols: "foo" and "bar")
edn_free(r2.value);
Note on null bytes (0x00): Passing 0 as the length makes edn_read() call strlen(), which stops at the first null byte and silently truncates the input. Always pass the explicit length for data containing null bytes:
const char data[] = {'[', 'a', 0x00, 'b', ']', 0};
edn_result_t r = edn_read(data, 5); // Pass exact length: 5 bytes (excluding terminator)
edn_free(r.value);
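To avoid hard-coding the 5, the length of a char array can be derived with sizeof, as the earlier 0x08 example does with sizeof(input) - 1. EDN_ARRAY_INPUT_LEN below is a hypothetical convenience macro, not part of edn.h, and it only works on true arrays: applied to a pointer, sizeof measures the pointer instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of edn.h: byte count of a char array minus
   its trailing '\0' terminator. Valid on arrays only, never on pointers. */
#define EDN_ARRAY_INPUT_LEN(arr) (sizeof(arr) - 1)

/* "[a\0b]": five bytes of EDN with a NUL inside the symbol. */
static const char nul_input[] = {'[', 'a', 0x00, 'b', ']', 0};

/* strlen(nul_input) stops at the embedded NUL and yields 2, so
   edn_read(nul_input, 0) would only see "[a".
   EDN_ARRAY_INPUT_LEN(nul_input) yields 5, covering the whole vector. */
```

With that, the call becomes edn_read(nul_input, EDN_ARRAY_INPUT_LEN(nul_input)).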
API Reference
Core Functions
edn_read()
Read EDN from a UTF-8 string.
edn_result_t edn_read(const char *input, size_t length);
Parameters:
- input: UTF-8 encoded string containing EDN data (must remain valid for zero-copy strings)
- length: Length of input in bytes, or 0 to use strlen(input)
Returns: edn_result_t containing:
- value: Parsed EDN value (NULL on error)
- error: Error code (EDN_OK on success)
- error_start: Start of error range (edn_error_position_t with offset, line, column)
- error_end: End of error range (edn_error_position_t with offset, line, column)
- error_message: Human-readable error description
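Each error position carries a byte offset alongside its line and column. To show how the three relate, here is a sketch that derives a line/column pair from an offset. position_sketch_t and position_from_offset are hypothetical stand-ins (the real type is edn_error_position_t), and the conventions are assumptions: 1-based line and column, columns counted in bytes, and only \n ending a line. EDN.C may count differently, for example around \r\n or multi-byte UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for edn_error_position_t; conventions assumed. */
typedef struct {
    size_t offset;  /* byte offset into the input */
    size_t line;    /* assumed 1-based */
    size_t column;  /* assumed 1-based, counted in bytes */
} position_sketch_t;

/* Walk the input up to `offset`, starting a new line after each '\n'. */
static position_sketch_t position_from_offset(const char *input, size_t length,
                                              size_t offset) {
    if (offset > length) {
        offset = length;  /* clamp to the end of the input */
    }
    position_sketch_t pos = {offset, 1, 1};
    for (size_t i = 0; i < offset; i++) {
        if (input[i] == '\n') {
            pos.line++;
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    return pos;
}
```

Under these assumptions, the stray ] at byte offset 6 of "{:a\n  ]}" is reported at line 2, column 3.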
Important: The returned value must be freed with edn_free().
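The feature list attributes the single edn_free() call to an arena allocator. The idea behind that, in a minimal sketch with hypothetical names (this is not EDN.C's allocator): every node is carved out of a few large chunks, so freeing the chunks releases every node at once without walking the value tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One malloc'd block; allocations are handed out from data[] by bumping `used`. */
typedef struct arena_chunk {
    struct arena_chunk *next;  /* previously filled chunk, or NULL */
    size_t used;               /* bytes already handed out */
    size_t capacity;           /* usable bytes in data[] */
    max_align_t data[];        /* storage aligned for any object type */
} arena_chunk;

typedef struct {
    arena_chunk *head;  /* chunk currently being filled */
} arena;

#define ARENA_CHUNK_SIZE 4096

static void *arena_alloc(arena *a, size_t size) {
    const size_t align = _Alignof(max_align_t);
    if (size > SIZE_MAX - sizeof(arena_chunk) - align) {
        return NULL;  /* too large to round up and allocate safely */
    }
    size = (size + align - 1) / align * align;  /* keep every block aligned */
    if (a->head == NULL || a->head->capacity - a->head->used < size) {
        size_t capacity = size > ARENA_CHUNK_SIZE ? size : ARENA_CHUNK_SIZE;
        arena_chunk *chunk = malloc(sizeof(arena_chunk) + capacity);
        if (chunk == NULL) {
            return NULL;
        }
        chunk->next = a->head;  /* older chunks stay linked for cleanup */
        chunk->used = 0;
        chunk->capacity = capacity;
        a->head = chunk;
    }
    void *p = (unsigned char *)a->head->data + a->head->used;
    a->head->used += size;
    return p;
}

/* Releases every allocation by freeing the chunks, not the individual nodes. */
static void arena_free_all(arena *a) {
    arena_chunk *chunk = a->head;
    while (chunk != NULL) {
        arena_chunk *next = chunk->next;
        free(chunk);
        chunk = next;
    }
    a->head = NULL;
}
```

The trade-off is that individual nodes cannot be released early; the whole tree lives and dies together, which matches parsing a document and discarding it as a unit.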
edn_free()
Free an EDN value and all associated memory.
void edn_free(edn_value_t *value);
Parameters:
value: Value to free (may be NULL)
Note: This frees the entire value tree. Do not call free() on indi
