
Hedl

Token-efficient data serialization for LLM/AI. 50% fewer tokens than JSON, 93% better value/token. Rust, schema validation, LSP.

Install / Use

/learn @dweve-ai/Hedl

README

<p align="center"> <img src="docs/assets/hedl-logo.svg" alt="HEDL Logo" width="200"/> </p> <h1 align="center">HEDL</h1> <p align="center"> <strong>The Token-Efficient Data Format for LLM Applications</strong> </p> <p align="center"> <em>Half the tokens. Same comprehension. Drop-in JSON replacement.</em> </p> <p align="center"> <a href="https://crates.io/crates/hedl"><img src="https://img.shields.io/crates/v/hedl.svg" alt="Crates.io"></a> <a href="https://crates.io/crates/hedl"><img src="https://img.shields.io/crates/d/hedl.svg" alt="Downloads"></a> <a href="https://docs.rs/hedl"><img src="https://docs.rs/hedl/badge.svg" alt="Documentation"></a> <a href="https://github.com/dweve-ai/hedl/actions"><img src="https://github.com/dweve-ai/hedl/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License"></a> </p> <p align="center"> <a href="#quickstart">Quickstart</a> • <a href="#why-hedl">Why HEDL</a> • <a href="#benchmarks">Benchmarks</a> • <a href="#documentation">Docs</a> • <a href="#ecosystem">Ecosystem</a> </p>

The Problem

You're building AI applications and sending structured data to LLMs. Like everyone else, you're probably using JSON.

But have you actually looked at what you're paying for?

{"id": "u1", "name": "Alice", "email": "alice@company.com", "role": "admin"}
{"id": "u2", "name": "Bob", "email": "bob@company.com", "role": "user"}
{"id": "u3", "name": "Carol", "email": "carol@company.com", "role": "user"}
{"id": "u4", "name": "Dave", "email": "dave@company.com", "role": "user"}
{"id": "u5", "name": "Eve", "email": "eve@company.com", "role": "user"}

See those "id":, "name":, "email":, "role": strings? They show up five times. That's not your data. That's overhead. Pure waste.

At Claude's pricing ($3/million tokens), a 10,000-user dataset costs $15 just in repeated key names. Every single API call. The more records you have, the more you pay to say the same words over and over.
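As a back-of-the-envelope check, here's a short Rust sketch that tallies what the key syntax alone costs across a dataset. It counts bytes rather than tokens (tokenization varies by model), and the helper name is ours, purely for illustration:

```rust
/// Bytes spent on JSON key syntax per record: each key costs its name
/// plus two quotes, a colon, and a space (`"id": ` is 6 bytes for "id").
/// Toy estimate for illustration; real token counts depend on the tokenizer.
fn key_overhead(keys: &[&str]) -> usize {
    keys.iter().map(|k| k.len() + 4).sum()
}

fn main() {
    let keys = ["id", "name", "email", "role"];
    let per_record = key_overhead(&keys);
    let total = per_record * 10_000; // repeated on every one of 10,000 records

    println!("key syntax per record: {} bytes", per_record); // 31
    println!("across 10,000 records: {} bytes", total);      // 310000
}
```

None of those bytes carry data; they restate the same four field names ten thousand times.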


The Solution

What if you could declare your structure once and then just send the data?

%V:2.0
%NULL:~
%QUOTE:"
%S:User:[id, name, email, role]
---
users: @User
 |u1,Alice,alice@company.com,admin
 |u2,Bob,bob@company.com,user
 |u3,Carol,carol@company.com,user
 |u4,Dave,dave@company.com,user
 |u5,Eve,eve@company.com,user

Same data. 56% fewer tokens.

The schema declaration (%S:) lets you define your structure once, then send only the values. No repeated keys, no brackets, no quotes around simple strings.

This is HEDL: Hierarchical Entity Data Language. A data format designed from the ground up for the economics of LLM applications.
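The mechanics are easy to reproduce. Here's a toy Rust sketch of the "declare the structure once" idea: emit one schema line, then one pipe-delimited row per record. This is not the hedl crate API, just an illustration of why the format is compact (the `users:` key is hardcoded for brevity):

```rust
/// Toy illustration of HEDL's schema-once principle.
/// NOT the hedl crate; it only shows where the token savings come from.
fn to_hedl_block(struct_name: &str, fields: &[&str], rows: &[Vec<&str>]) -> String {
    // The field names appear exactly once, in the %S: schema line.
    let mut out = format!(
        "%S:{}:[{}]\n---\nusers: @{}\n",
        struct_name,
        fields.join(", "),
        struct_name
    );
    // Every record after that is values only: no keys, no braces, no quotes.
    for row in rows {
        out.push_str(&format!(" |{}\n", row.join(",")));
    }
    out
}

fn main() {
    let rows = vec![
        vec!["u1", "Alice", "alice@company.com", "admin"],
        vec!["u2", "Bob", "bob@company.com", "user"],
    ];
    print!("{}", to_hedl_block("User", &["id", "name", "email", "role"], &rows));
}
```

Adding a sixth, seventh, or ten-thousandth user adds only its values; the structural cost stays constant.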

flowchart LR
    subgraph Input["Your Data"]
        JSON["JSON"]
        XML["XML"]
        YAML["YAML"]
        CSV["CSV"]
        More["..."]
    end

    subgraph MCP["HEDL MCP Server"]
        Convert["Auto-Convert"]
    end

    subgraph LLM["LLM"]
        AI["Claude / GPT / etc."]
    end

    JSON --> Convert
    XML --> Convert
    YAML --> Convert
    CSV --> Convert
    More --> Convert
    Convert -->|"56% fewer tokens"| AI
    AI -->|"Response"| Convert
    Convert -->|"Back to original format"| JSON

    style Convert fill:#ff9,stroke:#333
    style AI fill:#9ff,stroke:#333

The MCP server handles everything automatically. Your AI agent sends JSON like it always did, the server converts to HEDL (saving you 56% on tokens), the LLM processes it, and responses come back in your original format. Zero code changes on your end.


Quickstart

Option 1: MCP Server (Recommended)

The fastest way to start saving tokens is the MCP server. Add HEDL to your AI agent with zero code changes.

{
  "mcpServers": {
    "hedl": {
      "command": "hedl-mcp",
      "args": ["--auto-convert"]
    }
  }
}

That's literally it. Your agent now uses 56% fewer tokens automatically.

<p align="center"> <a href="https://dweve-ai.github.io/hedl-playground/"><strong>Try the Live Demo</strong></a> - Convert JSON to HEDL in your browser </p>

Option 2: CLI

If you want to experiment with HEDL from the command line, install the CLI:

cargo install hedl-cli

# Convert your existing JSON to HEDL
echo '{"users": [{"name": "Alice"}, {"name": "Bob"}]}' | hedl from-json

# Convert back to JSON when you need it
echo '%V:2.0
%NULL:~
%QUOTE:"
---
greeting: Hello, World!' | hedl to-json

Option 3: Rust Library

For full control, use the library directly:

cargo add hedl

use hedl::{parse, to_json, from_json};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = parse(r#"
%V:2.0
%NULL:~
%QUOTE:"
%S:User:[id,name,role]
---
users: @User
 |alice,Alice Smith,admin
 |bob,Bob Jones,user
"#)?;

    // Convert to JSON for APIs that need it
    let json = to_json(&doc)?;

    // Convert JSON back to HEDL for your LLM prompts
    let hedl = from_json(&json)?;

    Ok(())
}

Why HEDL

"But will LLMs actually understand it?"

This was the first question we asked ourselves. We didn't assume the answer. We tested it.

We ran 571 structured data extraction questions across 7 real-world datasets, testing Mistral Large, DeepSeek Chat, and NVIDIA GLM-4.7. Real questions. Real data. Rigorous methodology.

| Format | Accuracy | Tokens/Question | Accuracy per 1K Tokens |
|--------|:--------:|:---------------:|:----------------------:|
| HEDL | 80.4% | 6,912 | 0.12 |
| JSON | 70.1% | 15,697 | 0.05 |
| YAML | 69.8% | 13,535 | 0.05 |
| TOON | 68.2% | 7,320 | 0.09 |
| XML | 68.6% | 18,164 | 0.04 |
| CSV | 67.3% | 8,049 | 0.08 |

HEDL delivers 2.4x more correct answers per token than JSON.

HEDL wins on both accuracy (+10.3 percentage points over JSON) and efficiency (56% fewer tokens). At scale, this compounds dramatically: for the same budget, HEDL lets you send 2x the context while getting more correct answers.

CSV is efficient but falls apart on complex queries. YAML is nearly as verbose as JSON. XML is the most verbose and delivers the least value per token. TOON is another token-efficient format, but HEDL beats it by +12.2 accuracy points with similar token usage.

HEDL is the only format that's both compact AND comprehensible to LLMs.

The Token Economics

Here's what real benchmarks look like. Real data. Real savings.

| Dataset Type | JSON Tokens | HEDL Tokens | Savings |
|--------------|:-----------:|:-----------:|:-------:|
| Flat user records | 15,697 | 6,912 | 56.0% |
| Product catalog | 15,623 | 6,842 | 56.2% |
| Nested blog posts | 15,771 | 6,981 | 55.7% |
| Order history | 15,698 | 6,912 | 56.0% |
| Config files | 476 | 210 | 55.9% |

Average savings: 56%

At scale, this adds up fast. A service processing 1 billion tokens monthly saves $1,680/month by switching from JSON to HEDL. Same data. Same comprehension. Half the cost.
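The arithmetic behind that figure is straightforward, assuming the $3 per million tokens pricing and the 56% average savings measured above:

```rust
/// Monthly dollars saved by cutting token volume, given a per-million-token price.
/// Uses the 56% average savings and $3/1M-token pricing quoted in this README.
fn monthly_savings(tokens_per_month: f64, price_per_million: f64, savings_rate: f64) -> f64 {
    tokens_per_month / 1_000_000.0 * price_per_million * savings_rate
}

fn main() {
    // 1B tokens/month at $3/1M tokens costs $3,000; 56% of that is saved.
    let saved = monthly_savings(1_000_000_000.0, 3.0, 0.56);
    println!("saved: ${:.0}/month", saved); // → $1680/month
}
```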

Beyond Token Savings

HEDL isn't just about compression. It's about building better AI applications.

Schema Validation catches malformed data before it hits your LLM:

%S:Product:[sku, name, price]
---
products: @Product
 |ABC-123,Widget,29.99
 |DEF-456,Gadget,not_a_price   # Error caught at parse time

Type-Safe References let you link entities without duplicating data:

users: @User
 |alice,Alice Smith,alice@company.com

orders: @Order
 |ord-001,@User:alice,2024-01-15,299.99
  #          ^^^^^^^^^^^^ validated at parse time
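Conceptually, reference checking is just a set-membership test against the ids declared earlier in the document. A minimal Rust sketch of the idea, with hypothetical helper names (the real validation lives inside the hedl parser):

```rust
use std::collections::HashSet;

/// Toy version of reference validation: a @User:<id> reference is valid
/// only if <id> was declared as a User row earlier in the document.
/// Illustrative only; not the hedl crate's internals.
fn validate_ref(declared_users: &HashSet<&str>, reference: &str) -> Result<(), String> {
    match reference.strip_prefix("@User:") {
        Some(id) if declared_users.contains(id) => Ok(()),
        Some(id) => Err(format!("unknown User id: {}", id)),
        None => Err(format!("not a @User reference: {}", reference)),
    }
}

fn main() {
    let users: HashSet<&str> = ["alice"].into_iter().collect();
    assert!(validate_ref(&users, "@User:alice").is_ok());   // declared above: ok
    assert!(validate_ref(&users, "@User:mallory").is_err()); // never declared: error
    println!("reference checks passed");
}
```

A dangling reference fails at parse time, before the document ever reaches your LLM.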

List Literals use (...) syntax for ordered sequences:

%S:Article:[id,title,tags,score]
---
articles: @Article
 |art-1,Introduction to HEDL,(tutorial,beginner,data),4.5
 |art-2,Advanced Patterns,(advanced,optimization),4.8
 |art-3,No Tags,(),3.2

Lists use (...) for any scalar values (strings, references, etc.), distinct from tensors [...] which are for numeric data only.

LSP Integration gives you real-time validation and autocomplete in your editor: syntax highlighting, auto-completion (type @Us, get @User:alice), hover documentation, go-to-definition, and error squiggles before you even save the file.

Headers and Metadata

Every HEDL document starts with headers:

%V:2.0         # Version
%NULL:~        # Null character
%QUOTE:"       # Quote character

Count Metadata helps LLMs understand your data without scanning all rows:

%V:2.0
%NULL:~
%QUOTE:"
%S:Order:[id,customer,status,total]
%C:Order.total=1247
%C:Order.status:delivered=892,shipped=234,pending=121
---
orders: @Order
 |o1,cust-001,delivered,99.99
  # ... 1246 more orders
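A %C line is just a precomputed aggregate. Here's a sketch of deriving a status-count header from raw rows; illustrative only (the real hedl tooling computes these for you), and note it orders keys alphabetically for determinism, whereas the example above lists them by count:

```rust
use std::collections::BTreeMap;

/// Build a %C:Order.status:... header line from a list of order statuses.
/// Toy code for illustration, not the hedl crate; BTreeMap gives a
/// deterministic (alphabetical) key order.
fn count_header(statuses: &[&str]) -> String {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for s in statuses {
        *counts.entry(s).or_insert(0) += 1;
    }
    let parts: Vec<String> = counts.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
    format!("%C:Order.status:{}", parts.join(","))
}

fn main() {
    let statuses = ["delivered", "delivered", "shipped", "pending"];
    println!("{}", count_header(&statuses));
    // → %C:Order.status:delivered=2,pending=1,shipped=1
}
```

The LLM can answer "how many orders are pending?" from this one line instead of scanning 1,247 rows.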

1-Space Indentation keeps things clean:

%V:2.0
%NULL:~
%QUOTE:"
---
root:
 child:         # Exactly 1 space
  grandchild:   # Exactly 1 space per level

Benchmarks

Performance (2026-02-02, release build)

| Operation | Latency (p50) | Size |
|-----------|:-------------:|:----:|
| Parsing | 37.1 µs | Tiny |
| Parsing | 396 µs | Small |
| Parsing | 12.1 ms | Medium |
| JSON Conversion | 10.0 µs | Tiny |
| JSON Conversion | 115 µs | Small |
| JSON Conversion | 1.10 ms | Medium |
| Validation | 23.7 µs | Small |
| Canonicalization | 83.5 µs | Tiny |
| Full Pipeline | 1.04 ms | Small |

Scaling Characteristics

HEDL scales linearly: O(n) parsing, O(depth) for nesting. Median latencies stay under 15ms for all document sizes, and tails are predictable (p99 latencies available in benchmark baselines). For really large files, hedl-stream provides streaming support with bounded memory usage.

Test Coverage

We take testing seriously: 10,000+ tests across 19 crates. Unit tests, integration tests, property-based testing with proptest, and fuzz testing. Zero unsafe code in the core parser.


Ecosystem

HEDL plays well with others. Use it alongside your existing tools.

Format Conversion

| Format | Import | Export | Streaming | Use Case |
|--------|:------:|:------:|:---------:|----------|
