
Headroom

The Context Optimization Layer for LLM Applications


<p align="center">
  <h1 align="center">Headroom</h1>
  <p align="center">
    <strong>Compress everything your AI agent reads. Same answers, fraction of the tokens.</strong>
  </p>
  <p align="center">
    Every tool call, DB query, file read, and RAG retrieval your agent makes is 70-95% boilerplate.<br>
    Headroom compresses it away before it hits the model.<br><br>
    Works with <b>any agent</b> — coding agents (Claude Code, Codex, Cursor, Aider), custom agents<br>
    (LangChain, LangGraph, Agno, Strands, OpenClaw), or your own Python and TypeScript code.
  </p>
</p>
<p align="center">
  <a href="https://github.com/chopratejas/headroom/actions/workflows/ci.yml">
    <img src="https://github.com/chopratejas/headroom/actions/workflows/ci.yml/badge.svg" alt="CI">
  </a>
  <a href="https://pypi.org/project/headroom-ai/">
    <img src="https://img.shields.io/pypi/v/headroom-ai.svg" alt="PyPI">
  </a>
  <a href="https://pypi.org/project/headroom-ai/">
    <img src="https://img.shields.io/pypi/pyversions/headroom-ai.svg" alt="Python">
  </a>
  <a href="https://pypistats.org/packages/headroom-ai">
    <img src="https://img.shields.io/pypi/dm/headroom-ai.svg" alt="Downloads">
  </a>
  <a href="https://www.npmjs.com/package/headroom-ai">
    <img src="https://img.shields.io/npm/v/headroom-ai.svg" alt="npm">
  </a>
  <a href="https://github.com/chopratejas/headroom/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License">
  </a>
  <a href="https://chopratejas.github.io/headroom/">
    <img src="https://img.shields.io/badge/docs-GitHub%20Pages-blue.svg" alt="Documentation">
  </a>
  <a href="https://discord.gg/yRmaUNpsPJ">
    <img src="https://img.shields.io/badge/Discord-Join%20us-5865F2?logo=discord&logoColor=white" alt="Discord">
  </a>
</p>

Where Headroom Fits

Your Agent / App
  (coding agents, customer support bots, RAG pipelines,
   data analysis agents, research agents, any LLM app)
      │
      │  tool calls, logs, DB reads, RAG results, file reads, API responses
      ▼
   Headroom  ← proxy, Python/TypeScript SDK, or framework integration
      │
      ▼
 LLM Provider  (OpenAI, Anthropic, Google, Bedrock, 100+ via LiteLLM)

Headroom sits between your application and the LLM provider. It intercepts requests, compresses the context, and forwards an optimized prompt. Use it as a transparent proxy (zero code changes), a Python function (compress()), or a framework integration (LangChain, LiteLLM, Agno).

What gets compressed

Headroom optimizes any data your agent injects into a prompt:

  • Tool outputs — shell commands, API calls, search results
  • Database queries — SQL results, key-value lookups
  • RAG retrievals — document chunks, embeddings results
  • File reads — code, logs, configs, CSVs
  • API responses — JSON, XML, HTML
  • Conversation history — long agent sessions with repetitive context

Quick Start

Python:

pip install "headroom-ai[all]"

TypeScript / Node.js:

npm install headroom-ai

Any agent — one function

Python:

import anthropic
from headroom import compress

client = anthropic.Anthropic()

# Compress the message list before sending it to the model.
result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", max_tokens=1024, messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")

TypeScript:

import OpenAI from 'openai';
import { compress } from 'headroom-ai';

const openai = new OpenAI();
const result = await compress(messages, { model: 'gpt-4o' });
const response = await openai.chat.completions.create({ model: 'gpt-4o', messages: result.messages });
console.log(`Saved ${result.tokensSaved} tokens`);

Works with any LLM client — Anthropic, OpenAI, LiteLLM, Bedrock, Vercel AI SDK, or your own code.

Any agent — proxy (zero code changes)

headroom proxy --port 8787
# Point any LLM client at the proxy
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app

Works with any language, any tool, any framework. Proxy docs

Coding agents — one command

headroom wrap claude       # Starts proxy + launches Claude Code
headroom wrap codex        # Starts proxy + launches OpenAI Codex CLI
headroom wrap aider        # Starts proxy + launches Aider
headroom wrap cursor       # Starts proxy + prints Cursor config

Headroom starts a proxy, points your tool at it, and compresses everything automatically.

Multi-agent — SharedContext

from headroom import SharedContext

ctx = SharedContext()
ctx.put("research", big_agent_output)      # Agent A stores (compressed)
summary = ctx.get("research")               # Agent B reads (~80% smaller)
full = ctx.get("research", full=True)       # Agent B gets original if needed

Compress what moves between agents — any framework. SharedContext Guide
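
The pattern is easy to picture with a toy in-memory version. This is illustrative only: `ToySharedContext` and its truncation-based `_summarize` are stand-ins for Headroom's actual classes and compression, which the source does not detail.

```python
class ToySharedContext:
    """Minimal sketch of the SharedContext pattern: store full payloads,
    hand out compressed summaries by default, originals on request."""

    def __init__(self, max_summary_chars=200):
        self._store = {}
        self.max_summary_chars = max_summary_chars

    def _summarize(self, text):
        # Stand-in for real compression: keep head and tail, note omission.
        if len(text) <= self.max_summary_chars:
            return text
        half = self.max_summary_chars // 2
        omitted = len(text) - 2 * half
        return f"{text[:half]} ...[{omitted} chars omitted]... {text[-half:]}"

    def put(self, key, value):
        self._store[key] = value

    def get(self, key, full=False):
        value = self._store[key]
        return value if full else self._summarize(value)


ctx = ToySharedContext()
ctx.put("research", "x" * 10_000)   # Agent A stores the full output
summary = ctx.get("research")        # Agent B reads a compact view
full = ctx.get("research", full=True)  # original is never lost
```

The key design point is that `get` defaults to the cheap view while the original stays retrievable, so downstream agents pay full token cost only when they actually need the detail.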

MCP Tools (Claude Code, Cursor)

headroom mcp install && claude

Gives your AI tool three MCP tools: headroom_compress, headroom_retrieve, headroom_stats. MCP Guide

Drop into your existing stack

| Your setup | Add Headroom | One-liner |
|------------|--------------|-----------|
| Any Python app | `compress()` | `result = compress(messages, model="gpt-4o")` |
| Any TypeScript app | `compress()` | `const result = await compress(messages, { model: 'gpt-4o' })` |
| Vercel AI SDK | Middleware | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| OpenAI Node SDK | Wrap client | `const client = withHeadroom(new OpenAI())` |
| Anthropic TS SDK | Wrap client | `const client = withHeadroom(new Anthropic())` |
| Multi-agent | SharedContext | `ctx = SharedContext(); ctx.put("key", data)` |
| LiteLLM | Callback | `litellm.callbacks = [HeadroomCallback()]` |
| Any Python proxy | ASGI Middleware | `app.add_middleware(CompressionMiddleware)` |
| Agno agents | Wrap model | `HeadroomAgnoModel(your_model)` |
| LangChain | Wrap model | `HeadroomChatModel(your_llm)` |
| OpenClaw | ContextEngine plugin | `openclaw plugins install headroom-openclaw` |
| Claude Code | Wrap | `headroom wrap claude` |
| Codex / Aider | Wrap | `headroom wrap codex` or `headroom wrap aider` |

Full Integration Guide | TypeScript SDK


Demo

<p align="center"> <img src="Headroom-2.gif" alt="Headroom Demo" width="800"> </p>

Does It Actually Work?

100 production log entries. One critical error buried at position 67.

| | Baseline | Headroom |
|--|----------|----------|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |

Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."

87.6% fewer tokens. Same answer. Run it: python examples/needle_in_haystack_test.py

<details> <summary><b>What Headroom kept</b></summary>

From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.

</details>
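
The idea of selecting by statistical rarity rather than keyword matching can be sketched roughly as follows. This is an illustration of the concept only; `select_entries`, its scoring function, and its defaults are hypothetical, not SmartCrusher's actual algorithm.

```python
from collections import Counter

def select_entries(entries, head=3, tail=2, keep_outliers=1):
    """Keep boundary entries plus the statistically rarest ones.

    Each entry is a dict of fields; an entry whose field values are
    rare across the batch (e.g. one FATAL among 99 INFOs) scores high
    without any hardcoded keyword list.
    """
    # Frequency of every (field, value) pair across the whole batch.
    freq = Counter((k, v) for e in entries for k, v in e.items())
    n = len(entries)

    def rarity(entry):
        # Rare field values contribute close to 1, common ones close to 0.
        return sum(1 - freq[(k, v)] / n for k, v in entry.items())

    middle = range(head, n - tail)
    outliers = sorted(middle, key=lambda i: rarity(entries[i]), reverse=True)[:keep_outliers]
    kept = sorted(set(range(head)) | set(outliers) | set(range(n - tail, n)))
    return [entries[i] for i in kept]

logs = [{"level": "INFO", "service": "web"} for _ in range(100)]
logs[67] = {"level": "FATAL", "service": "payment-gateway"}
kept = select_entries(logs)
# The FATAL entry at position 67 survives because its field values are
# rare in the batch, not because "FATAL" was on a keyword list.
```

Under these toy defaults the function keeps exactly 6 of the 100 entries: the first 3, the anomaly, and the last 2, mirroring the behavior described above.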

Real Workloads

| Scenario | Before (tokens) | After (tokens) | Savings |
|----------|-----------------|----------------|---------|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |

Accuracy Benchmarks

Compression preserves accuracy — tested on real OSS benchmarks.

Standard Benchmarks — Baseline (direct to API) vs Headroom (through proxy):

| Benchmark | Category | N | Baseline | Headroom | Delta |
|-----------|----------|---|----------|----------|-------|
| GSM8K | Math | 100 | 0.870 | 0.870 | 0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |

Compression Benchmarks — Accuracy after full compression stack:

| Benchmark | Category | N | Accuracy | Compression | Method |
|-----------|----------|---|----------|-------------|--------|
| SQuAD v2 | QA | 100 | 97% | 19% | Before/After |
| BFCL | Tool/Function | 100 | 97% | 32% | LLM-as-Judge |
| Tool Outputs (built-in) | Agent | 8 | 100% | 20% | Before/After |
| CCR Needle Retention | Lossless | 50 | 100% | 77% | Exact Match |

Run it yourself:

# Quick smoke test (8 cases, ~10s)
python -m headroom.evals quick -n 8 --provider openai --model gpt-4o-mini

# Full Tier 1 suite (~$3, ~15 min)
python -m headroom.evals suite --tier 1 -o eval_results/

# CI mode (exit 1 on regression)
python -m headroom.evals suite --tier 1 --ci

Full methodology: Benchmarks | Evals Framework


Key Capabilities

Lossless Compression

Headroom never throws data away. It compresses aggressively, stores the originals, and gives the LLM a tool to retrieve full details when needed. When it compresses 500 items to 20, it tells the model what was omitted ("87 passed, 2 failed, 1 error") so the model knows when to ask for more.
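
That contract can be sketched with toy code: compress a batch down to the interesting items, but attach a summary of what was dropped, and keep the originals retrievable. `compress_results` is a hypothetical helper for illustration, not Headroom's API.

```python
from collections import Counter

def compress_results(results, keep=3):
    """Keep a few notable items, summarize the rest by status, and
    retain the originals so nothing is silently lost."""
    # Prefer the interesting items (anything that didn't pass).
    kept = [r for r in results if r["status"] != "passed"][:keep]
    # Tell the model what was omitted, so it knows when to ask for more.
    omitted = Counter(r["status"] for r in results if r not in kept)
    summary = ", ".join(f"{n} {s}" for s, n in sorted(omitted.items()))
    return {"kept": kept, "omitted_summary": summary, "originals": results}

results = (
    [{"id": i, "status": "passed"} for i in range(87)]
    + [{"id": 87, "status": "failed"}, {"id": 88, "status": "failed"}]
    + [{"id": 89, "status": "error"}]
)
out = compress_results(results)
# The model sees 3 items plus "87 passed", and can still request the
# full 90-item payload if it needs more detail.
```

The omission summary is what makes the scheme effectively lossless in practice: the model is told exactly what it is not seeing, instead of being left to guess.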

Smart Content Detection

Auto-detects what's in your context — JSON arrays, code, logs, plain text — and routes each to the best compressor. JSON goes to SmartCrusher, code goes through AST-aware compression (Python, JS, Go, Rust, Java, C++), text goes to Kompress (ModernBERT-based, with [ml] extra).
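
A rough sketch of such routing, for intuition only: `detect_content_type` below uses crude heuristics and is not Headroom's actual detector, which the source describes only at a high level.

```python
import json

def detect_content_type(text):
    """Toy classifier for routing context to a compressor."""
    stripped = text.strip()
    # JSON objects/arrays -> structured compressor (SmartCrusher-style).
    if stripped[:1] in "[{":
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass
    # Lines that all start with a log level -> log-aware handling.
    first_tokens = {line.split(" ")[0] for line in stripped.splitlines() if line}
    if first_tokens and first_tokens <= {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}:
        return "logs"
    # Code keywords -> AST-aware compression.
    if any(kw in stripped for kw in ("def ", "class ", "function ", "import ")):
        return "code"
    # Everything else -> ML-based text compression.
    return "text"
```

A real detector would be far more robust (partial JSON, mixed content, many languages), but the routing structure, classify once, then dispatch to a specialized compressor, is the point being illustrated.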

Cache Optimization

Stabilizes message prefixes so your provider's KV cache actually works. Claude offers a 90% read discount on cached prefixes — but almost no framework takes advantage of it. Headroom does.
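
The prefix-stabilization idea can be sketched as follows. This is a hypothetical helper, not Headroom's actual transform: the principle is that the system prompt and history must stay byte-identical across calls, with per-request material appended after the cacheable boundary.

```python
def build_messages(system_prompt, history, dynamic_context):
    """Stable-prefix message builder: the system prompt and history form
    a byte-identical prefix across calls; anything per-request (fresh
    tool output, timestamps) goes in a trailing message instead of being
    interpolated up front, so the provider's prefix cache can hit."""
    prefix = [{"role": "system", "content": system_prompt}] + list(history)
    suffix = [{"role": "user", "content": dynamic_context}]
    return prefix + suffix

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
a = build_messages("You are a helpful agent.", history, "It is 09:00. Summarize the log.")
b = build_messages("You are a helpful agent.", history, "It is 09:05. Summarize the log.")
# The first three messages are identical across the two calls, so a
# provider caching by prefix can reuse them; only the last one differs.
```

Interpolating a timestamp directly into the system prompt would change the very first bytes of every request and defeat prefix caching entirely, which is why the dynamic content goes last.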
