# Headroom

**The Context Optimization Layer for LLM Applications**
## Where Headroom Fits

```
Your Agent / App
(coding agents, customer support bots, RAG pipelines,
 data analysis agents, research agents, any LLM app)
        │
        │  tool calls, logs, DB reads, RAG results, file reads, API responses
        ▼
Headroom   ← proxy, Python/TypeScript SDK, or framework integration
        │
        ▼
LLM Provider (OpenAI, Anthropic, Google, Bedrock, 100+ via LiteLLM)
```
Headroom sits between your application and the LLM provider. It intercepts requests, compresses the context, and forwards an optimized prompt. Use it as a transparent proxy (zero code changes), a Python function (`compress()`), or a framework integration (LangChain, LiteLLM, Agno).
## What gets compressed

Headroom optimizes any data your agent injects into a prompt:
- Tool outputs — shell commands, API calls, search results
- Database queries — SQL results, key-value lookups
- RAG retrievals — document chunks, embeddings results
- File reads — code, logs, configs, CSVs
- API responses — JSON, XML, HTML
- Conversation history — long agent sessions with repetitive context
## Quick Start

**Python:**

```bash
pip install "headroom-ai[all]"
```

**TypeScript / Node.js:**

```bash
npm install headroom-ai
```
### Any agent — one function

**Python:**

```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```

**TypeScript:**

```typescript
import { compress } from 'headroom-ai';

const result = await compress(messages, { model: 'gpt-4o' });
const response = await openai.chat.completions.create({ model: 'gpt-4o', messages: result.messages });
console.log(`Saved ${result.tokensSaved} tokens`);
```
Works with any LLM client — Anthropic, OpenAI, LiteLLM, Bedrock, Vercel AI SDK, or your own code.
### Any agent — proxy (zero code changes)

```bash
headroom proxy --port 8787

# Point any LLM client at the proxy
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

Works with any language, any tool, any framework. Proxy docs
### Coding agents — one command

```bash
headroom wrap claude   # Starts proxy + launches Claude Code
headroom wrap codex    # Starts proxy + launches OpenAI Codex CLI
headroom wrap aider    # Starts proxy + launches Aider
headroom wrap cursor   # Starts proxy + prints Cursor config
```

Headroom starts a proxy, points your tool at it, and compresses everything automatically.
### Multi-agent — SharedContext

```python
from headroom import SharedContext

ctx = SharedContext()
ctx.put("research", big_agent_output)   # Agent A stores (compressed)
summary = ctx.get("research")           # Agent B reads (~80% smaller)
full = ctx.get("research", full=True)   # Agent B gets the original if needed
```

Compress what moves between agents — any framework. SharedContext Guide
### MCP Tools (Claude Code, Cursor)

```bash
headroom mcp install && claude
```

Gives your AI tool three MCP tools: `headroom_compress`, `headroom_retrieve`, `headroom_stats`. MCP Guide
## Drop into your existing stack
| Your setup | Add Headroom | One-liner |
|------------|-------------|-----------|
| Any Python app | compress() | result = compress(messages, model="gpt-4o") |
| Any TypeScript app | compress() | const result = await compress(messages, { model: 'gpt-4o' }) |
| Vercel AI SDK | Middleware | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| OpenAI Node SDK | Wrap client | const client = withHeadroom(new OpenAI()) |
| Anthropic TS SDK | Wrap client | const client = withHeadroom(new Anthropic()) |
| Multi-agent | SharedContext | ctx = SharedContext(); ctx.put("key", data) |
| LiteLLM | Callback | litellm.callbacks = [HeadroomCallback()] |
| Any Python proxy | ASGI Middleware | app.add_middleware(CompressionMiddleware) |
| Agno agents | Wrap model | HeadroomAgnoModel(your_model) |
| LangChain | Wrap model | HeadroomChatModel(your_llm) |
| OpenClaw | ContextEngine plugin | openclaw plugins install headroom-openclaw |
| Claude Code | Wrap | headroom wrap claude |
| Codex / Aider | Wrap | headroom wrap codex or headroom wrap aider |
Full Integration Guide | TypeScript SDK
## Demo

<p align="center"> <img src="Headroom-2.gif" alt="Headroom Demo" width="800"> </p>

## Does It Actually Work?
100 production log entries. One critical error buried at position 67.

| | Baseline | Headroom |
|--|----------|----------|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |

Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."

87.6% fewer tokens. Same answer. Run it: `python examples/needle_in_haystack_test.py`

From 100 log entries, SmartCrusher kept 6: the first 3 (boundary), the FATAL error at position 67 (anomaly detection), and the last 2 (recency). The error was preserved automatically — not by keyword matching, but by statistical analysis of field variance.
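The selection idea can be illustrated with a small sketch. This is a simplified stand-in for variance-based anomaly detection, not SmartCrusher's actual algorithm: keep boundary items plus any entry whose field value is a statistical outlier.

```python
from statistics import mean, pstdev

def keep_anomalies(entries, field, head=3, tail=2, z=3.0):
    """Keep the first `head` and last `tail` entries, plus any entry whose
    `field` value lies more than `z` standard deviations from the mean.
    Illustrative only -- not Headroom's actual SmartCrusher code."""
    values = [e[field] for e in entries]
    mu, sigma = mean(values), pstdev(values)
    keep = set(range(head)) | set(range(len(entries) - tail, len(entries)))
    for i, v in enumerate(values):
        if sigma and abs(v - mu) / sigma > z:  # statistical anomaly
            keep.add(i)
    return [entries[i] for i in sorted(keep)]

# 100 uniform log entries with one outlier buried at position 67
logs = [{"pos": i, "latency_ms": 10} for i in range(100)]
logs[67]["latency_ms"] = 5000
kept = keep_anomalies(logs, "latency_ms")
print([e["pos"] for e in kept])  # → [0, 1, 2, 67, 98, 99]
```

No keyword list mentions position 67; it survives purely because its value deviates from the field's distribution.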
## Real Workloads
| Scenario | Before | After | Savings |
|----------|--------|-------|---------|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
## Accuracy Benchmarks

Compression preserves accuracy — tested on real OSS benchmarks.

**Standard Benchmarks** — Baseline (direct to API) vs Headroom (through proxy):

| Benchmark | Category | N | Baseline | Headroom | Delta |
|-----------|----------|---|----------|----------|-------|
| GSM8K | Math | 100 | 0.870 | 0.870 | 0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |

**Compression Benchmarks** — Accuracy after the full compression stack:

| Benchmark | Category | N | Accuracy | Compression | Method |
|-----------|----------|---|----------|-------------|--------|
| SQuAD v2 | QA | 100 | 97% | 19% | Before/After |
| BFCL | Tool/Function | 100 | 97% | 32% | LLM-as-Judge |
| Tool Outputs (built-in) | Agent | 8 | 100% | 20% | Before/After |
| CCR Needle Retention | Lossless | 50 | 100% | 77% | Exact Match |
Run it yourself:

```bash
# Quick smoke test (8 cases, ~10s)
python -m headroom.evals quick -n 8 --provider openai --model gpt-4o-mini

# Full Tier 1 suite (~$3, ~15 min)
python -m headroom.evals suite --tier 1 -o eval_results/

# CI mode (exit 1 on regression)
python -m headroom.evals suite --tier 1 --ci
```
Full methodology: Benchmarks | Evals Framework
## Key Capabilities

### Lossless Compression
Headroom never throws data away. It compresses aggressively, stores the originals, and gives the LLM a tool to retrieve full details when needed. When it compresses 500 items to 20, it tells the model what was omitted ("87 passed, 2 failed, 1 error") so the model knows when to ask for more.
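The contract can be sketched as follows. This is a minimal illustration of the compress-but-keep-the-original idea; the class and method names are hypothetical, not Headroom's API:

```python
import hashlib
import json

class LosslessStore:
    """Sketch of lossless compression: the model sees a compact view plus a
    note telling it what was omitted and how to get it back. Illustrative
    only -- not Headroom's internals."""

    def __init__(self):
        self._originals = {}

    def compress(self, items, keep=20):
        blob = json.dumps(items, sort_keys=True)
        key = hashlib.sha256(blob.encode()).hexdigest()[:12]
        self._originals[key] = items          # nothing is thrown away
        omitted = max(len(items) - keep, 0)
        note = f"[{omitted} of {len(items)} items omitted; retrieve key={key}]"
        return items[:keep], note

    def retrieve(self, key):
        return self._originals[key]           # full detail on demand
```

When the model decides the compact view is insufficient, it calls the retrieval tool with the key and gets the original content back verbatim.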
### Smart Content Detection
Auto-detects what's in your context — JSON arrays, code, logs, plain text — and routes each to the best compressor. JSON goes to SmartCrusher, code goes through AST-aware compression (Python, JS, Go, Rust, Java, C++), text goes to Kompress (ModernBERT-based, with [ml] extra).
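A toy router conveys the shape of this dispatch. The heuristics below are deliberately crude stand-ins (Headroom's real detector is richer), and the function name is hypothetical:

```python
import json

def route(content: str) -> str:
    """Decide which compressor a chunk of context should go to.
    Illustrative sketch, not Headroom's actual detection logic."""
    stripped = content.strip()
    if stripped[:1] in "[{":
        try:
            json.loads(stripped)
            return "smartcrusher"      # structured JSON arrays/objects
        except ValueError:
            pass
    if any(tok in content for tok in ("def ", "function ", "class ", "fn ")):
        return "ast"                   # source code -> AST-aware compression
    return "kompress"                  # free text -> ML-based summarization
```

Each route then applies a compressor suited to that shape of data, rather than one generic truncation strategy.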
### Cache Optimization
Stabilizes message prefixes so your provider's KV cache actually works. Claude offers a 90% read discount on cached prefixes — but almost no framework takes advantage of it. Headroom does.
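The principle: provider prefix caches only hit on byte-identical prefixes, so anything that changes between calls must come after everything that doesn't. A hedged sketch with a hypothetical helper (not Headroom's API):

```python
def build_prompt(system: str, history: list, new_turn: dict) -> list:
    """Keep the stable prefix byte-identical across calls so the provider's
    prefix/KV cache hits; only the tail varies. Illustrative sketch."""
    return (
        [{"role": "system", "content": system}]  # never reworded mid-session
        + history                                # append-only, never reordered
        + [new_turn]                             # the only part that changes
    )

turn1 = {"role": "user", "content": "hi"}
msgs1 = build_prompt("You are helpful.", [], turn1)
msgs2 = build_prompt("You are helpful.", msgs1[1:], {"role": "user", "content": "more"})
assert msgs2[:len(msgs1)] == msgs1  # byte-identical prefix -> cacheable
```

Rewording the system prompt or reordering history on any turn silently invalidates the cached prefix, which is exactly the mistake Headroom guards against.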
