Distill
Context intelligence layer for AI agents.
Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Includes a dedup pipeline with ~12ms overhead and persistent context memory with write-time dedup and hierarchical decay.
Less redundant data. Lower costs. Faster responses. Deterministic results.
📖 Distill implements the 4-layer context engineering stack (Cluster → Select → Rerank → Compress) described in The Agentic Engineering Guide — a free, open book on AI agent infrastructure.
Context sources → Distill → LLM
(RAG, tools, memory, docs) (reliable outputs)
The Problem
LLM outputs are unreliable because context is polluted. "Garbage in, garbage out."
30-40% of context assembled from multiple sources is semantically redundant: the same information from docs, code, memory, and tools competes for attention. This leads to:
- Non-deterministic outputs - Same workflow, different results
- Confused reasoning - Signal diluted by repetition
- Production failures - Works in demos, breaks at scale
You can't fix unreliable outputs with better prompts. You need to fix the context that goes in.
How It Works
Math, not magic. No LLM calls. Fully deterministic.
| Step | What it does | Benefit |
|------|--------------|---------|
| Deduplicate | Remove redundant information across sources | More reliable outputs |
| Compress | Keep what matters, remove the noise | Lower token costs |
| Summarize | Condense older context intelligently | Longer sessions |
| Cache | Instant retrieval for repeated patterns | Faster responses |
Pipeline
Query → Over-fetch (50) → Cluster → Select → MMR Re-rank (8) → LLM
- Over-fetch - Retrieve 3-5x more chunks than needed
- Cluster - Group semantically similar chunks (agglomerative clustering)
- Select - Pick best representative from each cluster
- MMR Re-rank - Balance relevance and diversity
Result: Deterministic, diverse context in ~12ms. No LLM calls. Fully auditable.
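The select + re-rank stages above can be sketched in plain Python. This is a simplified illustration, not Distill's actual implementation: the greedy MMR loop and the λ = 0.7 relevance weight are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr_rerank(query_vec, chunks, k=8, lam=0.7):
    """Greedy Maximal Marginal Relevance: each pick trades query
    relevance against similarity to chunks already selected."""
    selected, remaining = [], list(chunks)
    while remaining and len(selected) < k:
        def score(c):
            relevance = cosine(query_vec, c["embedding"])
            redundancy = max((cosine(c["embedding"], s["embedding"])
                              for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Lowering `lam` pushes the selection toward diversity: a near-duplicate of an already-selected chunk scores poorly even when it is highly relevant.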
Installation
Binary (Recommended)
Download from GitHub Releases:
# macOS (Apple Silicon)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_arm64.tar.gz" | cut -d '"' -f 4) | tar xz
# macOS (Intel)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_amd64.tar.gz" | cut -d '"' -f 4) | tar xz
# Linux (amd64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_amd64.tar.gz" | cut -d '"' -f 4) | tar xz
# Linux (arm64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_arm64.tar.gz" | cut -d '"' -f 4) | tar xz
# Move to PATH
sudo mv distill /usr/local/bin/
Or download directly from the releases page.
Go Install
go install github.com/Siddhant-K-code/distill@latest
Docker
docker pull ghcr.io/siddhant-k-code/distill:latest
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill
Build from Source
git clone https://github.com/Siddhant-K-code/distill.git
cd distill
go build -o distill .
Quick Start
1. Standalone API (No Vector DB Required)
Start the API server and send chunks directly:
export OPENAI_API_KEY="your-key" # For embeddings
distill api --port 8080
Deduplicate chunks:
curl -X POST http://localhost:8080/v1/dedupe \
-H "Content-Type: application/json" \
-d '{
"chunks": [
{"id": "1", "text": "React is a JavaScript library for building UIs."},
{"id": "2", "text": "React.js is a JS library for building user interfaces."},
{"id": "3", "text": "Vue is a progressive framework for building UIs."}
]
}'
Response:
{
"chunks": [
{"id": "1", "text": "React is a JavaScript library for building UIs.", "cluster_id": 0},
{"id": "3", "text": "Vue is a progressive framework for building UIs.", "cluster_id": 1}
],
"stats": {
"input_count": 3,
"output_count": 2,
"reduction_pct": 33,
"latency_ms": 12
}
}
With pre-computed embeddings (no OpenAI key needed):
curl -X POST http://localhost:8080/v1/dedupe \
-H "Content-Type: application/json" \
-d '{
"chunks": [
{"id": "1", "text": "React is...", "embedding": [0.1, 0.2, ...]},
{"id": "2", "text": "React.js is...", "embedding": [0.11, 0.21, ...]},
{"id": "3", "text": "Vue is...", "embedding": [0.9, 0.8, ...]}
]
}'
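With embeddings supplied directly, the dedup decision reduces to vector math. A rough sketch of threshold-based dedup (illustrative only — Distill clusters agglomeratively rather than greedily, and the 0.15 cosine-distance threshold here is borrowed from the `dedup_threshold` default shown in the memory config below):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return 1.0 - dot / (na * nb)

def dedupe(chunks, threshold=0.15):
    """Keep a chunk only if it is farther than `threshold` from
    every already-kept chunk (greedy, order-preserving)."""
    kept = []
    for c in chunks:
        if all(cosine_distance(c["embedding"], k["embedding"]) > threshold
               for k in kept):
            kept.append(c)
    return kept
```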
2. With Vector Database
Connect to Pinecone or Qdrant for retrieval + deduplication:
export PINECONE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
distill serve --index my-index --port 8080
Query with automatic deduplication:
curl -X POST http://localhost:8080/v1/retrieve \
-H "Content-Type: application/json" \
-d '{"query": "how do I reset my password?"}'
3. MCP Integration (AI Assistants)
Works with Claude, Cursor, Amp, and other MCP-compatible assistants:
# Dedup only
distill mcp
# With memory and sessions
distill mcp --memory --session
Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"distill": {
"command": "/path/to/distill",
"args": ["mcp", "--memory", "--session"],
"env": {
"OPENAI_API_KEY": "your-key"
}
}
}
}
See mcp/README.md for more configuration options.
Context Memory
Persistent memory that accumulates knowledge across agent sessions. Memories are deduplicated on write, ranked by relevance + recency on recall, and compressed over time through hierarchical decay.
Enable with the --memory flag on api or mcp commands.
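Recall ranking blends semantic relevance with recency. A hypothetical scoring function — the 0.8/0.2 weighting and the exponential 24-hour half-life are assumptions for illustration, not Distill's documented formula:

```python
import time

def recall_score(similarity, last_access_ts, now=None,
                 half_life_hours=24.0, relevance_weight=0.8):
    """Blend query similarity with an exponential recency bonus:
    a memory touched half_life_hours ago contributes half the
    recency bonus of one touched just now."""
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - last_access_ts) / 3600.0)
    recency = 0.5 ** (age_hours / half_life_hours)
    return relevance_weight * similarity + (1 - relevance_weight) * recency
```

Under a scheme like this, two equally relevant memories tie-break toward the one accessed more recently.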
CLI
# Store a memory
distill memory store --text "Auth uses JWT with RS256 signing" --tags auth --source docs
# Recall relevant memories
distill memory recall --query "How does authentication work?" --max-results 5
# Remove outdated memories
distill memory forget --tags deprecated
# View statistics
distill memory stats
API
# Start API with memory enabled
distill api --port 8080 --memory
# Store
curl -X POST http://localhost:8080/v1/memory/store \
-H "Content-Type: application/json" \
-d '{
"session_id": "session-1",
"entries": [{"text": "Auth uses JWT with RS256", "tags": ["auth"], "source": "docs"}]
}'
# Recall
curl -X POST http://localhost:8080/v1/memory/recall \
-H "Content-Type: application/json" \
-d '{"query": "How does auth work?", "max_results": 5}'
MCP
Memory tools are available in Claude Desktop, Cursor, and other MCP clients when --memory is enabled:
distill mcp --memory
Tools exposed: store_memory, recall_memory, forget_memory, memory_stats.
How Decay Works
Memories compress over time based on access patterns:
Full text → (24h) → Summary (~20%) → (7 days) → Keywords (~5%) → (30 days) → Evicted
Accessing a memory resets its decay clock. Memory settings live in distill.yaml:
memory:
db_path: distill-memory.db
dedup_threshold: 0.15
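The decay ladder can be modeled as a simple age lookup. A sketch — the stage names mirror the diagram above, and the input is hours since last access because access resets the decay clock:

```python
def decay_stage(hours_since_access):
    """Map hours since last access to a retention stage:
    full text (<24h) → summary (<7d) → keywords (<30d) → evicted."""
    if hours_since_access < 24:
        return "full_text"
    if hours_since_access < 7 * 24:
        return "summary"   # ~20% of original size
    if hours_since_access < 30 * 24:
        return "keywords"  # ~5% of original size
    return "evicted"
```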
Session Management
Token-budgeted context windows for long-running agent sessions. Push context incrementally - Distill deduplicates, compresses aging entries, and evicts when the budget is exceeded.
Enable with the --session flag on api or mcp commands.
CLI
# Create a session with 128K token budget
distill session create --session-id task-42 --max-tokens 128000
# Push context as the agent works
distill session push --session-id task-42 --role user --content "Fix the JWT validation bug"
distill session push --session-id task-42 --role tool --content "$(cat auth/jwt.go)" --source file_read --importance 0.8
# Read the current context window
distill session context --session-id task-42
# Clean up when done
distill session delete --session-id task-42
API
# Start API with sessions enabled
distill api --port 8080 --session
# Create session
curl -X POST http://localhost:8080/v1/session/create \
-H "Content-Type: application/json" \
-d '{"session_id": "task-42", "max_tokens": 128000}'
# Push entries
curl -X POST http://localhost:8080/v1/session/push \
-H "Content-Type: application/json" \
-d '{
"session_id": "task-42",
"entries": [
{"role": "tool", "content": "file contents...", "source": "file_read", "importance": 0.8}
]
}'
# Read context window
curl -X POST http://localhost:8080/v1/session/context \
-H "Content-Type: application/json" \
-d '{"session_id": "task-42"}'
MCP
Session tools are available when --session is enabled:
distill mcp --session
Tools exposed: create_session, push_session, session_context, delete_session.
How Budget Enforcement Works
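When pushed entries take the window past max_tokens, Distill compresses aging entries and evicts until the window fits. A plausible sketch of the eviction step — the exact policy isn't specified here, so evicting lowest-importance entries first, oldest first on ties, is an assumption:

```python
def enforce_budget(entries, max_tokens):
    """Evict entries until the total token count fits the budget.
    Assumed eviction order: lowest importance first, then oldest.
    Each entry: {"tokens": int, "importance": float, "ts": float}."""
    window = list(entries)
    total = sum(e["tokens"] for e in window)
    # Sort candidates so the least valuable entry is evicted first.
    victims = sorted(window, key=lambda e: (e["importance"], e["ts"]))
    for v in victims:
        if total <= max_tokens:
            break
        window.remove(v)
        total -= v["tokens"]
    return window  # original push order preserved for survivors
```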