Distill
Context intelligence layer for AI agents.
Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Includes a dedup pipeline with ~12ms overhead and persistent context memory with write-time dedup and hierarchical decay.
Less redundant data. Lower costs. Faster responses. Deterministic results.
📖 Distill implements the 4-layer context engineering stack (Cluster → Select → Rerank → Compress) described in The Agentic Engineering Guide — a free, open book on AI agent infrastructure.
Context sources → Distill → LLM
(RAG, tools, memory, docs) (reliable outputs)
The Problem
LLM outputs are unreliable because context is polluted. "Garbage in, garbage out."
30-40% of context assembled from multiple sources is semantically redundant: the same information from docs, code, memory, and tools competes for attention. This leads to:
- Non-deterministic outputs - Same workflow, different results
- Confused reasoning - Signal diluted by repetition
- Production failures - Works in demos, breaks at scale
You can't fix unreliable outputs with better prompts. You need to fix the context that goes in.
How It Works
Math, not magic. No LLM calls. Fully deterministic.
| Step | What it does | Benefit |
|------|--------------|---------|
| Deduplicate | Remove redundant information across sources | More reliable outputs |
| Compress | Keep what matters, remove the noise | Lower token costs |
| Summarize | Condense older context intelligently | Longer sessions |
| Cache | Instant retrieval for repeated patterns | Faster responses |
Pipeline
Query → Over-fetch (50) → Cluster → Select → MMR Re-rank (8) → LLM
- Over-fetch - Retrieve 3-5x more chunks than needed
- Cluster - Group semantically similar chunks (agglomerative clustering)
- Select - Pick best representative from each cluster
- MMR Re-rank - Balance relevance and diversity
Result: Deterministic, diverse context in ~12ms. No LLM calls. Fully auditable.
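The select + re-rank stages above can be sketched in plain Python. This is a simplified illustration, not Distill's actual implementation: the greedy MMR loop and the λ = 0.7 relevance weight are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr_rerank(query_vec, chunks, k=8, lam=0.7):
    """Greedy Maximal Marginal Relevance: each pick trades query
    relevance against similarity to chunks already selected."""
    selected, remaining = [], list(chunks)
    while remaining and len(selected) < k:
        def score(c):
            relevance = cosine(query_vec, c["embedding"])
            redundancy = max((cosine(c["embedding"], s["embedding"])
                              for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Lowering `lam` pushes the selection toward diversity: a near-duplicate of an already-selected chunk scores poorly even when it is highly relevant.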
Installation
Binary (Recommended)
Download from GitHub Releases:
# macOS (Apple Silicon)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_arm64.tar.gz" | cut -d '"' -f 4) | tar xz
# macOS (Intel)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_amd64.tar.gz" | cut -d '"' -f 4) | tar xz
# Linux (amd64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_amd64.tar.gz" | cut -d '"' -f 4) | tar xz
# Linux (arm64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_arm64.tar.gz" | cut -d '"' -f 4) | tar xz
# Move to PATH
sudo mv distill /usr/local/bin/
Or download directly from the releases page.
Go Install
go install github.com/Siddhant-K-code/distill@latest
Docker
docker pull ghcr.io/siddhant-k-code/distill:latest
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill
Build from Source
git clone https://github.com/Siddhant-K-code/distill.git
cd distill
go build -o distill .
Quick Start
1. Standalone API (No Vector DB Required)
Start the API server and send chunks directly:
export OPENAI_API_KEY="your-key" # For embeddings
distill api --port 8080
Deduplicate chunks:
curl -X POST http://localhost:8080/v1/dedupe \
-H "Content-Type: application/json" \
-d '{
"chunks": [
{"id": "1", "text": "React is a JavaScript library for building UIs."},
{"id": "2", "text": "React.js is a JS library for building user interfaces."},
{"id": "3", "text": "Vue is a progressive framework for building UIs."}
]
}'
Response:
{
"chunks": [
{"id": "1", "text": "React is a JavaScript library for building UIs.", "cluster_id": 0},
{"id": "3", "text": "Vue is a progressive framework for building UIs.", "cluster_id": 1}
],
"stats": {
"input_count": 3,
"output_count": 2,
"reduction_pct": 33,
"latency_ms": 12
}
}
With pre-computed embeddings (no OpenAI key needed):
curl -X POST http://localhost:8080/v1/dedupe \
-H "Content-Type: application/json" \
-d '{
"chunks": [
{"id": "1", "text": "React is...", "embedding": [0.1, 0.2, ...]},
{"id": "2", "text": "React.js is...", "embedding": [0.11, 0.21, ...]},
{"id": "3", "text": "Vue is...", "embedding": [0.9, 0.8, ...]}
]
}'
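With embeddings supplied directly, the dedup decision reduces to vector math. A rough sketch of threshold-based dedup (illustrative only — Distill clusters agglomeratively rather than greedily, and the 0.15 cosine-distance threshold here is borrowed from the `dedup_threshold` default shown in the memory config below):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return 1.0 - dot / (na * nb)

def dedupe(chunks, threshold=0.15):
    """Keep a chunk only if it is farther than `threshold` from
    every already-kept chunk (greedy, order-preserving)."""
    kept = []
    for c in chunks:
        if all(cosine_distance(c["embedding"], k["embedding"]) > threshold
               for k in kept):
            kept.append(c)
    return kept
```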
2. With Vector Database
Connect to Pinecone or Qdrant for retrieval + deduplication:
export PINECONE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
distill serve --index my-index --port 8080
Query with automatic deduplication:
curl -X POST http://localhost:8080/v1/retrieve \
-H "Content-Type: application/json" \
-d '{"query": "how do I reset my password?"}'
3. MCP Integration (AI Assistants)
Works with Claude, Cursor, Amp, and other MCP-compatible assistants:
# Dedup only
distill mcp
# With memory and sessions
distill mcp --memory --session
Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"distill": {
"command": "/path/to/distill",
"args": ["mcp", "--memory", "--session"],
"env": {
"OPENAI_API_KEY": "your-key"
}
}
}
}
See mcp/README.md for more configuration options.
Context Memory
Persistent memory that accumulates knowledge across agent sessions. Memories are deduplicated on write, ranked by relevance + recency on recall, and compressed over time through hierarchical decay.
Enable with the --memory flag on api or mcp commands.
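Recall ranking blends semantic relevance with recency. A hypothetical scoring function — the 0.8/0.2 weighting and the exponential 24-hour half-life are assumptions for illustration, not Distill's documented formula:

```python
import time

def recall_score(similarity, last_access_ts, now=None,
                 half_life_hours=24.0, relevance_weight=0.8):
    """Blend query similarity with an exponential recency bonus:
    a memory touched half_life_hours ago contributes half the
    recency bonus of one touched just now."""
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - last_access_ts) / 3600.0)
    recency = 0.5 ** (age_hours / half_life_hours)
    return relevance_weight * similarity + (1 - relevance_weight) * recency
```

Under a scheme like this, two equally relevant memories tie-break toward the one accessed more recently.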
CLI
# Store a memory
distill memory store --text "Auth uses JWT with RS256 signing" --tags auth --source docs
# Recall relevant memories
distill memory recall --query "How does authentication work?" --max-results 5
# Remove outdated memories
distill memory forget --tags deprecated
# View statistics
distill memory stats
API
# Start API with memory enabled
distill api --port 8080 --memory
# Store
curl -X POST http://localhost:8080/v1/memory/store \
-H "Content-Type: application/json" \
-d '{
"session_id": "session-1",
"entries": [{"text": "Auth uses JWT with RS256", "tags": ["auth"], "source": "docs"}]
}'
# Recall
curl -X POST http://localhost:8080/v1/memory/recall \
-H "Content-Type: application/json" \
-d '{"query": "How does auth work?", "max_results": 5}'
MCP
Memory tools are available in Claude Desktop, Cursor, and other MCP clients when --memory is enabled:
distill mcp --memory
Tools exposed: store_memory, recall_memory, forget_memory, memory_stats.
How Decay Works
Memories compress over time based on access patterns:
Full text → (24h) → Summary (~20%) → (7 days) → Keywords (~5%) → (30 days) → Evicted
Accessing a memory resets its decay clock. Memory settings live in distill.yaml:
memory:
db_path: distill-memory.db
dedup_threshold: 0.15
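The decay ladder can be modeled as a simple age lookup. A sketch — the stage names mirror the diagram above, and the input is hours since last access because access resets the decay clock:

```python
def decay_stage(hours_since_access):
    """Map hours since last access to a retention stage:
    full text (<24h) → summary (<7d) → keywords (<30d) → evicted."""
    if hours_since_access < 24:
        return "full_text"
    if hours_since_access < 7 * 24:
        return "summary"   # ~20% of original size
    if hours_since_access < 30 * 24:
        return "keywords"  # ~5% of original size
    return "evicted"
```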
Session Management
Token-budgeted context windows for long-running agent sessions. Push context incrementally - Distill deduplicates, compresses aging entries, and evicts when the budget is exceeded.
Enable with the --session flag on api or mcp commands.
CLI
# Create a session with 128K token budget
distill session create --session-id task-42 --max-tokens 128000
# Push context as the agent works
distill session push --session-id task-42 --role user --content "Fix the JWT validation bug"
distill session push --session-id task-42 --role tool --content "$(cat auth/jwt.go)" --source file_read --importance 0.8
# Read the current context window
distill session context --session-id task-42
# Clean up when done
distill session delete --session-id task-42
API
# Start API with sessions enabled
distill api --port 8080 --session
# Create session
curl -X POST http://localhost:8080/v1/session/create \
-H "Content-Type: application/json" \
-d '{"session_id": "task-42", "max_tokens": 128000}'
# Push entries
curl -X POST http://localhost:8080/v1/session/push \
-H "Content-Type: application/json" \
-d '{
"session_id": "task-42",
"entries": [
{"role": "tool", "content": "file contents...", "source": "file_read", "importance": 0.8}
]
}'
# Read context window
curl -X POST http://localhost:8080/v1/session/context \
-H "Content-Type: application/json" \
-d '{"session_id": "task-42"}'
MCP
Session tools are available when --session is enabled:
distill mcp --session
Tools exposed: create_session, push_session, session_context, delete_session.
How Budget Enforcement Works
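When pushed entries take the window past max_tokens, Distill compresses aging entries and evicts until the window fits. A plausible sketch of the eviction step — the exact policy isn't specified here, so evicting lowest-importance entries first, oldest first on ties, is an assumption:

```python
def enforce_budget(entries, max_tokens):
    """Evict entries until the total token count fits the budget.
    Assumed eviction order: lowest importance first, then oldest.
    Each entry: {"tokens": int, "importance": float, "ts": float}."""
    window = list(entries)
    total = sum(e["tokens"] for e in window)
    # Sort candidates so the least valuable entry is evicted first.
    victims = sorted(window, key=lambda e: (e["importance"], e["ts"]))
    for v in victims:
        if total <= max_tokens:
            break
        window.remove(v)
        total -= v["tokens"]
    return window  # original push order preserved for survivors
```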