Engram

Local-first context compression for AI coding tools. One binary saves 85-93% of redundant tokens across every LLM call.

Install / Use

/learn @pythondatascrape/Engram
Supported Platforms

Claude Code
Claude Desktop
Cursor

What is Engram?

Every time an AI coding tool sends a request to an LLM, it re-sends the same context: who you are, what you're working on, your preferences, your project structure. This redundancy costs real money and eats into context windows.

Engram eliminates it. It runs locally as a lightweight daemon and compresses both your identity (CLAUDE.md, system prompts, project instructions) and your conversation context (message history, tool results, responses) across every LLM call. The result: dramatically smaller prompts, lower costs, and more room in the context window for what actually matters.

How It Works

Engram applies three compression stages:

  1. Identity compression — Verbose CLAUDE.md prose and project instructions are reduced to compact key=value codebook entries. Definitions are sent once on the first turn; subsequent turns reference keys only.
  2. Context compression — Conversation history is serialized using a learned codebook that strips JSON overhead from message objects (role=user content=... instead of full JSON).
  3. Response compression — LLM responses are compressed using provider-specific codebooks tuned to Anthropic and OpenAI output patterns.
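Stage 2 can be pictured with a toy serializer. This is a hypothetical sketch: Engram's real codebook is learned per session, and `serialize_history` is an illustrative name, not part of any shipped API.

```python
import json

def serialize_history(messages):
    """Render chat messages as compact role=... content=... lines
    instead of a JSON array (illustrative only)."""
    return "\n".join(f"role={m['role']} content={m['content']}" for m in messages)

history = [
    {"role": "user", "content": "refactor the parser"},
    {"role": "assistant", "content": "done"},
]

compact = serialize_history(history)
print(compact)
print(len(compact), "vs", len(json.dumps(history)))  # compact form is shorter
```

The saving comes from dropping braces, quotes, and key repetition; the learned codebook goes further by abbreviating recurring values as well.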

Key Numbers

| Metric | Value |
|--------|-------|
| Identity compression | ~96-98% token reduction |
| Context compression | ~40-60% token reduction |
| Overall savings | 85-93% per session |
| Startup overhead | <50ms |
| Memory footprint | ~30MB resident |

Quick Start

# Install via Homebrew (macOS/Linux)
brew install pythondatascrape/tap/engram

# Or download a release binary
curl -fsSL https://github.com/pythondatascrape/engram/releases/latest/download/engram_$(uname -s | tr A-Z a-z)_$(uname -m | sed 's/x86_64/amd64/').tar.gz | tar xz
sudo mv engram /usr/local/bin/

# Or install from source
go install github.com/pythondatascrape/engram/cmd/engram@latest

# Install for Claude Code — one command, no manual steps
engram install --claude-code

# Restart Claude Code and start using Engram immediately

What engram install --claude-code does

Starting in v0.3.0, a single install command handles everything:

  1. Copies the Engram plugin into ~/.claude/plugins/
  2. Registers the statusline in ~/.claude/settings.json
  3. Creates ~/.engram/engram.yaml with working defaults (if not already present)
  4. Installs the daemon as a background service (launchd on macOS, systemd on Linux)
  5. Starts the service immediately
  6. Verifies the daemon socket and proxy are reachable
  7. Exits non-zero if any step fails — no silent partial installs

After install, restart Claude Code. No engram serve step required.
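Step 6 can be approximated by hand if you want to script your own health check. A minimal sketch, assuming the default socket path from the Common flags section; `daemon_reachable` is a hypothetical helper, not an Engram API:

```python
import os
import socket

def daemon_reachable(sock_path: str) -> bool:
    """Return True if a Unix socket exists at sock_path and accepts connections."""
    if not os.path.exists(sock_path):
        return False
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return True
    except OSError:
        return False
    finally:
        s.close()

print(daemon_reachable(os.path.expanduser("~/.engram/engram.sock")))
```

`engram status` performs the equivalent check (and more), so this is only useful in custom monitoring scripts.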

CLI Reference

| Command | Description |
|---------|-------------|
| engram install --claude-code | Zero-touch install for Claude Code — daemon, config, and plugin in one step |
| engram install --openclaw | Install for OpenClaw |
| engram install | Auto-detect and install for all supported clients |
| engram analyze | Analyze your project and show compression opportunities |
| engram advisor | Show optimization recommendations based on session data |
| engram serve | Start the compression daemon manually (advanced) |
| engram status | Show daemon status, active sessions, and savings |

Every command supports --help for detailed usage.

Common flags

| Flag | Default | Description |
|------|---------|-------------|
| --config | ~/.engram/engram.yaml | Path to configuration file |
| --socket | ~/.engram/engram.sock | Unix socket path |

Integrations

Claude Code

engram install --claude-code
# Restart Claude Code — compression is active immediately

Engram registers as a Claude Code plugin and routes all LLM calls through the local proxy. Context is compressed transparently — no workflow changes needed.

Manual install (if you prefer explicit steps):

engram install --claude-code   # installs plugin + daemon
engram status                  # confirm daemon is running

OpenClaw

engram install --openclaw

SDKs

For custom integrations, Engram provides thin client SDKs in three languages. All connect to the local daemon over a Unix socket.

Python:

from engram import Engram

async with await Engram.connect() as client:
    result = await client.compress({"identity": "...", "history": [], "query": "..."})

Go:

client, _ := engram.Connect(ctx, "")
defer client.Close()
result, _ := client.Compress(ctx, map[string]any{...})

Node.js:

import { Engram } from "engram";

const client = await Engram.connect();
const result = await client.compress({identity: "...", history: [], query: "..."});

See the Integration Guide for details.

Configuration

Engram's config file lives at ~/.engram/engram.yaml and is created automatically during install. The most common settings to adjust:

# ~/.engram/engram.yaml
server:
  port: 4433          # daemon JSON-RPC port

proxy:
  port: 4242          # HTTP proxy port (Claude Code routes through this)
  window_size: 10     # context window compression depth

Run engram serve --help for the full list of options.
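Those two ports are all a custom client needs. A sketch of deriving the endpoints, with the values hard-coded to mirror the sample config above rather than read from disk:

```python
# Values mirror the sample engram.yaml above (hard-coded for illustration).
config = {"server": {"port": 4433}, "proxy": {"port": 4242}}

rpc_endpoint = f"http://localhost:{config['server']['port']}"   # daemon JSON-RPC
proxy_base_url = f"http://localhost:{config['proxy']['port']}"  # route LLM calls here
print(proxy_base_url)  # → http://localhost:4242
```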

Demo

See the Travelbound demo project for a working example that shows Engram compressing a real project's context from ~4,200 tokens to ~380 tokens.

Documentation

Best Use

Engram's identity compression is only as good as the codebook it works from. A well-structured codebook can reduce identity context from thousands of tokens to a handful of key=value pairs. This section shows how to maximize those savings — and how to wire Engram into your LLM calls.

Why This Matters

Every LLM call re-sends your identity: who you are, what you prefer, how you work. Written as prose, that identity is verbose by design — natural language is redundant. Engram replaces prose with a derived codebook so the same information travels in a fraction of the tokens.

| Format | Example | ~Tokens |
|--------|---------|---------|
| Plain prose | "I prefer concise responses without summaries. I'm a senior Go engineer who has been writing Go for ten years. I'm working on a CLI tool targeting macOS and Linux." | ~45 |
| Codebook (key=value) | role=engineer lang=go exp=10y platform=macos,linux style=concise no_summaries=true | ~14 |
| Savings | | ~69% |

At scale — a full CLAUDE.md, system prompt, and project instructions — this compression reaches 96–98%. That headroom goes directly toward more productive context: longer conversation history, bigger diffs, more tool results.
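The codebook row in the table is just a flattened key=value mapping, and the savings figure follows from the token estimates. A quick check in Python (token counts are the table's estimates, not a tokenizer's):

```python
profile = {
    "role": "engineer", "lang": "go", "exp": "10y",
    "platform": "macos,linux", "style": "concise", "no_summaries": "true",
}
codebook_line = " ".join(f"{k}={v}" for k, v in profile.items())
print(codebook_line)

prose_tokens, codebook_tokens = 45, 14  # estimates from the table above
print(f"savings = {1 - codebook_tokens / prose_tokens:.0%}")  # → savings = 69%
```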

How Codebooks Work

Engram's derive_codebook MCP tool scans your CLAUDE.md and project instructions, extracts structured dimensions, and produces a compact codebook. On the first turn, the codebook definition is sent once so the model understands the format. Every subsequent turn references keys only.

# First turn — definition sent once
codebook: role(enum:engineer,researcher) lang(text) style(enum:concise,verbose) no_summaries(bool)

# All subsequent turns — keys only
role=engineer lang=go style=concise no_summaries=true
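The definition-once, keys-thereafter pattern above can be sketched as a helper that frames each turn. The function name and layout are illustrative; Engram's wire format may differ:

```python
DEFINITION = ("codebook: role(enum:engineer,researcher) lang(text) "
              "style(enum:concise,verbose) no_summaries(bool)")
ENTRY = "role=engineer lang=go style=concise no_summaries=true"

def identity_block(turn: int) -> str:
    """Send the codebook definition on turn 1, keys only afterwards."""
    return f"{DEFINITION}\n{ENTRY}" if turn == 1 else ENTRY

print(identity_block(1))  # definition + keys
print(identity_block(5))  # keys only
```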

You can also define codebooks manually in YAML for domain-specific applications:

# ~/.engram-codebook.yaml
name: my-project
version: 1
dimensions:
  - name: role
    type: enum
    values: [engineer, researcher, designer]
    required: true

  - name: lang
    type: text
    required: true

  - name: style
    type: enum
    values: [concise, verbose]
    required: false

  - name: no_summaries
    type: boolean
    required: false
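
A manual codebook like this one is easy to validate before use. A hypothetical validator, with the dimensions mirrored as Python dicts so the sketch stays dependency-free (Engram's own schema handling may differ):

```python
# Mirrors the YAML codebook above as plain dicts (no YAML parser needed).
DIMENSIONS = [
    {"name": "role", "type": "enum", "values": ["engineer", "researcher", "designer"], "required": True},
    {"name": "lang", "type": "text", "required": True},
    {"name": "style", "type": "enum", "values": ["concise", "verbose"], "required": False},
    {"name": "no_summaries", "type": "boolean", "required": False},
]

def validate(entry: dict) -> list:
    """Return a list of problems; empty means the entry fits the codebook."""
    errors = []
    for dim in DIMENSIONS:
        name = dim["name"]
        if name not in entry:
            if dim["required"]:
                errors.append(f"missing required dimension: {name}")
            continue
        value = entry[name]
        if dim["type"] == "enum" and value not in dim["values"]:
            errors.append(f"{name}: {value!r} not in {dim['values']}")
        if dim["type"] == "boolean" and value not in (True, False):
            errors.append(f"{name}: expected a boolean")
    return errors

print(validate({"role": "engineer", "lang": "go"}))     # → []
print(validate({"role": "pm", "no_summaries": "yes"}))  # three problems flagged
```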

Pass the override path to derive_codebook and it merges your explicit dimensions with anything auto-detected:

result = await client.compress({
    "identity": open("CLAUDE.md").read(),
    "yamlOverridePath": ".engram-codebook.yaml",
    ...
})

Maximizing Your Codebook

1. Make dimensions specific and bounded. Enum and boolean fields compress best — they map to a single token. Free-text fields still compress, but less aggressively.

# Good — bounded enum
- name: experience
  type: enum
  values: [junior, mid, senior, staff]

# Less efficient — open text
- name: experience
  type: text

2. Prefer required: true for your most-repeated attributes. Required dimensions are always included in the serialized output and always stripped from subsequent prose. Optional ones are only emitted when set.

3. Use check_redundancy before sending responses. The MCP tool flags prose that re-states what the codebook already encodes, so you can strip it before it enters context.

# In a PostToolUse hook or response filter
check = await mcp.check_redundancy(content=llm_response)
if check["redundant"]:
    llm_response = check["stripped"]  # codebook entries removed

4. Run engram analyze after your first session. It reads session data and suggests dimensions you haven't captured yet — common repeated phrases that are good codebook candidates.

engram analyze
# → Suggested dimensions: deadline_pressure (bool), test_style (enum: tdd,bdd,none)

Updating LLM Calls to Use Engram

For transparent proxy use (Claude Code, OpenClaw), no code changes are needed — all calls route through Engram automatically after engram install.

For SDK use, replace the direct Anthropic/OpenAI base URL with the local proxy:

Before:

import anthropic

client = anthropic.Anthropic()

After:

import anthropic

# Point the SDK at Engram's local proxy (proxy.port from ~/.engram/engram.yaml)
client = anthropic.Anthropic(base_url="http://localhost:4242")

Requests made through this client are then compressed transparently by the daemon.
