# Isartor
Open-source Prompt Firewall — deflect up to 95% of redundant LLM traffic before it leaves your infrastructure. Documentation: https://isartor-ai.github.io/Isartor/index.html
## Install / Use

`/learn @isartor-ai/IsartorREADME`
## Quick Start
```sh
# Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh

# Guided setup (provider, optional L2, connectors, verification)
isartor setup

# Or configure manually (example: Groq)
# isartor set-key -p groq
# isartor set-alias --alias fast --model gpt-4o-mini

# Verify the provider and run the post-install showcase
isartor check
isartor providers
isartor demo

# Connect your AI tool (pick one)
isartor connect copilot          # GitHub Copilot CLI
isartor connect claude           # Claude Code
isartor connect claude-desktop   # Claude Desktop
isartor connect cursor           # Cursor IDE
isartor connect openclaw         # OpenClaw
isartor connect codex            # OpenAI Codex CLI
isartor connect gemini           # Gemini CLI
isartor connect claude-copilot   # Claude Code + GitHub Copilot

# Or start the gateway directly if you're ready
isartor up
```
The best first-run path is now: install → `isartor setup` → `isartor demo` → connect your tool. If you prefer the old explicit flow, `set-key`, `check`, and `connect` still work exactly as before. `isartor demo` still works without an API key, but with a configured provider it now also shows a live upstream round-trip before the cache replay.
You can also define request-time model aliases like `fast`, `smart`, or `code` that resolve to real provider model IDs before routing and cache-key generation.
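Because resolution happens before cache-key generation, an alias and its underlying model ID hit the same cache entries. A minimal sketch of the lookup, assuming a hypothetical alias table (the model IDs below are illustrative examples, not Isartor's defaults):

```shell
# Hypothetical alias table -- in Isartor these are registered via `isartor set-alias`.
resolve_alias() {
  case "$1" in
    fast)  echo "gpt-4o-mini" ;;  # cheap, low-latency model
    smart) echo "gpt-4o" ;;       # stronger reasoning model
    *)     echo "$1" ;;           # not an alias: real model IDs pass through unchanged
  esac
}

resolve_alias fast    # -> gpt-4o-mini
resolve_alias gpt-4o  # -> gpt-4o
```

Requests that name `fast` and requests that name `gpt-4o-mini` then route and cache identically.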
For provider troubleshooting, Isartor also supports opt-in request/response debug logging. Set `ISARTOR__ENABLE_REQUEST_LOGS=true`, reproduce the issue, and inspect the separate JSONL stream with `isartor logs --requests`. Auth headers are redacted automatically, but prompt bodies may still contain sensitive data, so leave it off unless you need it.
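Since the stream is line-delimited JSON, ordinary text tools work on it. A sketch with a hypothetical log line (the field names here are illustrative, not Isartor's actual schema):

```shell
# One illustrative, already-redacted request-log line; the real schema may differ.
line='{"ts":"2025-01-01T00:00:00Z","provider":"groq","model":"gpt-4o-mini","authorization":"[REDACTED]","status":200}'

# Extract the model field without jq, using POSIX sed.
echo "$line" | sed -n 's/.*"model":"\([^"]*\)".*/\1/p'   # -> gpt-4o-mini
```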
For a fast Layer 3 status snapshot, run `isartor providers` or query the authenticated `GET /debug/providers` endpoint. It reports the active provider, configured model and endpoint, plus the last-known in-memory success/error state for upstream traffic since the current process started.
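The endpoint returns a small JSON document. A hypothetical response shape based on the fields described above (every field name here is an assumption; check the live endpoint for the exact schema):

```json
{
  "provider": "groq",
  "model": "gpt-4o-mini",
  "endpoint": "https://api.groq.com/openai/v1",
  "last_success": "2025-01-01T00:00:00Z",
  "last_error": null
}
```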
The provider registry also includes a broader set of OpenAI-compatible backends out of the box, including `cerebras`, `nebius`, `siliconflow`, `fireworks`, `nvidia`, and `chutes`, so `isartor set-key -p <provider>` and `isartor setup` can configure them without manual endpoint lookup.
## See Isartor in the Terminal
<p align="center"> <img src="docs/readme-demo.gif" alt="Animated terminal walkthrough showing install, isartor up, and isartor demo" width="900"> </p> <p align="center"> <sub>Terminal walkthrough: install Isartor, start the gateway, then run the demo showcase.</sub> </p>

<details> <summary><strong>More install options</strong> (Docker · Windows · Build from source)</summary>

**Docker**

```sh
docker run -p 8080:8080 \
  -e HF_HOME=/tmp/huggingface \
  -v isartor-hf:/tmp/huggingface \
  ghcr.io/isartor-ai/isartor:latest
```
~120 MB compressed. Includes the `all-MiniLM-L6-v2` embedding model and a statically linked Rust binary.
**Windows (PowerShell)**

```powershell
irm https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.ps1 | iex
```
**Build from source**

```sh
git clone https://github.com/isartor-ai/Isartor.git
cd Isartor && cargo build --release
./target/release/isartor up
```
</details>
## How It Works
If you already know your provider credentials, the day-one path is:
```sh
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh
isartor setup
isartor demo
isartor up --detach
isartor connect copilot
```
## Why Isartor?
AI coding agents and personal assistants repeat themselves — a lot. Copilot, Claude Code, Cursor, and OpenClaw send the same system instructions, the same context preambles, and often the same user prompts across every turn of a conversation. Standard API gateways forward all of it to cloud LLMs regardless.
Isartor sits between your tools and the cloud. It intercepts every prompt and runs a cascade of local algorithms — from sub-millisecond hashing to in-process neural inference — to resolve requests before they reach the network. Only the genuinely hard prompts make it through.
The result: lower costs, lower latency, and less data leaving your perimeter.
| | Without Isartor | With Isartor |
|:--|:----------------|:-------------|
| Repeated prompts | Full cloud round-trip every time | Answered locally in < 1 ms |
| Similar prompts ("Price?" / "Cost?") | Full cloud round-trip every time | Matched semantically, answered locally in 1–5 ms |
| System instructions (CLAUDE.md, copilot-instructions) | Sent in full on every request | Deduplicated and compressed per session |
| Simple FAQ / data extraction | Routed to GPT-4 / Claude | Resolved by embedded SLM in 50–200 ms |
| Complex reasoning | Routed to cloud | Routed to cloud ✓ |
## The Deflection Stack
Every request passes through five layers. Only prompts that survive the full stack reach the cloud.
```
Request ──► L1a Exact Cache ──► L1b Semantic Cache ──► L2 SLM Router ──► L2.5 Context Optimiser ──► L3 Cloud
                 │ hit                │ hit                │ simple              │ compressed             │
                 ▼                    ▼                    ▼                     ▼                        ▼
              Instant              Instant            Local Answer         Smaller Prompt           Cloud Answer
```
| Layer | What It Does | How | Latency |
|:------|:-------------|:----|:--------|
| L1a Exact Cache | Traps duplicate prompts and agent loops | ahash deterministic hashing | < 1 ms |
| L1b Semantic Cache | Catches paraphrases ("Price?" ≈ "Cost?") | Cosine similarity via pure-Rust candle embeddings | 1–5 ms |
| L2 SLM Router | Resolves simple queries locally | Embedded Small Language Model (Qwen-1.5B via candle GGUF) | 50–200 ms |
| L2.5 Context Optimiser | Compresses repeated instructions per session | Dedup + minify (CLAUDE.md, copilot-instructions) | < 1 ms |
| L3 Cloud Logic | Routes complex prompts to OpenAI / Anthropic / Azure | Load balancing with retry and fallback | Network-bound |
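The L1a layer's behaviour is easy to picture: the cache key is a deterministic hash over the normalised request, so byte-identical (model, prompt) pairs always collide and are served instantly. Isartor uses `ahash` in Rust; the sketch below substitutes `sha256sum` purely for illustration (use `shasum -a 256` on macOS):

```shell
# Deterministic cache keying: same (model, prompt) pair -> same key.
cache_key() {
  printf '%s\n%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

k1=$(cache_key "gpt-4o-mini" "What is the capital of France?")
k2=$(cache_key "gpt-4o-mini" "What is the capital of France?")
k3=$(cache_key "gpt-4o-mini" "What is the capital of Spain?")

[ "$k1" = "$k2" ] && echo "duplicate prompt: L1a hit"
[ "$k1" = "$k3" ] || echo "unique prompt: falls through to L1b/L2/L3"
```

Prompts that miss here but are semantically close to a cached one are picked up by L1b's embedding comparison instead.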
### Benchmark results
| Workload | Deflection Rate | Detail |
|:---------|:---------------:|:-------|
| Warm agent session (Claude Code, 20 prompts) | 95% | L1a 80% · L1b 10% · L2 5% · L3 5% |
| Repetitive FAQ loop (1,000 prompts) | 60% | L1a 41% · L1b 19% · L3 40% |
| Diverse code-generation tasks (78 prompts) | 38% | Exact-match duplicates only; all unique tasks route to L3 |
P50 latency for a cache hit: 0.3 ms. Full benchmark methodology →
## AI Tool Integrations
One command connects your favourite tool. No proxy, no MITM, no CA certificates.
| Tool | Command | Mechanism |
|:-----|:--------|:----------|
| GitHub Copilot CLI | `isartor connect copilot` | MCP server (stdio or HTTP/SSE at `/mcp/`) |
| GitHub Copilot in VS Code | `isartor connect copilot-vscode` | Managed `settings.json` debug overrides |
| OpenClaw | `isartor connect openclaw` | Managed OpenClaw provider config (`openclaw.json`) |
| Claude Code | `isartor connect claude` | `ANTHROPIC_BASE_URL` override |
| Claude Desktop | `isartor connect claude-desktop` | Managed local MCP registration (`isartor mcp`) |
| Claude Code + Copilot | `isartor connect claude-copilot` | Claude base URL + Copilot-backed L3 |
| Cursor IDE | `isartor connect cursor` | Base URL + MCP registration at `/mcp/` |
| OpenAI Codex CLI | `isartor connect codex` | `OPENAI_BASE_URL` override |
| Gemini CLI | `isartor connect gemini` | `GEMINI_API_BASE_URL` override |
| OpenCode | `isartor connect opencode` | Global provider + auth config |
| Any OpenAI-compatible tool | `isartor connect generic` | Configurable env var override |
**OpenClaw note:** use Isartor's OpenAI-compatible `/v1` base path, not the root `:8080` URL. If you change Isartor's gateway API key later, rerun `isartor connect openclaw` so OpenClaw's per-agent model registry refreshes too.
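For the `generic` connector, the mechanism is simply a base-URL environment override: any tool that reads an OpenAI-style base URL variable can be pointed at the local gateway. A sketch assuming the tool honours `OPENAI_BASE_URL` (the variable most OpenAI SDKs read; yours may use a different name):

```shell
# Route an OpenAI-compatible tool through the local Isartor gateway.
# Port 8080 matches the Docker example above; adjust if yours differs.
export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
echo "$OPENAI_BASE_URL"   # -> http://127.0.0.1:8080/v1
```

Note the `/v1` path: as with OpenClaw above, tools expect the OpenAI-compatible API surface, not the gateway root.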
## How Isartor Compares
This is the honest version: Isartor is not tryin
