Isartor

Open-source Prompt Firewall — deflect up to 95% of redundant LLM traffic before it leaves your infrastructure. Documents: https://isartor-ai.github.io/Isartor/index.html

Install / Use

/learn @isartor-ai/Isartor

README

<p align="center">
  <img src="docs/logo.png" alt="Isartor" width="400">
</p>
<h1 align="center">Isartor</h1>
<p align="center">
  <strong>Open-source Prompt Firewall — deflect up to 95% of redundant LLM traffic before it leaves your infrastructure.</strong>
</p>
<p align="center">
  Pure Rust · Single Binary · Zero Hidden Telemetry · Air-Gappable
</p>
<p align="center">
  <a href="https://github.com/isartor-ai/Isartor/actions"><img src="https://github.com/isartor-ai/Isartor/actions/workflows/ci.yml/badge.svg" alt="CI" /></a>
  <a href="https://codecov.io/gh/isartor-ai/Isartor"><img src="https://codecov.io/gh/isartor-ai/Isartor/branch/main/graph/badge.svg" alt="codecov" /></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License" /></a>
  <a href="https://github.com/isartor-ai/Isartor/releases/latest"><img src="https://img.shields.io/github/v/release/isartor-ai/Isartor?display_name=tag&sort=semver" alt="Release" /></a>
  <a href="https://github.com/isartor-ai/Isartor/releases"><img src="https://img.shields.io/github/downloads/isartor-ai/Isartor/total?label=downloads&logo=github" alt="Downloads" /></a>
  <a href="https://discord.com/channels/1487002530113257492/1487002530700464142"><img src="https://img.shields.io/discord/1487002530113257492?label=discord&logo=discord" alt="Discord" /></a>
  <a href="https://github.com/orgs/isartor-ai/packages/container/package/isartor"><img src="https://img.shields.io/badge/container-ghcr.io%2Fisartor--ai%2Fisartor-2496ED?logo=docker&logoColor=white" alt="Container" /></a>
  <a href="https://isartor-ai.github.io/Isartor/"><img src="https://img.shields.io/badge/docs-isartor--ai.github.io-blue" alt="Docs" /></a>
</p>

Quick Start

```bash
# Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh

# Guided setup (provider, optional L2, connectors, verification)
isartor setup

# Or configure manually (example: Groq)
# isartor set-key -p groq
# isartor set-alias --alias fast --model gpt-4o-mini

# Verify the provider and run the post-install showcase
isartor check
isartor providers
isartor demo

# Connect your AI tool (pick one)
# (or start the gateway directly if you're ready)
isartor up
isartor connect copilot          # GitHub Copilot CLI
isartor connect claude           # Claude Code
isartor connect claude-desktop   # Claude Desktop
isartor connect cursor           # Cursor IDE
isartor connect openclaw         # OpenClaw
isartor connect codex            # OpenAI Codex CLI
isartor connect gemini           # Gemini CLI
isartor connect claude-copilot   # Claude Code + GitHub Copilot
```

The recommended first-run path is install → isartor setup → isartor demo → connect your tool. The explicit flow (set-key, check, connect) still works unchanged. isartor demo runs without an API key; with a configured provider it also shows a live upstream round-trip before the cache replay.

You can also define request-time model aliases like fast, smart, or code that resolve to real provider model IDs before routing and cache-key generation.
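With an alias defined (for example via isartor set-alias --alias fast --model gpt-4o-mini above), a request body can name the alias instead of a concrete model ID. A hedged sketch of such a payload — the shape follows the OpenAI chat-completions convention; the exact resolution behaviour is as described above:

```json
{
  "model": "fast",
  "messages": [{ "role": "user", "content": "Price?" }]
}
```

Here "fast" would resolve to gpt-4o-mini before routing and cache-key generation.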

For provider troubleshooting, Isartor also supports opt-in request/response debug logging. Set ISARTOR__ENABLE_REQUEST_LOGS=true, reproduce the issue, and inspect the separate JSONL stream with isartor logs --requests. Auth headers are redacted automatically, but prompt bodies may still contain sensitive data, so leave it off unless you need it.

For a fast Layer 3 status snapshot, run isartor providers or query the authenticated GET /debug/providers endpoint. It reports the active provider, configured model and endpoint, plus the last-known in-memory success/error state for upstream traffic since the current process started.
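For illustration, a response from that endpoint might look like the following — every field name here is an assumption for sketching purposes, not the documented schema:

```json
{
  "provider": "groq",
  "model": "gpt-4o-mini",
  "endpoint": "https://api.groq.com/openai/v1",
  "last_success": "2026-04-06T12:00:00Z",
  "last_error": null
}
```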

The provider registry also includes a broader set of OpenAI-compatible backends out of the box, including cerebras, nebius, siliconflow, fireworks, nvidia, and chutes, so isartor set-key -p <provider> and isartor setup can configure them without manual endpoint lookup.

See Isartor in the Terminal

<p align="center">
  <img src="docs/readme-demo.gif" alt="Animated terminal walkthrough showing install, isartor up, and isartor demo" width="900">
</p>
<p align="center">
  <sub>Terminal walkthrough: install Isartor, start the gateway, then run the demo showcase.</sub>
</p>

<details>
<summary><strong>More install options</strong> (Docker · Windows · Build from source)</summary>

Docker

```bash
docker run -p 8080:8080 \
  -e HF_HOME=/tmp/huggingface \
  -v isartor-hf:/tmp/huggingface \
  ghcr.io/isartor-ai/isartor:latest
```

~120 MB compressed. Includes the all-MiniLM-L6-v2 embedding model and a statically linked Rust binary.

Windows (PowerShell)

```powershell
irm https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.ps1 | iex
```

Build from source

```bash
git clone https://github.com/isartor-ai/Isartor.git
cd Isartor && cargo build --release
./target/release/isartor up
```
</details>

How It Works

If you already know your provider credentials, the day-one path is:

```bash
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh
isartor setup
isartor demo
isartor up --detach
isartor connect copilot
```

Why Isartor?

AI coding agents and personal assistants repeat themselves — a lot. Copilot, Claude Code, Cursor, and OpenClaw send the same system instructions, the same context preambles, and often the same user prompts across every turn of a conversation. Standard API gateways forward all of it to cloud LLMs regardless.

Isartor sits between your tools and the cloud. It intercepts every prompt and runs a cascade of local algorithms — from sub-millisecond hashing to in-process neural inference — to resolve requests before they reach the network. Only the genuinely hard prompts make it through.

The result: lower costs, lower latency, and less data leaving your perimeter.
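The cascade's control flow can be sketched in Rust. This is a minimal illustration, not Isartor's implementation: std's HashMap and DefaultHasher stand in for the real ahash-based L1a cache, and the L1b/L2 layers are stubbed out because they need embedding and inference backends.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch of the deflection cascade's control flow.
enum Resolution {
    ExactHit(String), // L1a: answered locally from the cache
    CloudForward,     // L3: only genuinely new prompts reach here
}

struct Gateway {
    exact_cache: HashMap<u64, String>,
}

impl Gateway {
    // Deterministic prompt key; the real gateway uses ahash.
    fn prompt_key(prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        prompt.hash(&mut h);
        h.finish()
    }

    fn resolve(&mut self, prompt: &str) -> Resolution {
        if let Some(answer) = self.exact_cache.get(&Self::prompt_key(prompt)) {
            return Resolution::ExactHit(answer.clone()); // sub-millisecond path
        }
        // L1b semantic match and the L2 SLM would run here;
        // failing those, fall through to the cloud.
        Resolution::CloudForward
    }

    // Cache the cloud's reply so the next duplicate is deflected.
    fn record(&mut self, prompt: &str, answer: &str) {
        self.exact_cache
            .insert(Self::prompt_key(prompt), answer.to_string());
    }
}

fn main() {
    let mut gw = Gateway { exact_cache: HashMap::new() };
    assert!(matches!(gw.resolve("Price?"), Resolution::CloudForward));
    gw.record("Price?", "$10/mo");
    assert!(matches!(gw.resolve("Price?"), Resolution::ExactHit(_)));
}
```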

| | Without Isartor | With Isartor |
|:--|:----------------|:-------------|
| Repeated prompts | Full cloud round-trip every time | Answered locally in < 1 ms |
| Similar prompts ("Price?" / "Cost?") | Full cloud round-trip every time | Matched semantically, answered locally in 1–5 ms |
| System instructions (CLAUDE.md, copilot-instructions) | Sent in full on every request | Deduplicated and compressed per session |
| Simple FAQ / data extraction | Routed to GPT-4 / Claude | Resolved by embedded SLM in 50–200 ms |
| Complex reasoning | Routed to cloud | Routed to cloud ✓ |


The Deflection Stack

Every request passes through five layers. Only prompts that survive the full stack reach the cloud.

```text
Request ──► L1a Exact Cache ──► L1b Semantic Cache ──► L2 SLM Router ──► L2.5 Context Optimiser ──► L3 Cloud
                 │ hit                │ hit                 │ simple             │ compressed               │
                 ▼                    ▼                     ▼                    ▼                          ▼
              Instant             Instant             Local Answer      Smaller Prompt             Cloud Answer
```

| Layer | What It Does | How | Latency |
|:------|:-------------|:----|:--------|
| L1a Exact Cache | Traps duplicate prompts and agent loops | ahash deterministic hashing | < 1 ms |
| L1b Semantic Cache | Catches paraphrases ("Price?" ≈ "Cost?") | Cosine similarity via pure-Rust candle embeddings | 1–5 ms |
| L2 SLM Router | Resolves simple queries locally | Embedded Small Language Model (Qwen-1.5B via candle GGUF) | 50–200 ms |
| L2.5 Context Optimiser | Compresses repeated instructions per session | Dedup + minify (CLAUDE.md, copilot-instructions) | < 1 ms |
| L3 Cloud Logic | Routes complex prompts to OpenAI / Anthropic / Azure | Load balancing with retry and fallback | Network-bound |
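The L1b paraphrase match in the table above reduces to cosine similarity between embedding vectors. A self-contained sketch with hand-made stand-in vectors — real embeddings come from candle's all-MiniLM-L6-v2, and the 0.85 threshold is an illustrative assumption, not Isartor's configured value:

```rust
// Cosine similarity: dot product normalised by the vectors' magnitudes.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Stand-ins for the embeddings of "Price?" and "Cost?".
    let price = [0.9_f32, 0.1, 0.0];
    let cost = [0.85_f32, 0.15, 0.05];
    let sim = cosine_similarity(&price, &cost);
    println!("similarity = {sim:.3}");
    // Above a similarity threshold (say 0.85), L1b replays the cached answer.
    assert!(sim > 0.85);
}
```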

Benchmark results

| Workload | Deflection Rate | Detail |
|:---------|:---------------:|:-------|
| Warm agent session (Claude Code, 20 prompts) | 95% | L1a 80% · L1b 10% · L2 5% · L3 5% |
| Repetitive FAQ loop (1,000 prompts) | 60% | L1a 41% · L1b 19% · L3 40% |
| Diverse code-generation tasks (78 prompts) | 38% | Exact-match duplicates only; all unique tasks route to L3 |

P50 latency for a cache hit: 0.3 ms. Full benchmark methodology →


AI Tool Integrations

One command connects your favourite tool. No proxy, no MITM, no CA certificates.

| Tool | Command | Mechanism |
|:-----|:--------|:----------|
| GitHub Copilot CLI | isartor connect copilot | MCP server (stdio or HTTP/SSE at /mcp/) |
| GitHub Copilot in VS Code | isartor connect copilot-vscode | Managed settings.json debug overrides |
| OpenClaw | isartor connect openclaw | Managed OpenClaw provider config (openclaw.json) |
| Claude Code | isartor connect claude | ANTHROPIC_BASE_URL override |
| Claude Desktop | isartor connect claude-desktop | Managed local MCP registration (isartor mcp) |
| Claude Code + Copilot | isartor connect claude-copilot | Claude base URL + Copilot-backed L3 |
| Cursor IDE | isartor connect cursor | Base URL + MCP registration at /mcp/ |
| OpenAI Codex CLI | isartor connect codex | OPENAI_BASE_URL override |
| Gemini CLI | isartor connect gemini | GEMINI_API_BASE_URL override |
| OpenCode | isartor connect opencode | Global provider + auth config |
| Any OpenAI-compatible tool | isartor connect generic | Configurable env var override |

Full integration guides →
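For tools not listed in the table, the generic mechanism is an environment-variable override. A minimal sketch, assuming the gateway's default port of 8080 from the Docker example — the particular variable name is whatever your tool reads, not something Isartor dictates:

```shell
# Point any OpenAI-compatible tool at the local gateway's /v1 base path
export OPENAI_BASE_URL="http://localhost:8080/v1"
echo "$OPENAI_BASE_URL"
```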

OpenClaw note: use Isartor's OpenAI-compatible /v1 base path, not the root :8080 URL. If you change Isartor's gateway API key later, rerun isartor connect openclaw so OpenClaw's per-agent model registry refreshes too.
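For illustration, the relevant line of the managed openclaw.json might look like the fragment below — the key name and file shape are assumptions for sketching purposes; isartor connect openclaw writes the real file:

```json
{
  "baseUrl": "http://localhost:8080/v1"
}
```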


How Isartor Compares

This is the honest version: Isartor is not tryin
