Proxy
Open source cost intelligence proxy for AI agents. Cut costs ~80% with smart model routing. Dashboard, policy engine, 11 providers. MIT licensed.
@relayplane/proxy
An npm-installed LLM proxy for Node.js that sits between your AI agents and providers. Drop-in replacement for OpenAI and Anthropic base URLs — no Docker, no Python, just npm install. Tracks every request, shows where the money goes, and offers configurable task-aware routing — all running locally, for free.
Live savings dashboard: relayplane.com/live — see real-time aggregate savings from developers worldwide.
The npm-native LLM proxy for Node.js developers. Works with Claude Code, Cursor, OpenClaw, and any tool that supports OPENAI_BASE_URL or ANTHROPIC_BASE_URL.
Free, open-source proxy features:
- 📊 Per-request cost tracking across 11 providers
- 💰 Cache-aware cost tracking - accurately tracks Anthropic prompt caching with cache read savings, creation costs, and true per-request costs
- 🔀 Configurable task-aware routing (complexity-based, cascade, model overrides)
- 🛡️ Circuit breaker - if the proxy fails, your agent doesn't notice
- 📈 Local dashboard at localhost:4100 - cost breakdown, savings analysis, provider health, agent breakdown
- 💵 Budget enforcement - daily/hourly/per-request spend limits with block, warn, downgrade, or alert actions
- 🔍 Anomaly detection - catches runaway agent loops, cost spikes, and token explosions in real time
- 🔔 Cost alerts - threshold alerts at configurable percentages, webhook delivery, alert history
- ⬇️ Auto-downgrade - automatically switches to cheaper models when budget thresholds are hit
- 📦 Aggressive cache - exact-match response caching with gzipped disk persistence
- 🤖 Per-agent cost tracking - identifies agents by system prompt fingerprint and tracks cost per agent
- 📝 Content logging - dashboard shows system prompt preview, user message, and response preview per request
- 🔐 OAuth passthrough - correctly forwards user-agent and x-app headers for Claude Max subscription users (OpenClaw compatible)
- 🧠 Osmosis mesh - collective learning layer that shares anonymized routing signals across users (on by default, opt-out: relayplane mesh off)
- 🔧 systemd/launchd service - relayplane service install for always-on operation with auto-restart
- 🏥 Health watchdog - /health endpoint with uptime tracking and active probing
- 🛡️ Config resilience - atomic writes, automatic backup/restore, credential separation
Cloud dashboard available separately - see Cloud Dashboard & Pro Features below. Your prompts always stay local.
Quick Start
npm install -g @relayplane/proxy
relayplane init
relayplane start
# Dashboard at http://localhost:4100
Works with any agent framework that talks to OpenAI or Anthropic APIs. Point your client at http://localhost:4100 (set ANTHROPIC_BASE_URL or OPENAI_BASE_URL) and the proxy handles the rest.
Auto-start with Claude Code
Add to ~/.claude/settings.json:
{
"hooks": {
"SessionStart": [
{
"hooks": [
{
"type": "command",
"command": "relayplane ensure-running"
}
]
}
]
}
}
RelayPlane will start automatically when Claude Code opens. If it's already running (multiple sessions), the hook exits immediately. No duplicate processes.
What's New in v1.8.35
Recent additions:
- relayplane.com/live — Atlas public proof-of-work dashboard showing real-time aggregate savings from developers worldwide
- Osmosis Phase 1 shipped — local telemetry capture tracks every routing decision; mesh is on by default
- Service installer hardened — detects invoking user, loads env files correctly on systemd installs
- Provider-aware auto-routing — Gemini, OpenRouter, xAI supported natively without extra config
- Agent-native signup flow — relayplane login handles device OAuth inline
Note for upgraders from pre-v1.8.14: Telemetry and mesh are now ON by default. Disable both: relayplane telemetry off && relayplane mesh off
Supported Providers
Anthropic · OpenAI · Google Gemini · xAI/Grok · OpenRouter · DeepSeek · Groq · Mistral · Together · Fireworks · Perplexity
Configuration
RelayPlane reads configuration from ~/.relayplane/config.json. Override the path with the RELAYPLANE_CONFIG_PATH environment variable.
# Default location
~/.relayplane/config.json
# Override with env var
RELAYPLANE_CONFIG_PATH=/path/to/config.json relayplane start
A minimal config file:
{
"enabled": true,
"modelOverrides": {},
"routing": {
"mode": "cascade",
"cascade": { "enabled": true },
"complexity": { "enabled": true }
}
}
All configuration is optional - sensible defaults are applied for every field. The proxy merges your config with its defaults via deep merge, so you only need to specify what you want to change.
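The deep-merge behavior can be pictured with a small sketch. The function and the default values below are illustrative, not the proxy's actual internals:

```javascript
// Illustrative deep merge: user config wins, defaults fill the gaps.
// The defaults shown here are examples, not the proxy's real defaults.
function deepMerge(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    if (value && typeof value === "object" && !Array.isArray(value) &&
        out[key] && typeof out[key] === "object") {
      out[key] = deepMerge(out[key], value); // recurse into nested objects
    } else {
      out[key] = value; // scalars and arrays replace wholesale
    }
  }
  return out;
}

const defaults = {
  enabled: true,
  routing: { mode: "passthrough", cascade: { enabled: false } },
};
const userConfig = { routing: { mode: "cascade" } };

const merged = deepMerge(defaults, userConfig);
// merged.routing.mode is "cascade"; merged.routing.cascade.enabled stays false
```

The point is that specifying `routing.mode` does not wipe out sibling defaults like `routing.cascade`.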
Architecture
Client (Claude Code / Aider / Cursor)
|
| OpenAI/Anthropic-compatible request
v
+-------------------------------------------------------+
| RelayPlane Proxy (local) |
|-------------------------------------------------------|
| 1) Parse request |
| 2) Cache check (exact or aggressive mode) |
| └─ HIT → return cached response (skip provider) |
| 3) Budget check (daily/hourly/per-request limits) |
| └─ BREACH → block / warn / downgrade / alert |
| 4) Anomaly detection (velocity, cost spike, loops) |
| └─ DETECTED → alert + optional block |
| 5) Auto-downgrade (if budget threshold exceeded) |
| └─ Rewrite model to cheaper alternative |
| 6) Infer task/complexity (pre-request) |
| 7) Select route/model |
| - explicit model / passthrough |
| - relayplane:auto/cost/fast/quality |
| - configured complexity/cascade rules |
| 8) Forward request to provider |
| 9) Return provider response + cache it |
| 10) Record telemetry + update budget tracking |
| 11) Mesh sync (push anonymized routing signals) |
+-------------------------------------------------------+
|
v
Provider APIs (Anthropic/OpenAI/Gemini/xAI/...)
How It Works
RelayPlane is a local HTTP proxy. You point your agent at localhost:4100 by setting ANTHROPIC_BASE_URL or OPENAI_BASE_URL. The proxy:
- Intercepts your LLM API requests
- Classifies the task using heuristics (token count, prompt patterns, keyword matching - no LLM calls)
- Routes to the configured model based on classification and your routing rules (or passes through to the original model by default)
- Forwards the request directly to the LLM provider (your prompts go straight to the provider, not through RelayPlane servers)
- Records token counts, latency, and cost locally for your dashboard
Default behavior is passthrough - requests go to whatever model your agent requested. Routing (cascade, complexity-based) is configurable and must be explicitly enabled.
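The local cost accounting amounts to multiplying recorded token counts by per-model rates. A minimal sketch, with made-up placeholder prices (not real provider rates):

```javascript
// Sketch of per-request cost recording. Prices are hypothetical
// placeholders expressed in USD per million tokens.
const PRICING = {
  "example-small-model": { input: 0.25, output: 1.25 },
  "example-large-model": { input: 15.0, output: 75.0 },
};

function requestCost(model, inputTokens, outputTokens) {
  const p = PRICING[model];
  if (!p) return null; // unknown model: record tokens, skip cost
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 10k prompt tokens + 500 completion tokens on the small model
const cost = requestCost("example-small-model", 10_000, 500);
// cost === 0.003125
```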
Complexity-Based Routing
The proxy classifies incoming requests by complexity (simple, moderate, complex) based on prompt length, token patterns, and the presence of tools. Each tier maps to a different model.
{
"routing": {
"complexity": {
"enabled": true,
"simple": "claude-3-5-haiku-latest",
"moderate": "claude-sonnet-4-20250514",
"complex": "claude-opus-4-20250514"
}
}
}
How classification works:
- Simple - Short prompts, straightforward Q&A, basic code tasks
- Moderate - Multi-step reasoning, code review, analysis with context
- Complex - Architecture decisions, large codebases, tasks with many tools, long prompts with evaluation/comparison language
The classifier scores requests based on message count, total token length, tool usage, and content patterns (e.g., words like "analyze", "compare", "evaluate" increase the score). This happens locally - no prompt content is sent anywhere.
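A toy version of such a heuristic scorer might look like this. The weights and thresholds are invented for illustration and differ from the proxy's real classifier:

```javascript
// Toy complexity scorer: weights and thresholds are illustrative only.
function classify(messages, tools = []) {
  const text = messages.map((m) => m.content).join(" ");
  let score = 0;
  score += Math.min(text.length / 500, 10); // longer prompts score higher, capped
  score += messages.length;                 // more turns, more context
  score += tools.length * 2;                // tool-heavy requests lean complex
  if (/\b(analyze|compare|evaluate|architect)\b/i.test(text)) score += 3;
  if (score < 5) return "simple";
  if (score < 12) return "moderate";
  return "complex";
}

classify([{ content: "What is 2 + 2?" }]); // → "simple"
```

Because it is pure string arithmetic and regex matching, classification adds no LLM calls and no network round trips.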
Model Overrides
Map any model name to a different one. Useful for silently redirecting expensive models to cheaper alternatives without changing your agent configuration:
{
"modelOverrides": {
"claude-opus-4-5": "claude-3-5-haiku",
"gpt-4o": "gpt-4o-mini"
}
}
Overrides are applied before any other routing logic. The original requested model is logged for tracking.
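Conceptually, applying an override is a plain map lookup before any routing runs. A sketch, not the proxy's actual code:

```javascript
// Sketch: overrides are applied before any routing logic; the original
// requested model name is kept alongside for logging.
function applyOverrides(requestedModel, overrides) {
  const model = overrides[requestedModel] ?? requestedModel;
  return { model, original: requestedModel, overridden: model !== requestedModel };
}

const overrides = { "gpt-4o": "gpt-4o-mini" };
applyOverrides("gpt-4o", overrides);
// → { model: "gpt-4o-mini", original: "gpt-4o", overridden: true }
```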
Cascade Mode
Start with the cheapest model and escalate only when the response shows uncertainty or refusal. This gives you the cost savings of a cheap model with a safety net.
{
"routing": {
"mode": "cascade",
"cascade": {
"enabled": true,
"models": [
"claude-3-5-haiku-latest",
"claude-sonnet-4-20250514",
"claude-opus-4-20250514"
],
"escalateOn": "uncertainty",
"maxEscalations": 2
}
}
}
escalateOn options:
| Value | Triggers escalation when... |
|-------|----------------------------|
| uncertainty | Response contains hedging language ("I'm not sure", "it's hard to say", "this is just a guess") |
| refusal | Model refuses to help ("I can't assist with that", "as an AI") |
| error | The request fails outright |
maxEscalations caps how many times the proxy will retry with a more expensive model. Default: 1.
The cascade walks through the models array in order, starting from the first. Each escalation moves to the next model in the list.
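The escalation loop can be sketched like this. `callModel` stands in for the real provider call, and the trigger phrases are illustrative, mirroring the "uncertainty" option above:

```javascript
// Sketch of cascade escalation. `callModel` stands in for the actual
// provider call; the hedging patterns below are illustrative.
const UNCERTAINTY = /i'm not sure|it's hard to say|this is just a guess/i;

function cascade(models, maxEscalations, callModel, prompt) {
  const attempts = Math.min(maxEscalations + 1, models.length);
  let result;
  for (let i = 0; i < attempts; i++) {
    result = { model: models[i], response: callModel(models[i], prompt) };
    if (!UNCERTAINTY.test(result.response)) return result; // confident: stop here
  }
  return result; // out of escalations: keep the last answer
}
```

With `maxEscalations: 2` and the three-model list above, a hedging haiku response escalates to sonnet, and a hedging sonnet response escalates once more to opus before the proxy gives up and returns whatever it has.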
Smart Aliases
Us
