Autocache
Intelligent Anthropic API Cache Proxy with ROI Analytics
Autocache is a smart proxy server that automatically injects cache-control fields into Anthropic Claude API requests, reducing costs by up to 90% and latency by up to 85% while providing detailed ROI analytics via response headers.
Motivation
Modern AI agent platforms like n8n, Flowise, Make.com, and even popular frameworks like LangChain and LlamaIndex don't support Anthropic's prompt caching—despite users building increasingly complex agents with:
- 📝 Large system prompts (1,000-5,000+ tokens)
- 🛠️ 10+ tool definitions (5,000-15,000+ tokens)
- 🔄 Repeated agent interactions (same context, different queries)
The Problem
When you build a complex agent in n8n with a detailed system prompt and multiple tools, every API call sends the full context again—costing 10x more than necessary. For example:
- Without caching: 15,000 token agent → $0.045 per request
- With caching: Same agent → $0.0045 per request after first call (90% savings)
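The arithmetic above can be sketched directly. This is a rough estimate assuming $3 per million input tokens (Claude Sonnet class pricing) and cache reads billed at one-tenth the base rate; check Anthropic's current pricing before relying on the exact numbers:

```python
# Rough cost estimate for a 15,000-token agent context.
# Assumptions: $3 per million input tokens, cache reads billed
# at 10% of the base input rate.
INPUT_PRICE_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.10

def request_cost(tokens: int, cached: bool = False) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    rate = INPUT_PRICE_PER_MTOK / 1_000_000
    if cached:
        rate *= CACHE_READ_MULTIPLIER
    return tokens * rate

uncached = request_cost(15_000)             # first call
cached = request_cost(15_000, cached=True)  # subsequent calls
print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```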
Real User Pain Points
The AI community has been requesting this feature:
- 🔗 n8n GitHub Issue #13231 - "Anthropic model not caching system prompt"
- 🔗 Flowise Issue #4289 - "Support for Anthropic Prompt Caching"
- 🔗 n8n Community Request - Multiple requests for caching support
- 🔗 LangChain Issue #26701 - Implementation difficulties
The Solution
Autocache works as a transparent proxy that automatically analyzes your requests and injects cache-control headers at optimal breakpoints—no code changes required. Just point your existing n8n/Flowise/Make.com workflows to Autocache instead of directly to Anthropic's API.
Result: Same agents, 90% lower costs, 85% lower latency—automatically.
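Conceptually, the proxy rewrites the request body before forwarding it upstream. The sketch below illustrates the kind of transformation involved, using Anthropic's documented `cache_control` field; it is not Autocache's actual placement logic, which also weighs token counts and ROI:

```python
import copy

def inject_cache_control(request: dict) -> dict:
    """Mark the system prompt and the tool list as cacheable using
    Anthropic's cache_control field (illustrative only)."""
    req = copy.deepcopy(request)
    # A system prompt must be in block form to carry cache_control.
    if isinstance(req.get("system"), str):
        req["system"] = [{"type": "text", "text": req["system"]}]
    # A breakpoint on the last block caches everything before it too.
    if req.get("system"):
        req["system"][-1]["cache_control"] = {"type": "ephemeral"}
    if req.get("tools"):
        req["tools"][-1]["cache_control"] = {"type": "ephemeral"}
    return req

original = {
    "model": "claude-3-5-haiku-20241022",
    "system": "You are a helpful agent...",
    "tools": [{"name": "search", "input_schema": {"type": "object"}}],
    "messages": [{"role": "user", "content": "Hello!"}],
}
cached_request = inject_cache_control(original)
```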
Alternatives & Comparison
Several tools offer prompt caching support, but Autocache is unique in combining zero-config transparent proxy with intelligent ROI analytics:
Existing Solutions
| Solution | Type | Auto-Injection | Intelligence | ROI Analytics | Drop-in for n8n/Flowise |
| ---------------------- | ------- | --------------------- | ------------------------------- | ------------------- | ----------------------- |
| Autocache | Proxy | ✅ Fully automatic | ✅ Token analysis + ROI scoring | ✅ Response headers | ✅ Yes |
| LiteLLM | Proxy | ⚠️ Requires config | ❌ Rule-based | ❌ No | ✅ Yes |
| langchain-smart-cache | Library | ✅ Fully automatic | ✅ Priority-based | ✅ Statistics | ❌ LangChain only |
| anthropic-cost-tracker | Library | ❓ Unclear | ❓ Unknown | ✅ Dashboard | ❌ Python only |
| OpenRouter | Service | ⚠️ Provider-dependent | ❌ No | ❌ No | ✅ Yes |
| AWS Bedrock | Cloud | ✅ ML-based | ✅ Yes | ✅ AWS only | ❌ AWS only |
What Makes Autocache Different
Autocache is the only solution that combines:
- 🔄 Transparent Proxy - Works with any tool (n8n, Flowise, Make.com) without code changes
- 🧠 Intelligent Analysis - Automatic token counting, ROI scoring, and optimal breakpoint placement
- 📊 Real-time ROI - Cost savings and break-even analysis in every response header
- 🏠 Self-Hosted - No external dependencies or cloud vendor lock-in
- ⚙️ Zero Config - Works out of the box with multiple strategies (conservative/moderate/aggressive)
Other solutions require configuration (LiteLLM), framework lock-in (langchain-smart-cache), or don't provide transparent proxy functionality for agent builders.
Features
- ✨ Drop-in Replacement: Simply change your API URL and get automatic caching
- 📊 ROI Analytics: Detailed cost savings and break-even analysis via headers
- 🎯 Smart Caching: Intelligent placement of cache breakpoints using multiple strategies
- ⚡ High Performance: Supports both streaming and non-streaming requests
- 🔧 Configurable: Multiple caching strategies and customizable thresholds
- 🐳 Docker Ready: Easy deployment with Docker and docker-compose
- 📋 Comprehensive Logging: Detailed request/response logging with structured output
Getting Started with Docker
The fastest way to start using Autocache is with the published Docker image from GitHub Container Registry:
Quick Start (30 seconds)
1. Run the container:
# Option A: With API key in environment variable
docker run -d -p 8080:8080 \
-e ANTHROPIC_API_KEY=sk-ant-... \
--name autocache \
ghcr.io/montevive/autocache:latest
# Option B: Without API key (pass it per-request in headers)
docker run -d -p 8080:8080 \
--name autocache \
ghcr.io/montevive/autocache:latest
2. Verify it's running:
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.1","strategy":"moderate"}
3. Test with a request:
# If using Option A (env var), no header needed:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If using Option B (no env var), pass API key in header:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
4. Check the cache metadata in response headers:
X-Autocache-Injected: true
X-Autocache-Cache-Ratio: 0.750
X-Autocache-ROI-Percent: 85.2
X-Autocache-Savings-100req: $1.75
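These headers can be read programmatically, for instance to log savings per workflow. A small helper that parses them from a response's header map (header names are taken from the example above; the exact formats are an assumption and may change between versions):

```python
def parse_autocache_headers(headers: dict) -> dict:
    """Extract Autocache ROI metadata from response headers.
    Header names follow the README example; value formats are assumed."""
    def num(value: str) -> float:
        # Savings headers carry a leading "$"; strip it before parsing.
        return float(value.lstrip("$"))
    return {
        "injected": headers.get("X-Autocache-Injected") == "true",
        "cache_ratio": num(headers["X-Autocache-Cache-Ratio"]),
        "roi_percent": num(headers["X-Autocache-ROI-Percent"]),
        "savings_per_100_req": num(headers["X-Autocache-Savings-100req"]),
    }

example = {
    "X-Autocache-Injected": "true",
    "X-Autocache-Cache-Ratio": "0.750",
    "X-Autocache-ROI-Percent": "85.2",
    "X-Autocache-Savings-100req": "$1.75",
}
print(parse_autocache_headers(example))
```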
Available Docker Tags
- `latest` - Latest stable release (recommended)
- `v1.0.1` - Specific version tag
- `1.0.1`, `1.0`, `1` - Semantic version aliases
Docker Image Details
- Registry: ghcr.io/montevive/autocache
- Architectures: linux/amd64, linux/arm64
- Size: ~29 MB (optimized Alpine-based image)
- Source: https://github.com/montevive/autocache
Next Steps
- For production deployments with docker-compose, see the Quick Start section below
- For configuration options, see Configuration
- For n8n integration, see our n8n setup guide
Quick Start
Using Docker Compose (Recommended)
- Clone and configure:
git clone <repository-url>
cd autocache
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY (optional - can pass in headers instead)
- Start the proxy:
docker-compose up -d
- Use in your application:
# Change your API base URL from:
# https://api.anthropic.com
# To:
# http://localhost:8080
Direct Usage
- Build and run:
go mod download
go build -o autocache ./cmd/autocache
# Option 1: Set API key via environment (optional)
ANTHROPIC_API_KEY=sk-ant-... ./autocache
# Option 2: Run without API key (pass it in request headers)
./autocache
- Configure your client:
# Python example - API key passed in headers
import anthropic
client = anthropic.Anthropic(
api_key="sk-ant-...", # This will be forwarded to Anthropic
base_url="http://localhost:8080" # Point to autocache
)
Configuration
Environment Variables
| Variable | Default | Description |
| ------------------------- | ------------ | -------------------------------------------------------------- |
| PORT | 8080 | Server port |
| ANTHROPIC_API_KEY | - | Your Anthropic API key (optional if passed in request headers) |
| CACHE_STRATEGY | moderate | Caching strategy: conservative/moderate/aggressive |
| LOG_LEVEL | info | Log level: debug/info/warn/error |
| MAX_CACHE_BREAKPOINTS | 4 | Maximum cache breakpoints (1-4) |
| TOKEN_MULTIPLIER | 1.0 | Token threshold multiplier |
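Put together, a `.env` for an aggressive caching setup might look like the following (variable names from the table above; the values are illustrative):

```
PORT=8080
ANTHROPIC_API_KEY=sk-ant-...
CACHE_STRATEGY=aggressive
LOG_LEVEL=debug
MAX_CACHE_BREAKPOINTS=4
TOKEN_MULTIPLIER=1.0
```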
API Key Configuration
The Anthropic API key can be provided in three ways (in order of precedence):
1. Request headers (recommended for multi-tenant scenarios):
   Authorization: Bearer sk-ant-...   # or x-api-key: sk-ant-...
2. Environment variable:
   ANTHROPIC_API_KEY=sk-ant-... ./autocache
3. .env file:
   ANTHROPIC_API_KEY=sk-ant-...
