Autocache
Intelligent Anthropic API Cache Proxy with ROI Analytics
Autocache is a smart proxy server that automatically injects cache-control fields into Anthropic Claude API requests, reducing costs by up to 90% and latency by up to 85% while providing detailed ROI analytics via response headers.
Motivation
Modern AI agent platforms like n8n, Flowise, Make.com, and even popular frameworks like LangChain and LlamaIndex don't support Anthropic's prompt caching—despite users building increasingly complex agents with:
- 📝 Large system prompts (1,000-5,000+ tokens)
- 🛠️ 10+ tool definitions (5,000-15,000+ tokens)
- 🔄 Repeated agent interactions (same context, different queries)
The Problem
When you build a complex agent in n8n with a detailed system prompt and multiple tools, every API call sends the full context again—costing 10x more than necessary. For example:
- Without caching: 15,000 token agent → $0.045 per request
- With caching: Same agent → $0.0045 per request after first call (90% savings)
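The arithmetic above can be sketched directly. This is a rough estimate assuming $3 per million input tokens (Claude Sonnet class pricing) and cache reads billed at one-tenth the base rate; check Anthropic's current pricing before relying on the exact numbers:

```python
# Rough cost estimate for a 15,000-token agent context.
# Assumptions: $3 per million input tokens, cache reads billed
# at 10% of the base input rate.
INPUT_PRICE_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.10

def request_cost(tokens: int, cached: bool = False) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    rate = INPUT_PRICE_PER_MTOK / 1_000_000
    if cached:
        rate *= CACHE_READ_MULTIPLIER
    return tokens * rate

uncached = request_cost(15_000)             # first call
cached = request_cost(15_000, cached=True)  # subsequent calls
print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```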
Real User Pain Points
The AI community has been requesting this feature:
- 🔗 n8n GitHub Issue #13231 - "Anthropic model not caching system prompt"
- 🔗 Flowise Issue #4289 - "Support for Anthropic Prompt Caching"
- 🔗 n8n Community Request - Multiple requests for caching support
- 🔗 LangChain Issue #26701 - Implementation difficulties
The Solution
Autocache works as a transparent proxy that automatically analyzes your requests and injects cache-control headers at optimal breakpoints—no code changes required. Just point your existing n8n/Flowise/Make.com workflows to Autocache instead of directly to Anthropic's API.
Result: Same agents, 90% lower costs, 85% lower latency—automatically.
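Conceptually, the proxy rewrites the request body before forwarding it upstream. The sketch below illustrates the kind of transformation involved, using Anthropic's documented `cache_control` field; it is not Autocache's actual placement logic, which also weighs token counts and ROI:

```python
import copy

def inject_cache_control(request: dict) -> dict:
    """Mark the system prompt and the tool list as cacheable using
    Anthropic's cache_control field (illustrative only)."""
    req = copy.deepcopy(request)
    # A system prompt must be in block form to carry cache_control.
    if isinstance(req.get("system"), str):
        req["system"] = [{"type": "text", "text": req["system"]}]
    # A breakpoint on the last block caches everything before it too.
    if req.get("system"):
        req["system"][-1]["cache_control"] = {"type": "ephemeral"}
    if req.get("tools"):
        req["tools"][-1]["cache_control"] = {"type": "ephemeral"}
    return req

original = {
    "model": "claude-3-5-haiku-20241022",
    "system": "You are a helpful agent...",
    "tools": [{"name": "search", "input_schema": {"type": "object"}}],
    "messages": [{"role": "user", "content": "Hello!"}],
}
cached_request = inject_cache_control(original)
```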
Alternatives & Comparison
Several tools offer prompt caching support, but Autocache is unique in combining zero-config transparent proxy with intelligent ROI analytics:
Existing Solutions
| Solution | Type | Auto-Injection | Intelligence | ROI Analytics | Drop-in for n8n/Flowise |
| ---------------------- | ------- | --------------------- | ------------------------------- | ------------------- | ----------------------- |
| Autocache | Proxy | ✅ Fully automatic | ✅ Token analysis + ROI scoring | ✅ Response headers | ✅ Yes |
| LiteLLM | Proxy | ⚠️ Requires config | ❌ Rule-based | ❌ No | ✅ Yes |
| langchain-smart-cache | Library | ✅ Fully automatic | ✅ Priority-based | ✅ Statistics | ❌ LangChain only |
| anthropic-cost-tracker | Library | ❓ Unclear | ❓ Unknown | ✅ Dashboard | ❌ Python only |
| OpenRouter | Service | ⚠️ Provider-dependent | ❌ No | ❌ No | ✅ Yes |
| AWS Bedrock | Cloud | ✅ ML-based | ✅ Yes | ✅ AWS only | ❌ AWS only |
What Makes Autocache Different
Autocache is the only solution that combines:
- 🔄 Transparent Proxy - Works with any tool (n8n, Flowise, Make.com) without code changes
- 🧠 Intelligent Analysis - Automatic token counting, ROI scoring, and optimal breakpoint placement
- 📊 Real-time ROI - Cost savings and break-even analysis in every response header
- 🏠 Self-Hosted - No external dependencies or cloud vendor lock-in
- ⚙️ Zero Config - Works out of the box with multiple strategies (conservative/moderate/aggressive)
Other solutions require configuration (LiteLLM), framework lock-in (langchain-smart-cache), or don't provide transparent proxy functionality for agent builders.
Features
- ✨ Drop-in Replacement: Simply change your API URL and get automatic caching
- 📊 ROI Analytics: Detailed cost savings and break-even analysis via headers
- 🎯 Smart Caching: Intelligent placement of cache breakpoints using multiple strategies
- ⚡ High Performance: Supports both streaming and non-streaming requests
- 🔧 Configurable: Multiple caching strategies and customizable thresholds
- 🐳 Docker Ready: Easy deployment with Docker and docker-compose
- 📋 Comprehensive Logging: Detailed request/response logging with structured output
Getting Started with Docker
The fastest way to start using Autocache is with the published Docker image from GitHub Container Registry:
Quick Start (30 seconds)
1. Run the container:
# Option A: With API key in environment variable
docker run -d -p 8080:8080 \
-e ANTHROPIC_API_KEY=sk-ant-... \
--name autocache \
ghcr.io/montevive/autocache:latest
# Option B: Without API key (pass it per-request in headers)
docker run -d -p 8080:8080 \
--name autocache \
ghcr.io/montevive/autocache:latest
2. Verify it's running:
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.1","strategy":"moderate"}
3. Test with a request:
# If using Option A (env var), no header needed:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If using Option B (no env var), pass API key in header:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
4. Check the cache metadata in response headers:
X-Autocache-Injected: true
X-Autocache-Cache-Ratio: 0.750
X-Autocache-ROI-Percent: 85.2
X-Autocache-Savings-100req: $1.75
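These headers can be read programmatically, for instance to log savings per workflow. A small helper that parses them from a response's header map (header names are taken from the example above; the exact formats are an assumption and may change between versions):

```python
def parse_autocache_headers(headers: dict) -> dict:
    """Extract Autocache ROI metadata from response headers.
    Header names follow the README example; value formats are assumed."""
    def num(value: str) -> float:
        # Savings headers carry a leading "$"; strip it before parsing.
        return float(value.lstrip("$"))
    return {
        "injected": headers.get("X-Autocache-Injected") == "true",
        "cache_ratio": num(headers["X-Autocache-Cache-Ratio"]),
        "roi_percent": num(headers["X-Autocache-ROI-Percent"]),
        "savings_per_100_req": num(headers["X-Autocache-Savings-100req"]),
    }

example = {
    "X-Autocache-Injected": "true",
    "X-Autocache-Cache-Ratio": "0.750",
    "X-Autocache-ROI-Percent": "85.2",
    "X-Autocache-Savings-100req": "$1.75",
}
print(parse_autocache_headers(example))
```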
Available Docker Tags
- `latest` - Latest stable release (recommended)
- `v1.0.1` - Specific version tag
- `1.0.1`, `1.0`, `1` - Semantic version aliases
Docker Image Details
- Registry: ghcr.io/montevive/autocache
- Architectures: linux/amd64, linux/arm64
- Size: ~29 MB (optimized Alpine-based image)
- Source: https://github.com/montevive/autocache
Next Steps
- For production deployments with docker-compose, see the Quick Start section below
- For configuration options, see Configuration
- For n8n integration, see our n8n setup guide
Quick Start
Using Docker Compose (Recommended)
- Clone and configure:
git clone <repository-url>
cd autocache
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY (optional - can pass in headers instead)
- Start the proxy:
docker-compose up -d
- Use in your application:
# Change your API base URL from:
# https://api.anthropic.com
# To:
# http://localhost:8080
Direct Usage
- Build and run:
go mod download
go build -o autocache ./cmd/autocache
# Option 1: Set API key via environment (optional)
ANTHROPIC_API_KEY=sk-ant-... ./autocache
# Option 2: Run without API key (pass it in request headers)
./autocache
- Configure your client:
# Python example - API key passed in headers
import anthropic
client = anthropic.Anthropic(
api_key="sk-ant-...", # This will be forwarded to Anthropic
base_url="http://localhost:8080" # Point to autocache
)
Configuration
Environment Variables
| Variable | Default | Description |
| ------------------------- | ------------ | -------------------------------------------------------------- |
| PORT | 8080 | Server port |
| ANTHROPIC_API_KEY | - | Your Anthropic API key (optional if passed in request headers) |
| CACHE_STRATEGY | moderate | Caching strategy: conservative/moderate/aggressive |
| LOG_LEVEL | info | Log level: debug/info/warn/error |
| MAX_CACHE_BREAKPOINTS | 4 | Maximum cache breakpoints (1-4) |
| TOKEN_MULTIPLIER | 1.0 | Token threshold multiplier |
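Put together, a `.env` for an aggressive caching setup might look like the following (variable names from the table above; the values are illustrative):

```
PORT=8080
ANTHROPIC_API_KEY=sk-ant-...
CACHE_STRATEGY=aggressive
LOG_LEVEL=debug
MAX_CACHE_BREAKPOINTS=4
TOKEN_MULTIPLIER=1.0
```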
API Key Configuration
The Anthropic API key can be provided in three ways (in order of precedence):
1. Request headers (recommended for multi-tenant scenarios):
   Authorization: Bearer sk-ant-...   # or x-api-key: sk-ant-...
2. Environment variable:
   ANTHROPIC_API_KEY=sk-ant-... ./autocache
3. .env file:
   ANTHROPIC_API_KEY=sk-ant-...
