Autocache

🚀 Autocache - Intelligent Anthropic API Cache Proxy. Automatically injects cache-control fields into Claude API requests to reduce costs by up to 90% and latency by up to 85%. Works as a transparent drop-in replacement for popular AI platforms like n8n, Flowise, Make.com, LangChain, and LlamaIndex; no code changes required.

Install / Use

/learn @montevive/Autocache
About this skill

Quality Score

0/100

Supported Platforms

Claude Code
Claude Desktop

README

<div align="center"> <img src="media/logo-autocache.png" alt="AutoCache Logo" width="400"/>

Autocache

Intelligent Anthropic API Cache Proxy with ROI Analytics

License: MIT · Go Version · Tests

</div>

Autocache is a smart proxy server that automatically injects cache-control fields into Anthropic Claude API requests, reducing costs by up to 90% and latency by up to 85% while providing detailed ROI analytics via response headers.

Motivation

Modern AI agent platforms like n8n, Flowise, Make.com, and even popular frameworks like LangChain and LlamaIndex don't support Anthropic's prompt caching—despite users building increasingly complex agents with:

  • 📝 Large system prompts (1,000-5,000+ tokens)
  • 🛠️ 10+ tool definitions (5,000-15,000+ tokens)
  • 🔄 Repeated agent interactions (same context, different queries)

The Problem

When you build a complex agent in n8n with a detailed system prompt and multiple tools, every API call sends the full context again—costing 10x more than necessary. For example:

  • Without caching: 15,000 token agent → $0.045 per request
  • With caching: Same agent → $0.0045 per request after first call (90% savings)
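
The arithmetic behind these figures can be sketched as follows. The prices are illustrative assumptions (roughly Sonnet-class input pricing of $3 per million tokens, with cache reads discounted 90%); check Anthropic's current price list before relying on them:

```python
# Illustrative cost math for a 15,000-token agent context.
# Prices are assumptions (dollars per million input tokens), not authoritative.
INPUT_PRICE = 3.00       # assumed base input price, $/MTok
CACHE_READ_MULT = 0.10   # cache reads assumed to cost 90% less

def request_cost(tokens: int, cached: bool) -> float:
    """Dollar cost of one request's input tokens."""
    mult = CACHE_READ_MULT if cached else 1.0
    return tokens / 1_000_000 * INPUT_PRICE * mult

uncached = request_cost(15_000, cached=False)  # every call pays full price
cached = request_cost(15_000, cached=True)     # calls after the cache is warm
print(f"without caching: ${uncached:.4f}/request")  # $0.0450
print(f"with caching:    ${cached:.4f}/request")    # $0.0045
```

Note that cache writes carry a surcharge on the first call, so caching only pays off once the same context is reused.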

Real User Pain Points

The AI community has been requesting this feature for some time.

The Solution

Autocache works as a transparent proxy that automatically analyzes your requests and injects cache-control headers at optimal breakpoints—no code changes required. Just point your existing n8n/Flowise/Make.com workflows to Autocache instead of directly to Anthropic's API.

Result: Same agents, 90% lower costs, 85% lower latency—automatically.
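
Conceptually, the transformation the proxy performs looks like this (a simplified sketch, not Autocache's actual Go implementation; the `cache_control` field shape follows Anthropic's prompt-caching API, while the placement logic here is deliberately naive):

```python
def inject_cache_control(body: dict) -> dict:
    """Mark the last tool definition and last system block as cacheable.

    Simplified illustration of what a caching proxy injects; Autocache
    additionally weighs token counts and ROI before placing breakpoints.
    """
    marker = {"type": "ephemeral"}
    tools = body.get("tools", [])
    if tools:
        tools[-1]["cache_control"] = marker   # caches the whole tool prefix
    system = body.get("system")
    if isinstance(system, list) and system:
        system[-1]["cache_control"] = marker  # caches the system prompt
    return body

request = inject_cache_control({
    "model": "claude-3-5-haiku-20241022",
    "system": [{"type": "text", "text": "You are a helpful agent..."}],
    "tools": [{"name": "search", "description": "Web search",
               "input_schema": {"type": "object"}}],
    "messages": [{"role": "user", "content": "Hello!"}],
})
```

Because Anthropic caches everything up to a breakpoint, marking the last tool and last system block covers the large, stable prefix of the request.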

Alternatives & Comparison

Several tools offer prompt caching support, but Autocache is unique in combining zero-config transparent proxy with intelligent ROI analytics:

Existing Solutions

| Solution | Type | Auto-Injection | Intelligence | ROI Analytics | Drop-in for n8n/Flowise |
| --- | --- | --- | --- | --- | --- |
| **Autocache** | Proxy | ✅ Fully automatic | ✅ Token analysis + ROI scoring | ✅ Response headers | ✅ Yes |
| LiteLLM | Proxy | ⚠️ Requires config | ❌ Rule-based | ❌ No | ✅ Yes |
| langchain-smart-cache | Library | ✅ Fully automatic | ✅ Priority-based | ✅ Statistics | ❌ LangChain only |
| anthropic-cost-tracker | Library | ❓ Unclear | ❓ Unknown | ✅ Dashboard | ❌ Python only |
| OpenRouter | Service | ⚠️ Provider-dependent | ❌ No | ❌ No | ✅ Yes |
| AWS Bedrock | Cloud | ✅ ML-based | ✅ Yes | ✅ AWS only | ❌ AWS only |

What Makes Autocache Different

Autocache is the only solution that combines:

  1. 🔄 Transparent Proxy - Works with any tool (n8n, Flowise, Make.com) without code changes
  2. 🧠 Intelligent Analysis - Automatic token counting, ROI scoring, and optimal breakpoint placement
  3. 📊 Real-time ROI - Cost savings and break-even analysis in every response header
  4. 🏠 Self-Hosted - No external dependencies or cloud vendor lock-in
  5. ⚙️ Zero Config - Works out of the box with multiple strategies (conservative/moderate/aggressive)

Other solutions require configuration (LiteLLM), framework lock-in (langchain-smart-cache), or don't provide transparent proxy functionality for agent builders.

Features

  • Drop-in Replacement: Simply change your API URL and get automatic caching
  • 📊 ROI Analytics: Detailed cost savings and break-even analysis via headers
  • 🎯 Smart Caching: Intelligent placement of cache breakpoints using multiple strategies
  • ⚡ High Performance: Supports both streaming and non-streaming requests
  • 🔧 Configurable: Multiple caching strategies and customizable thresholds
  • 🐳 Docker Ready: Easy deployment with Docker and docker-compose
  • 📋 Comprehensive Logging: Detailed request/response logging with structured output

Getting Started with Docker

The fastest way to start using Autocache is with the published Docker image from GitHub Container Registry:

Quick Start (30 seconds)

1. Run the container:

```shell
# Option A: with the API key in an environment variable
docker run -d -p 8080:8080 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  --name autocache \
  ghcr.io/montevive/autocache:latest

# Option B: without an API key (pass it per request in headers)
docker run -d -p 8080:8080 \
  --name autocache \
  ghcr.io/montevive/autocache:latest
```

2. Verify it's running:

```shell
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.1","strategy":"moderate"}
```

3. Test with a request:

```shell
# If using Option A (env var), no API key header is needed:
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-haiku-20241022",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# If using Option B (no env var), pass the API key in a header:
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk-ant-..." \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-haiku-20241022",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

4. Check the cache metadata in response headers:

```text
X-Autocache-Injected: true
X-Autocache-Cache-Ratio: 0.750
X-Autocache-ROI-Percent: 85.2
X-Autocache-Savings-100req: $1.75
```
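
If you want to log these metrics programmatically, a small parser along these lines works (the header names are taken from the example above; the helper name and defaults are my own):

```python
def parse_autocache_headers(headers: dict) -> dict:
    """Extract Autocache ROI metrics from response headers."""
    return {
        "injected": headers.get("X-Autocache-Injected", "false") == "true",
        "cache_ratio": float(headers.get("X-Autocache-Cache-Ratio", 0)),
        "roi_percent": float(headers.get("X-Autocache-ROI-Percent", 0)),
        "savings_per_100_req": float(
            headers.get("X-Autocache-Savings-100req", "$0").lstrip("$")
        ),
    }

metrics = parse_autocache_headers({
    "X-Autocache-Injected": "true",
    "X-Autocache-Cache-Ratio": "0.750",
    "X-Autocache-ROI-Percent": "85.2",
    "X-Autocache-Savings-100req": "$1.75",
})
print(metrics["roi_percent"])  # 85.2
```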

Available Docker Tags

  • latest - Latest stable release (recommended)
  • v1.0.1 - Specific version tag
  • 1.0.1, 1.0, 1 - Semantic version aliases

Docker Image Details

  • Registry: ghcr.io/montevive/autocache
  • Architectures: linux/amd64, linux/arm64
  • Size: ~29 MB (optimized Alpine-based image)
  • Source: https://github.com/montevive/autocache

Quick Start

Using Docker Compose (Recommended)

  1. Clone and configure:

```shell
git clone <repository-url>
cd autocache
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY (optional - can pass in headers instead)
```

  2. Start the proxy:

```shell
docker-compose up -d
```

  3. Use in your application:

```shell
# Change your API base URL from:
#   https://api.anthropic.com
# to:
#   http://localhost:8080
```

Direct Usage

  1. Build and run:

```shell
go mod download
go build -o autocache ./cmd/autocache

# Option 1: set the API key via the environment (optional)
ANTHROPIC_API_KEY=sk-ant-... ./autocache

# Option 2: run without an API key (pass it in request headers)
./autocache
```

  2. Configure your client:

```python
# Python example - the API key is passed through in headers
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",               # forwarded to Anthropic unchanged
    base_url="http://localhost:8080"    # point to Autocache
)
```
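
To confirm that caching is actually kicking in, you can inspect the `usage` block in each response; `cache_creation_input_tokens` and `cache_read_input_tokens` are part of Anthropic's prompt-caching response format, while the helper below is just an illustrative sketch:

```python
def cache_summary(usage: dict) -> str:
    """Summarize cache activity from a message response's usage block."""
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)
    if read:
        return f"cache hit: {read} tokens read, {fresh} uncached"
    if written:
        return f"cache warmed: {written} tokens written"
    return "no caching"

# e.g. a second call with an identical 15k-token prefix (hypothetical numbers):
print(cache_summary({"input_tokens": 12,
                     "cache_read_input_tokens": 14988}))
```

The first call should report tokens written; subsequent calls with the same prefix should report tokens read, which is where the savings come from.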

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `PORT` | 8080 | Server port |
| `ANTHROPIC_API_KEY` | - | Your Anthropic API key (optional if passed in request headers) |
| `CACHE_STRATEGY` | moderate | Caching strategy: conservative/moderate/aggressive |
| `LOG_LEVEL` | info | Log level: debug/info/warn/error |
| `MAX_CACHE_BREAKPOINTS` | 4 | Maximum cache breakpoints (1-4) |
| `TOKEN_MULTIPLIER` | 1.0 | Token threshold multiplier |

API Key Configuration

The Anthropic API key can be provided in three ways (in order of precedence):

  1. Request headers (recommended for multi-tenant scenarios):

     ```text
     Authorization: Bearer sk-ant-...
     # or
     x-api-key: sk-ant-...
     ```

  2. Environment variable:

     ```shell
     ANTHROPIC_API_KEY=sk-ant-... ./autocache
     ```
    
  3. .env file:

    ANT
    
View on GitHub
GitHub Stars: 68
Category: Development
Updated: 1h ago
Forks: 8

Languages

Go

Security Score

100/100

Audited on Apr 4, 2026

No findings