# OptiLLM

<p align="center"> <img src="optillm-logo.png" alt="OptiLLM Logo" width="400" /> </p>
<p align="center"> <strong>🚀 2-10x accuracy improvements on reasoning tasks with zero training</strong> </p>
<p align="center"> <a href="https://github.com/algorithmicsuperintelligence/optillm/stargazers"><img src="https://img.shields.io/github/stars/algorithmicsuperintelligence/optillm?style=social" alt="GitHub stars"></a> <a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/v/optillm" alt="PyPI version"></a> <a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/dm/optillm" alt="PyPI downloads"></a> <a href="https://github.com/algorithmicsuperintelligence/optillm/blob/main/LICENSE"><img src="https://img.shields.io/github/license/algorithmicsuperintelligence/optillm" alt="License"></a> </p>
<p align="center"> <a href="https://huggingface.co/spaces/codelion/optillm">🤗 HuggingFace Space</a> • <a href="https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing">📓 Colab Demo</a> • <a href="https://github.com/algorithmicsuperintelligence/optillm/discussions">💬 Discussions</a> </p>

OptiLLM is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.
These techniques make it possible to beat frontier models across diverse tasks by spending additional compute at inference time. A good example of how to combine such techniques is the CePO approach from Cerebras.
## ✨ Key Features
- 🎯 Instant Improvements: 2-10x better accuracy on math, coding, and logical reasoning
- 🔌 Drop-in Replacement: Works with any OpenAI-compatible API endpoint
- 🧠 20+ Optimization Techniques: From simple best-of-N to advanced MCTS and planning
- 📦 Zero Training Required: Just proxy your existing API calls through OptiLLM
- ⚡ Production Ready: Used in production by companies and researchers worldwide
- 🌍 Multi-Provider: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM
## 🚀 Quick Start
Get powerful reasoning improvements in 3 simple steps:
```bash
# 1. Install OptiLLM
pip install optillm

# 2. Start the server
export OPENAI_API_KEY="your-key-here"
optillm
```
```python
# 3. Use with any OpenAI client - just change the model name!
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# Add 'moa-' prefix for Mixture of Agents optimization
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # This gives you GPT-4o performance from GPT-4o-mini!
    messages=[{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
)
```
Before OptiLLM: "x = 1" ❌
After OptiLLM: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅
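Besides the model-name prefix, the approach can also be selected per request. A minimal sketch, assuming OptiLLM reads an `optillm_approach` field from the request body (the field name is taken from OptiLLM's documentation; treat it as an assumption here):

```python
# Assemble kwargs for client.chat.completions.create(**kwargs) that select
# an OptiLLM approach without renaming the model. The `optillm_approach`
# field in `extra_body` is an assumption based on OptiLLM's documentation.
def optillm_request(model: str, prompt: str, approach: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"optillm_approach": approach},
    }

kwargs = optillm_request("gpt-4o-mini", "Solve: If 2x + 3 = 7, what is x?", "moa")
# client.chat.completions.create(**kwargs)  # requires a running OptiLLM server
```

This keeps the model name unchanged, which is convenient when the model field is fixed by downstream tooling.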
## 📊 Proven Results
OptiLLM delivers measurable improvements across diverse benchmarks:
| Technique | Base Model | Improvement | Benchmark |
|-----------|------------|-------------|-----------|
| MARS | Gemini 2.5 Flash Lite | +30.0 points | AIME 2025 (43.3→73.3) |
| CePO | Llama 3.3 70B | +18.6 points | Math-L5 (51.0→69.6) |
| AutoThink | DeepSeek-R1-1.5B | +9.34 points | GPQA-Diamond (21.72→31.06) |
| LongCePO | Llama 3.3 70B | +13.6 points | InfiniteBench (58.0→71.6) |
| MOA | GPT-4o-mini | Matches GPT-4 | Arena-Hard-Auto |
| PlanSearch | GPT-4o-mini | +20% pass@5 | LiveCodeBench |
Full benchmark results below ⬇️
## 🏗️ Installation

### Using pip
```bash
pip install optillm
optillm
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```
### Using Docker

```bash
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest
docker run -p 8000:8000 ghcr.io/algorithmicsuperintelligence/optillm:latest
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```
Available Docker image variants:

- Full image (`latest`): Includes all dependencies for local inference and plugins
- Proxy-only (`latest-proxy`): Lightweight image without local inference capabilities
- Offline (`latest-offline`): Self-contained image with pre-downloaded models (spaCy) for fully offline operation

```bash
# Proxy-only (smallest)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-proxy

# Offline (largest, includes pre-downloaded models)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-offline
```
### Install from source

Clone the repository with git and use pip to install the dependencies.

```bash
git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## 🔒 SSL Configuration

OptiLLM supports configuring SSL certificate verification for working with self-signed certificates or corporate proxies.
Disable SSL verification (development only):

```bash
# Command line
optillm --no-ssl-verify

# Environment variable
export OPTILLM_SSL_VERIFY=false
optillm
```
Use a custom CA certificate:

```bash
# Command line
optillm --ssl-cert-path /path/to/ca-bundle.crt

# Environment variable
export OPTILLM_SSL_CERT_PATH=/path/to/ca-bundle.crt
optillm
```
⚠️ Security Note: Disabling SSL verification is insecure and should only be used in development. For production environments with custom CAs, use --ssl-cert-path instead. See SSL_CONFIGURATION.md for details.
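The environment toggle behaves like an ordinary boolean flag. A small sketch of how such a flag is typically parsed (the accepted "off" spellings here are an assumption; OptiLLM's exact parsing may differ, see SSL_CONFIGURATION.md):

```python
import os
from typing import Mapping, Optional

# Sketch of boolean env-flag parsing for OPTILLM_SSL_VERIFY.
# The accepted "off" spellings are an assumption, not OptiLLM's exact logic.
def ssl_verify_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    value = env.get("OPTILLM_SSL_VERIFY", "true")
    return value.strip().lower() not in {"false", "0", "no"}

print(ssl_verify_enabled({"OPTILLM_SSL_VERIFY": "false"}))  # False: verification disabled
print(ssl_verify_enabled({}))                               # True: verify by default
```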
## Implemented techniques
| Approach | Slug | Description |
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
| MARS (Multi-Agent Reasoning System) | mars | Multi-agent reasoning with diverse temperature exploration, cross-verification, and iterative improvement |
| Cerebras Planning and Optimization | cepo | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | cot_reflection | Implements chain-of-thought reasoning with `<thinking>`, `<reflection>` and `<output>` sections |
| PlanSearch | plansearch | Implements a search algorithm over candidate plans for solving a problem in natural language |
| ReRead | re2 | Implements rereading to improve reasoning by processing queries twice |
| Self-Consistency | self_consistency | Implements an advanced self-consistency method |
| Z3 Solver | z3 | Utilizes the Z3 theorem prover for logical reasoning |
| R* Algorithm | rstar | Implements the R* algorithm for problem-solving |
| LEAP | leap | Learns task-specific principles from few shot examples |
| Round Trip Optimization | rto | Optimizes responses through a round-trip process |
| Best of N Sampling | bon | Generates multiple responses and selects the best one |
| Mixture of Agents | moa | Combines responses from multiple critiques |
| Monte Carlo Tree Search | mcts | Uses MCTS for decision-making in chat responses |
| PV Game | pvg | Applies a prover-verifier game approach at inference time |
| Deep Confidence | N/A for proxy | Implements confidence-guided reasoning with multiple intensity levels for enhanced accuracy |
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
| Thinkdeeper | N/A for proxy | Implements the reasoning_effort param from OpenAI for reasoning models like DeepSeek R1 |
| AutoThink | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
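The slugs above can also be composed: OptiLLM's documentation describes chaining approaches with `&` (sequential pipeline) and running them in parallel with `|` inside the model prefix. A hedged sketch of building such model names (the operator semantics follow OptiLLM's docs; treat the exact grammar as an assumption):

```python
# Helpers for composing OptiLLM approach slugs into a model-name prefix.
# The '&' (pipeline) and '|' (parallel) operators follow OptiLLM's documented
# combination syntax; treat the exact grammar as an assumption.
def pipeline(*slugs: str) -> str:
    return "&".join(slugs)

def parallel(*slugs: str) -> str:
    return "|".join(slugs)

# e.g. reread the query with re2, then aggregate candidates with moa:
model = f"{pipeline('re2', 'moa')}-gpt-4o-mini"
print(model)  # re2&moa-gpt-4o-mini
```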
## Implemented plugins
| Plugin | Slug | Description |
| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
| System Prompt Learning | spl | Implements what Andrej Karpathy called the third paradigm for LLM learning; enables the model to acquire problem-solving knowledge and strategies |
| Deep Think | deepthink | Imp