OptiLLM

<p align="center"> <img src="optillm-logo.png" alt="OptiLLM Logo" width="400" /> </p> <p align="center"> <strong>🚀 2-10x accuracy improvements on reasoning tasks with zero training</strong> </p> <p align="center"> <a href="https://github.com/algorithmicsuperintelligence/optillm/stargazers"><img src="https://img.shields.io/github/stars/algorithmicsuperintelligence/optillm?style=social" alt="GitHub stars"></a> <a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/v/optillm" alt="PyPI version"></a> <a href="https://pypi.org/project/optillm/"><img src="https://img.shields.io/pypi/dm/optillm" alt="PyPI downloads"></a> <a href="https://github.com/algorithmicsuperintelligence/optillm/blob/main/LICENSE"><img src="https://img.shields.io/github/license/algorithmicsuperintelligence/optillm" alt="License"></a> </p> <p align="center"> <a href="https://huggingface.co/spaces/codelion/optillm">🤗 HuggingFace Space</a> • <a href="https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing">📓 Colab Demo</a> • <a href="https://github.com/algorithmicsuperintelligence/optillm/discussions">💬 Discussions</a> </p>

OptiLLM is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.

By spending additional compute at inference time, these techniques make it possible to beat frontier models across diverse tasks. A good example of how to combine such techniques is the CePO approach from Cerebras.

✨ Key Features

  • 🎯 Instant Improvements: 2-10x better accuracy on math, coding, and logical reasoning
  • 🔌 Drop-in Replacement: Works with any OpenAI-compatible API endpoint
  • 🧠 20+ Optimization Techniques: From simple best-of-N to advanced MCTS and planning
  • 📦 Zero Training Required: Just proxy your existing API calls through OptiLLM
  • ⚡ Production Ready: Used in production by companies and researchers worldwide
  • 🌍 Multi-Provider: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM

🚀 Quick Start

Get powerful reasoning improvements in 3 simple steps:

# 1. Install OptiLLM
pip install optillm

# 2. Start the server
export OPENAI_API_KEY="your-key-here"
optillm

# 3. Use with any OpenAI client - just change the model name!
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# Add 'moa-' prefix for Mixture of Agents optimization
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # This gives you GPT-4o performance from GPT-4o-mini!
    messages=[{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
)

Before OptiLLM: "x = 1" ❌
After OptiLLM: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅

📊 Proven Results

OptiLLM delivers measurable improvements across diverse benchmarks:

| Technique | Base Model | Improvement | Benchmark |
|-----------|------------|-------------|-----------|
| MARS | Gemini 2.5 Flash Lite | +30.0 points | AIME 2025 (43.3→73.3) |
| CePO | Llama 3.3 70B | +18.6 points | Math-L5 (51.0→69.6) |
| AutoThink | DeepSeek-R1-1.5B | +9.34 points | GPQA-Diamond (21.72→31.06) |
| LongCePO | Llama 3.3 70B | +13.6 points | InfiniteBench (58.0→71.6) |
| MOA | GPT-4o-mini | Matches GPT-4 | Arena-Hard-Auto |
| PlanSearch | GPT-4o-mini | +20% pass@5 | LiveCodeBench |

Full benchmark results below ⬇️

🏗️ Installation

Using pip

pip install optillm
optillm
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto

Using docker

docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest
docker run -p 8000:8000 ghcr.io/algorithmicsuperintelligence/optillm:latest
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto

Available Docker image variants:

  • Full image (latest): Includes all dependencies for local inference and plugins
  • Proxy-only (latest-proxy): Lightweight image without local inference capabilities
  • Offline (latest-offline): Self-contained image with pre-downloaded models (spaCy) for fully offline operation
# Proxy-only (smallest)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-proxy

# Offline (largest, includes pre-downloaded models)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-offline

Install from source

Clone the repository with git and use pip to install the dependencies.

git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

🔒 SSL Configuration

OptiLLM supports configuring SSL certificate verification for working with self-signed certificates or corporate proxies.

Disable SSL verification (development only):

# Command line
optillm --no-ssl-verify

# Environment variable
export OPTILLM_SSL_VERIFY=false
optillm

Use custom CA certificate:

# Command line
optillm --ssl-cert-path /path/to/ca-bundle.crt

# Environment variable
export OPTILLM_SSL_CERT_PATH=/path/to/ca-bundle.crt
optillm

⚠️ Security Note: Disabling SSL verification is insecure and should only be used in development. For production environments with custom CAs, use --ssl-cert-path instead. See SSL_CONFIGURATION.md for details.

Implemented techniques

| Approach | Slug | Description |
| --- | --- | --- |
| MARS (Multi-Agent Reasoning System) | mars | Multi-agent reasoning with diverse temperature exploration, cross-verification, and iterative improvement |
| Cerebras Planning and Optimization | cepo | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | cot_reflection | Implements chain-of-thought reasoning with \<thinking\>, \<reflection\> and \<output\> sections |
| PlanSearch | plansearch | Implements a search algorithm over candidate plans for solving a problem in natural language |
| ReRead | re2 | Implements rereading to improve reasoning by processing queries twice |
| Self-Consistency | self_consistency | Implements an advanced self-consistency method |
| Z3 Solver | z3 | Utilizes the Z3 theorem prover for logical reasoning |
| R* Algorithm | rstar | Implements the R* algorithm for problem-solving |
| LEAP | leap | Learns task-specific principles from few-shot examples |
| Round Trip Optimization | rto | Optimizes responses through a round-trip process |
| Best of N Sampling | bon | Generates multiple responses and selects the best one |
| Mixture of Agents | moa | Combines responses from multiple critiques |
| Monte Carlo Tree Search | mcts | Uses MCTS for decision-making in chat responses |
| PV Game | pvg | Applies a prover-verifier game approach at inference time |
| Deep Confidence | N/A for proxy | Implements confidence-guided reasoning with multiple intensity levels for enhanced accuracy |
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
| Thinkdeeper | N/A for proxy | Implements the reasoning_effort param from OpenAI for reasoning models like DeepSeek R1 |
| AutoThink | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
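As a concrete illustration of the simplest entry in the table, Best of N Sampling (bon) amounts to: draw N candidate responses and keep the one a scoring function prefers. The sketch below shows that shape only; the stand-in generator and the length-based scorer are illustrative assumptions, not OptiLLM's implementation:

```python
from itertools import cycle
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int = 4) -> str:
    """Best-of-N sampling: draw n candidates, return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy demo: a deterministic stand-in generator cycling through fixed answers,
# scored by length as a crude proxy for "most worked-out response".
answers = [
    "x = 1",
    "2x = 4, so x = 2",
    "2x + 3 = 7 gives 2x = 4, therefore x = 2",
]
gen = cycle(answers)
pick = best_of_n(lambda: next(gen), score=len, n=3)
print(pick)  # the longest candidate wins under this scorer
```

In OptiLLM the generator is the underlying model and the scorer is a quality judgment, but the control flow is the same.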

Implemented plugins

| Plugin | Slug | Description |
| --- | --- | --- |
| System Prompt Learning | spl | Implements what Andrej Karpathy called the third paradigm for LLM learning; enables the model to acquire problem-solving knowledge and strategies |
| Deep Think | deepthink | Imp… |
