Ollama2OpenAI
An enhanced OpenAI/Anthropic-compatible API gateway for Ollama. Supports the `think` and context parameters that Ollama's built-in OpenAI-compatible endpoint lacks, plus API key management, preset model parameter overrides, and model renaming.
Ollama2OpenAI Gateway
Languages: English | 简体中文
An enhanced OpenAI- and Anthropic-compatible gateway for Ollama with an admin interface and advanced parameter control.
🚀 Why Use This Instead of Ollama's Built-in OpenAI Endpoint?
- 🖼️ Multimodal Image Support - Full support for vision models with base64 and URL images in OpenAI format
- 🧠 Full Thinking Model Support - Complete `think` parameter support with reasoning content in responses (not supported by Ollama's built-in endpoint)
- ⚙️ Advanced Parameter Control - Set model-specific parameter overrides with full Ollama parameter support (`num_ctx`, `num_predict`, `think`, etc.)
- 🔑 Multi-API Key Management - Create and manage multiple API keys with per-key model access control
- 📊 Usage Tracking & Analytics - Comprehensive logging and monitoring of API usage
- 🎛️ Admin Web Interface - Easy configuration and management through a web dashboard
- 🏷️ Model Name Mapping - Custom display names for your models
Quick Start (Docker Only)
```bash
# Clone the repository
git clone https://github.com/MotorBottle/Ollama2OpenAI.git
cd Ollama2OpenAI

# Start the gateway (ensure OLLAMA_URL points at your Ollama host)
docker compose up -d
```
The compose file only starts the gateway container. Configure `OLLAMA_URL` via environment or `.env` so it can reach your existing Ollama instance. Stop the stack with `docker compose down` when finished.
🎯 Access Admin Interface: http://localhost:3000
- Username: admin
- Password: admin
⚡ Quick Setup:
- Configure Ollama URL in Settings
- Refresh Models to load from Ollama
- Create API keys with model permissions
- Use the OpenAI-compatible endpoint: `http://localhost:3000/v1/chat/completions`
🖼️ Multimodal Image Support
Full support for vision models with images in OpenAI format:
```python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Using base64 encoded images
with open("image.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="llama3.2-vision:11b",  # Or any vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
        ]
    }]
)

# Also supports HTTP/HTTPS image URLs
response = client.chat.completions.create(
    model="llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```
Supported formats:
- ✅ Base64 encoded images (`data:image/jpeg;base64,...`)
- ✅ HTTP/HTTPS image URLs (automatically fetched and converted)
- ✅ Multiple images in a single message
- ✅ Works with both streaming and non-streaming responses
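For reference, the base64 data-URL content part shown above can be built with a small helper. The function name is illustrative (not part of the gateway), and the sample bytes stand in for a real image file:

```python
import base64

def image_part(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-format image_url content part from raw image bytes."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Stand-in bytes; in practice, read them from an image file
part = image_part(b"\xff\xd8\xff\xe0 fake jpeg payload")
print(part["image_url"]["url"][:30])
```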
🧠 Enhanced Thinking Model Support
Unlike Ollama's built-in OpenAI endpoint, this gateway fully supports reasoning models:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Full thinking model support with reasoning content and effort control
response = client.chat.completions.create(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Solve this math problem step by step"}],
    reasoning_effort="high",  # OpenAI format: "minimal", "low", "medium", "high"
    # OR use the OpenRouter format via extra_body:
    # extra_body={"reasoning": {"effort": "high"}}
    extra_body={"num_ctx": 32768}  # Extended context (non-standard params go via extra_body)
)

# Access reasoning content (not available in Ollama's OpenAI endpoint)
reasoning = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
🔍 Embeddings Support
Full OpenAI-compatible embeddings for similarity search and vector operations:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Single text embedding
response = client.embeddings.create(
    model="mxbai-embed-large",  # Or any embedding model
    input="The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")

# Multiple texts in one request
response = client.embeddings.create(
    model="mxbai-embed-large",
    input=[
        "Hello world",
        "How are you today?",
        "This is a test document"
    ]
)
for i, embedding_obj in enumerate(response.data):
    print(f"Text {i+1} embedding: {len(embedding_obj.embedding)} dimensions")
```
Supported features:
- ✅ Single and batch text processing
- ✅ Custom dimensions parameter (model dependent)
- ✅ Usage token tracking
- ✅ Full OpenAI client library compatibility
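Embedding vectors returned by the endpoint are typically compared with cosine similarity. A minimal, dependency-free sketch; the two short vectors are made up for illustration, real ones come from `client.embeddings.create(...)`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
v1 = [0.1, 0.2, 0.7]
v2 = [0.1, 0.25, 0.65]
print(round(cosine_similarity(v1, v2), 4))
```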
⚙️ Advanced Parameter Control
Set model-specific parameter overrides in the admin interface using Ollama format:
```json
{
  "deepseek-r1": {
    "think": "high",
    "num_ctx": 32768,
    "temperature": 0.8,
    "request_timeout": 600000
  },
  "llama3.2:3b": {
    "num_ctx": 8192,
    "num_predict": 1000
  }
}
```
Parameter Precedence: User API params → Model overrides → System defaults
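The precedence rule behaves like a layered dict merge applied in increasing priority order. A sketch of the idea only (the default values and variable names here are made up, not the gateway's actual internals):

```python
# Illustrative sketch of the precedence rule, not the gateway's real code
system_defaults = {"num_ctx": 4096, "temperature": 0.7}
model_overrides = {"num_ctx": 32768, "think": "high"}  # from the admin UI
user_params = {"temperature": 0.2}                     # from the API request

# Later dicts win: user params > model overrides > system defaults
effective = {**system_defaults, **model_overrides, **user_params}
print(effective)
```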
Parameter Overrides Examples (Ollama Format)
Add overrides in the admin UI (Models tab) using standard JSON:
```json
{
  "qwen3-coder": {
    "num_ctx": 163840,
    "request_timeout": 99999999,
    "exclude_reasoning": true,
    "think": true
  }
}
```
- `request_timeout`/`timeout_ms` are in milliseconds. Set a high value to prevent long reasoning generations from hitting the default 120 s Axios timeout.
- `exclude_reasoning` hides reasoning content by default while still letting callers opt back in via request parameters.
- `num_ctx` expands the context window for repositories or long chats.
- Any Ollama parameter (`temperature`, `top_p`, etc.) can be expressed here and is merged into the request automatically.
Environment Variables
```bash
# Create .env file for Docker
PORT=3000
OLLAMA_URL=http://localhost:11434  # or http://ollama:11434 for Docker
SESSION_SECRET=your-secret-key
```
Docker Commands
```bash
# Start/stop services
docker compose up -d
docker compose down

# View logs
docker compose logs -f gateway

# Rebuild after changes
docker compose up -d --build
```
API Endpoints
- POST `/v1/chat/completions` - OpenAI-compatible chat completions with full Ollama parameter support
- POST `/v1/embeddings` - OpenAI-compatible embeddings for text similarity and search
- POST `/v1/messages` - Anthropic-compatible Messages API with thinking/tool streaming (legacy `/anthropic/v1/messages` still supported)
- GET `/v1/models` - List models (filtered by API key permissions)
- Admin Interface - `http://localhost:3000` for configuration and monitoring
🤖 Anthropic-Compatible API
Use the Anthropic Messages endpoint to serve Claude-style clients directly from Ollama:
```bash
curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Anthropic-Version: 2023-06-01" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Explain async/await in Python"}],
    "stream": true,
    "think": true
  }'
```
Highlights:
- Streams `thinking_delta`, `signature_delta`, `text_delta`, and tool blocks according to the latest Anthropic spec
- Automatically maps Ollama tool calls to `tool_use` content blocks and forwards tool call inputs back to your client
- Supports `think`/reasoning controls and per-model overrides (context, timeouts, etc.)
- Works with Anthropic SDKs: specify the `Anthropic-Version` header or accept the default `2023-06-01`
Provide tools in the Anthropic request (`tools` array) and the gateway will expose them to Ollama. When Ollama decides on a tool, the response streams back as Anthropic `tool_use` blocks with properly parsed JSON arguments, ready to execute in your application.
On the OpenAI side, keep using the standard `tools` / `tool_calls` fields in `/v1/chat/completions`. The gateway forwards those definitions to Ollama and converts the model's function calls back into OpenAI-compatible tool call payloads automatically.
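As a rough illustration of that conversion (field shapes follow Ollama's and OpenAI's documented chat formats, but the id scheme and sample call are invented, and the gateway's actual code may differ), turning an Ollama-style tool call into an OpenAI-style one mostly means serializing the arguments and adding an id:

```python
import json

def to_openai_tool_call(ollama_call: dict, index: int) -> dict:
    """Convert an Ollama-style tool call (arguments as a dict) into the
    OpenAI shape (arguments as a JSON string, plus an id and type)."""
    fn = ollama_call["function"]
    return {
        "id": f"call_{index}",  # invented id scheme, for illustration only
        "type": "function",
        "function": {
            "name": fn["name"],
            "arguments": json.dumps(fn.get("arguments", {})),
        },
    }

sample = {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
converted = to_openai_tool_call(sample, 0)
print(converted["function"]["arguments"])
```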
Anthropic request with tools
```bash
curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Check the weather in San Francisco"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'
```
When the model invokes a tool you’ll receive a streamed block such as:
```
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01...","name":"get_weather","input":{"city":"San Francisco"}}}
```
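Client-side, a streamed event like this can be decoded by stripping the `data: ` prefix and parsing the JSON. A minimal sketch using a sample payload of the same shape:

```python
import json

# One SSE data line of the shape streamed by the gateway
line = ('data: {"type":"content_block_start","index":1,'
        '"content_block":{"type":"tool_use","id":"toolu_01...",'
        '"name":"get_weather","input":{"city":"San Francisco"}}}')

if line.startswith("data: "):
    event = json.loads(line[len("data: "):])
    block = event["content_block"]
    if block["type"] == "tool_use":
        # Tool name plus already-parsed JSON arguments, ready to execute
        print(block["name"], block["input"])
```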
OpenAI-compatible example (Python)
```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="http://localhost:3000/v1")

response = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Call the lookup tool for Paris"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "lookup_city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }]
)
```