Ollama2OpenAI
An enhanced OpenAI/Anthropic-compatible API gateway for Ollama. Supports the `think` and context parameters that Ollama's built-in OpenAI-compatible endpoint lacks, plus API key management, preset model parameter overrides, and model renaming.
Ollama2OpenAI Gateway
Languages: English | 简体中文
An enhanced OpenAI- and Anthropic-compatible gateway for Ollama with an admin interface and advanced parameter control.
🚀 Why Use This Instead of Ollama's Built-in OpenAI Endpoint?
- 🖼️ Multimodal Image Support - Full support for vision models with base64 and URL images in OpenAI format
- 🧠 Full Thinking Model Support - Complete `think` parameter support with reasoning content in responses (not supported by Ollama's built-in endpoint)
- ⚙️ Advanced Parameter Control - Set model-specific parameter overrides with full Ollama parameter support (`num_ctx`, `num_predict`, `think`, etc.)
- 🔑 Multi-API Key Management - Create and manage multiple API keys with per-key model access control
- 📊 Usage Tracking & Analytics - Comprehensive logging and monitoring of API usage
- 🎛️ Admin Web Interface - Easy configuration and management through a web dashboard
- 🏷️ Model Name Mapping - Custom display names for your models
Quick Start (Docker Only)
```bash
# Clone the repository
git clone https://github.com/MotorBottle/Ollama2OpenAI.git
cd Ollama2OpenAI

# Start the gateway (ensure OLLAMA_URL points at your Ollama host)
docker compose up -d
```
The compose file only starts the gateway container. Configure `OLLAMA_URL` via environment or `.env` so it can reach your existing Ollama instance. Stop the stack with `docker compose down` when finished.
🎯 Access Admin Interface: http://localhost:3000
- Username: admin
- Password: admin
⚡ Quick Setup:
- Configure Ollama URL in Settings
- Refresh Models to load from Ollama
- Create API keys with model permissions
- Use the OpenAI-compatible endpoint: `http://localhost:3000/v1/chat/completions`
🖼️ Multimodal Image Support
Full support for vision models with images in OpenAI format:
```python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Using base64 encoded images
with open("image.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="llama3.2-vision:11b",  # Or any vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
        ]
    }]
)

# Also supports HTTP/HTTPS image URLs
response = client.chat.completions.create(
    model="llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
```
Supported formats:
- ✅ Base64 encoded images (`data:image/jpeg;base64,...`)
- ✅ HTTP/HTTPS image URLs (automatically fetched and converted)
- ✅ Multiple images in a single message
- ✅ Works with both streaming and non-streaming responses
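For reference, the base64 data-URL content part shown above can be built with a small helper. The function name is illustrative (not part of the gateway), and the sample bytes stand in for a real image file:

```python
import base64

def image_part(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-format image_url content part from raw image bytes."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Stand-in bytes; in practice, read them from an image file
part = image_part(b"\xff\xd8\xff\xe0 fake jpeg payload")
print(part["image_url"]["url"][:30])
```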
🧠 Enhanced Thinking Model Support
Unlike Ollama's built-in OpenAI endpoint, this gateway fully supports reasoning models:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Full thinking model support with reasoning content and effort control
response = client.chat.completions.create(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Solve this math problem step by step"}],
    reasoning_effort="high",  # OpenAI format: "minimal", "low", "medium", "high"
    # OR use the OpenRouter format via extra_body:
    # extra_body={"reasoning": {"effort": "high"}}
    extra_body={"num_ctx": 32768}  # Extended context (non-standard params go via extra_body)
)

# Access reasoning content (not available in Ollama's OpenAI endpoint)
reasoning = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
🔍 Embeddings Support
Full OpenAI-compatible embeddings for similarity search and vector operations:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="http://localhost:3000/v1"
)

# Single text embedding
response = client.embeddings.create(
    model="mxbai-embed-large",  # Or any embedding model
    input="The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")

# Multiple texts in one request
response = client.embeddings.create(
    model="mxbai-embed-large",
    input=[
        "Hello world",
        "How are you today?",
        "This is a test document"
    ]
)
for i, embedding_obj in enumerate(response.data):
    print(f"Text {i+1} embedding: {len(embedding_obj.embedding)} dimensions")
```
Supported features:
- ✅ Single and batch text processing
- ✅ Custom dimensions parameter (model dependent)
- ✅ Usage token tracking
- ✅ Full OpenAI client library compatibility
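Embedding vectors returned by the endpoint are typically compared with cosine similarity. A minimal, dependency-free sketch; the two short vectors are made up for illustration, real ones come from `client.embeddings.create(...)`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
v1 = [0.1, 0.2, 0.7]
v2 = [0.1, 0.25, 0.65]
print(round(cosine_similarity(v1, v2), 4))
```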
⚙️ Advanced Parameter Control
Set model-specific parameter overrides in the admin interface using Ollama format:
```json
{
  "deepseek-r1": {
    "think": "high",
    "num_ctx": 32768,
    "temperature": 0.8,
    "request_timeout": 600000
  },
  "llama3.2:3b": {
    "num_ctx": 8192,
    "num_predict": 1000
  }
}
```
Parameter Precedence: User API params → Model overrides → System defaults
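The precedence rule behaves like a layered dict merge applied in increasing priority order. A sketch of the idea only (the default values and variable names here are made up, not the gateway's actual internals):

```python
# Illustrative sketch of the precedence rule, not the gateway's real code
system_defaults = {"num_ctx": 4096, "temperature": 0.7}
model_overrides = {"num_ctx": 32768, "think": "high"}  # from the admin UI
user_params = {"temperature": 0.2}                     # from the API request

# Later dicts win: user params > model overrides > system defaults
effective = {**system_defaults, **model_overrides, **user_params}
print(effective)
```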
Parameter Overrides Examples (Ollama Format)
Add overrides in the admin UI (Models tab) using standard JSON:
```json
{
  "qwen3-coder": {
    "num_ctx": 163840,
    "request_timeout": 99999999,
    "exclude_reasoning": true,
    "think": true
  }
}
```
- `request_timeout`/`timeout_ms` are in milliseconds. Set a high value to prevent long reasoning generations from hitting the default 120 s Axios timeout.
- `exclude_reasoning` hides reasoning content by default while still letting callers opt back in via request parameters.
- `num_ctx` expands the context window for repositories or long chats.
- Any Ollama parameter (`temperature`, `top_p`, etc.) can be expressed here and is merged into the request automatically.
Environment Variables
```bash
# Create .env file for Docker
PORT=3000
OLLAMA_URL=http://localhost:11434  # or http://ollama:11434 for Docker
SESSION_SECRET=your-secret-key
```
Docker Commands
```bash
# Start/stop services
docker compose up -d
docker compose down

# View logs
docker compose logs -f gateway

# Rebuild after changes
docker compose up -d --build
```
API Endpoints
- POST `/v1/chat/completions` - OpenAI-compatible chat completions with full Ollama parameter support
- POST `/v1/embeddings` - OpenAI-compatible embeddings for text similarity and search
- POST `/v1/messages` - Anthropic-compatible Messages API with thinking/tool streaming (legacy `/anthropic/v1/messages` still supported)
- GET `/v1/models` - List models (filtered by API key permissions)
- Admin Interface - `http://localhost:3000` for configuration and monitoring
🤖 Anthropic-Compatible API
Use the Anthropic Messages endpoint to serve Claude-style clients directly from Ollama:
```bash
curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Anthropic-Version: 2023-06-01" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Explain async/await in Python"}],
    "stream": true,
    "think": true
  }'
```
Highlights:
- Streams `thinking_delta`, `signature_delta`, `text_delta`, and tool blocks according to the latest Anthropic spec
- Automatically maps Ollama tool calls to `tool_use` content blocks and forwards tool call inputs back to your client
- Supports `think`/reasoning controls and per-model overrides (context, timeouts, etc.)
- Works with Anthropic SDKs: specify the `Anthropic-Version` header or accept the default `2023-06-01`
Provide tools in the Anthropic request (`tools` array) and the gateway will expose them to Ollama. When Ollama decides on a tool, the response streams back as Anthropic `tool_use` blocks with properly parsed JSON arguments, ready to execute in your application.
On the OpenAI side, keep using the standard `tools` / `tool_calls` fields in `/v1/chat/completions`. The gateway forwards those definitions to Ollama and converts the model's function calls back into OpenAI-compatible tool call payloads automatically.
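As a rough illustration of that conversion (field shapes follow Ollama's and OpenAI's documented chat formats, but the id scheme and sample call are invented, and the gateway's actual code may differ), turning an Ollama-style tool call into an OpenAI-style one mostly means serializing the arguments and adding an id:

```python
import json

def to_openai_tool_call(ollama_call: dict, index: int) -> dict:
    """Convert an Ollama-style tool call (arguments as a dict) into the
    OpenAI shape (arguments as a JSON string, plus an id and type)."""
    fn = ollama_call["function"]
    return {
        "id": f"call_{index}",  # invented id scheme, for illustration only
        "type": "function",
        "function": {
            "name": fn["name"],
            "arguments": json.dumps(fn.get("arguments", {})),
        },
    }

sample = {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
converted = to_openai_tool_call(sample, 0)
print(converted["function"]["arguments"])
```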
Anthropic request with tools
```bash
curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Check the weather in San Francisco"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'
```
When the model invokes a tool you’ll receive a streamed block such as:
```
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01...","name":"get_weather","input":{"city":"San Francisco"}}}
```
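Client-side, a streamed event like this can be decoded by stripping the `data: ` prefix and parsing the JSON. A minimal sketch using a sample payload of the same shape:

```python
import json

# One SSE data line of the shape streamed by the gateway
line = ('data: {"type":"content_block_start","index":1,'
        '"content_block":{"type":"tool_use","id":"toolu_01...",'
        '"name":"get_weather","input":{"city":"San Francisco"}}}')

if line.startswith("data: "):
    event = json.loads(line[len("data: "):])
    block = event["content_block"]
    if block["type"] == "tool_use":
        # Tool name plus already-parsed JSON arguments, ready to execute
        print(block["name"], block["input"])
```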
OpenAI-compatible example (Python)
```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="http://localhost:3000/v1")

response = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Call the lookup tool for Paris"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "lookup_city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }]
)
```