OllamaMQ
High-performance Ollama proxy with per-user fair-share queuing, round-robin scheduling, and a real-time TUI dashboard. Built in Rust.
Install / Use
/learn @Chleba/OllamaMQREADME
ollamaMQ
ollamaMQ is a high-performance, asynchronous message queue dispatcher and load balancer designed to sit in front of one or more Ollama API instances. It acts as a smart proxy that queues incoming requests from multiple users and dispatches them in parallel to multiple Ollama backends using a fair-share round-robin scheduler with least-connections load balancing.
🚀 Features
- Multi-Backend Load Balancing: Distribute requests across multiple Ollama instances using a Least Connections + Round Robin strategy.
- Parallel Processing: Unlike basic proxies,
ollamaMQcan process multiple requests simultaneously (one per available backend), significantly increasing throughput for multiple users. - Backend Health Checks: Automatically monitors backend status every 10 seconds; offline instances are temporarily skipped and marked in the TUI.
- Per-User Queuing: Each user (identified by the
X-User-IDheader) has their own FIFO queue. - Fair-Share Scheduling: Prevents any single user from monopolizing all available backends.
- Transparent Header Forwarding: Full support for all HTTP headers (including
X-User-ID) passed to and from Ollama, ensuring compatibility with tools like Claude Code. - VIP & Boost Modes: Absolute priority (VIP) or increased frequency (Boost) for specific users.
- Real-Time TUI Dashboard: Monitor backend health, active requests, queue depths, and throughput in real-time.
- OpenAI Compatibility: Supports standard OpenAI-compatible endpoints.
- Async Architecture: Built on
tokioandaxumfor high concurrency.

🛠️ Installation
Ensure you have Rust (2024 edition or later) and Ollama installed.
Option 1: Install via Cargo (Recommended)
cargo install ollamaMQ
Option 2: From Source
-
Clone the repository:
git clone https://github.com/Chleba/ollamaMQ.git cd ollamaMQ -
Build and install locally:
cargo install --path .
🏃 Usage
Docker Installation
Using Docker Compose (Recommended)
- Ensure Docker and Docker Compose are installed.
- Start your local Ollama instance (defaulting to
localhost:11434). - Run:
docker compose up -d
Using Docker CLI
First build the image from the local Dockerfile:
docker build -t chlebon/ollamamq .
Then run the container:
docker run -d \
--name ollamamq \
-p 11435:11435 \
--restart unless-stopped \
chlebon/ollamamq
Command Line Arguments
ollamaMQ supports several options to configure the proxy:
-p, --port <PORT>: Port to listen on (default:11435)-o, --ollama-urls <URL1,URL2>: Comma-separated list of Ollama server URLs (default:http://localhost:11434)-t, --timeout <SECONDS>: Request timeout in seconds (default:300)--no-tui: Disable the interactive TUI dashboard (useful for Docker/CI)--allow-all-routes: Enable fallback proxy for non-standard endpoints-h, --help: Print help message-V, --version: Print version information
Example:
ollamaMQ --port 8080 --ollama-urls http://10.0.0.1:11434,http://10.0.0.2:11434 --timeout 600
Docker Example:
docker run -d \
--name ollamamq \
-p 8080:8080 \
chlebon/ollamamq --port 8080 --ollama-url http://192.168.1.5:11434 --timeout 600
API Proxying
Point your LLM clients to the ollamaMQ port (11435) and include the X-User-ID header.
Supported Endpoints:
GET /health(Internal health check)GET /(Ollama Status)POST /api/generatePOST /api/chatPOST /api/embedPOST /api/embeddingsGET /api/tagsPOST /api/showPOST /api/createPOST /api/copyDELETE /api/deletePOST /api/pullPOST /api/pushGET/HEAD/POST /api/blobs/{digest}GET /api/psGET /api/versionPOST /v1/chat/completions(OpenAI Compatible)POST /v1/completions(OpenAI Compatible)POST /v1/embeddings(OpenAI Compatible)GET /v1/models(OpenAI Compatible)GET /v1/models/{model}(OpenAI Compatible)
Example (cURL):
curl -X POST http://localhost:11435/api/chat \
-H "X-User-ID: developer-1" \
-d '{
"model": "qwen3.5:35b",
"messages": [{"role": "user", "content": "Explain quantum computing."}],
"stream": true
}'
Dashboard Controls
The interactive TUI dashboard provides a live view of the dispatcher's state:
j/kor Arrows: Navigate the user/blocked list.Taborh/l: Switch between the Users and Blocked panels.p: Toggle VIP status for the selected user (absolute priority).b: Toggle Boost status for the selected user (prioritizes every 5th request).x: Block the selected user.X: Block the selected user's IP address.u: Unblock the selected user or IP (works in both panels).qor Esc: Exit the dashboard and stop the application.?: Toggle detailed help.
Visual Indicators:
★(Magenta): VIP User (absolute priority).⚡(Yellow): Boosted User (every 5th request priority).▶(Cyan): Request is currently being processed/streamed.●(Green): User has requests waiting in the queue.○(Gray): User is idle (no active or queued requests).✖(Red): User or IP is blocked.
Logging
Logs are automatically written to ollamamq.log in the current working directory. This keeps the terminal clear for the TUI dashboard while allowing you to monitor system events and debug backend communication.
🐳 Docker
Docker Compose
The included docker-compose.yml provides a ready-to-use configuration:
services:
ollamamq:
build: .
image: chlebon/ollamamq:latest
container_name: ollamamq
ports:
- "11435:11435"
environment:
- OLLAMA_URL=http://host.docker.internal:11434
- PORT=11435
extra_hosts:
- "host.docker.internal:host-gateway"
restart: unless-stopped
Note for Linux Users: When running in Docker on Linux to access a host-based Ollama:
- Listen on all interfaces: Ollama must be configured to listen on
0.0.0.0. You can do this by settingexport OLLAMA_HOST=0.0.0.0before starting the Ollama service (or editing the systemd unit file). - Firewall: Ensure your firewall (e.g.,
ufw) allows traffic from the Docker bridge (usually172.17.0.1/16) to port11434. - Host Gateway: The
extra_hostssetting indocker-compose.ymlmapshost.docker.internalto your host's IP address.
Dockerfile
The Dockerfile uses a multi-stage build:
- Build stage: Uses
rust:1.85-alpineto compile the release binary - Runtime stage: Uses
alpine:3.20with onlyca-certificatesfor a minimal footprint (~10MB)
Environment Variables
| Variable | Description | Default |
| ------------ | ------------------------------ | ------------------------ |
| OLLAMA_URL | URL of the Ollama server | http://localhost:11434 |
| PORT | Port for ollamaMQ to listen on | 11435 |
| TIMEOUT | Request timeout in seconds | 300 |
Connecting to Different Ollama Servers
Local Ollama (on host machine)
docker run -d \
--name ollamamq \
-p 11435:11435 \
-e OLLAMA_URL=http://host.docker.internal:11434 \
chlebon/ollamamq
Remote Ollama Server
docker run -d \
--name ollamamq \
-p 11435:11435 \
-e OLLAMA_URL=https://ollama.example.com:11434 \
chlebon/ollamamq
Custom Port on Same Server
docker run -d \
--name ollamamq \
-p 8080:8080 \
-e OLLAMA_URL=http://host.docker.internal:11436 \
-e PORT=8080 \
chlebon/ollamamq
Ollama in Docker (different container)
docker run -d \
--name ollamamq \
--network ollama-network \
-p 11435:11435 \
-e OLLAMA_URL=http://ollama:11434 \
chlebon/ollamamq
Port Configuration
- 11435: The proxy port that clients connect to (exposed by default)
- 11434: The Ollama server port (internal, not exposed)
To change the proxy port, use the PORT environment variable:
docker run -d \
--name ollamamq \
-p 8080:8080 \
-e PORT=8080 \
chlebon/ollamamq
🏗️ Architecture
src/main.rs: Entry point, HTTP server initialization, and TUI lifecycle management.src/dispatcher.rs: Core logic for queuing, round-robin scheduling, and Ollama proxying.src/tui.rs: Implementation of the terminal-based monitoring dashboard.
Request Flow
- Client sends a request with
X-User-ID. ollamaMQpushes the request into a user-specific queue.- The background worker checks for available backends (Online & not busy).
- If a backend is free, the worker pops the next task (fair-share rotation) and spawns a parallel task.
- The request is proxied to the selected Ollama backend.
- The response is streamed back to the client in real-time, while the worker can immediately start another task on a different backend.
📦 Publishing to Docker Hub
To publish a new version of ollamaMQ to Docker Hub, follow these steps:
-
Update Version: Update the version number in
Cargo.toml. -
Build and Tag:
# Build the image for the current version docker build -t chlebon/ollamamq:v0.2.4 . # Tag it as latest docker tag chlebon/ollamamq:v0.2.4 chlebon/ollamamq:latest -
Push to Hub:
# Log in to Docker Hub (if not already logged in) docker login # Push the versioned tag docker push chlebon/ollamamq:v0.2.4 # Push the latest tag
