AI Runner
Support development. Send crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8
Your new favorite local AI platform
AI Runner is an all-in-one, offline-first desktop application, headless server, and Python library for local LLMs, TTS, STT, and image generation.
<img src="./images/art_interface.png" alt="AI Runner Logo" />

🐞 Report Bug · ✨ Request Feature · 🛡️ Report Vulnerability · 📖 Wiki
✨ Key Features
| Feature | Description |
|---------|-------------|
| 🗣️ Voice Chat | Real-time conversations with LLMs using espeak or OpenVoice |
| 🤖 Custom AI Agents | Configurable personalities, moods, and RAG-enhanced knowledge |
| 🎨 Visual Workflows | Drag-and-drop LangGraph workflow builder with runtime execution |
| 🖼️ Image Generation | Stable Diffusion (SD 1.5, SDXL) and FLUX models with drawing tools, LoRA, inpainting, and filters |
| 🔒 Privacy First | Runs locally with no external APIs by default, configurable guardrails |
| ⚡ Fast Generation | Uses GGUF and quantization for faster inference and lower VRAM usage |
🌍 Language Support
| Language | TTS | LLM | STT | GUI |
|----------|-----|-----|-----|-----|
| English | ✅ | ✅ | ✅ | ✅ |
| Japanese | ✅ | ✅ | ❌ | ✅ |
| Spanish/French/Chinese/Korean | ✅ | ✅ | ❌ | ❌ |
⚙️ System Requirements
| | Minimum | Recommended |
|---|---------|-------------|
| OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
| CPU | Ryzen 2700K / i7-8700K | Ryzen 5800X / i7-11700K |
| RAM | 16 GB | 32 GB |
| GPU | NVIDIA RTX 3060 | NVIDIA RTX 5080 |
| Storage | 22 GB - 100 GB+ (actual usage varies, SSD recommended) | 100 GB+ |
💾 Installation
Docker (Recommended)
GUI Mode:
xhost +local:docker && docker compose run --rm airunner
Headless API Server:
docker compose run --rm --service-ports airunner --headless
Note: --service-ports is required to expose port 8080 for the API.
The headless server exposes an HTTP API on port 8080 with endpoints:
- GET /health - Health check and service status
- POST /llm - LLM inference
- POST /art - Image generation
Manual Installation (Ubuntu/Debian)
Python 3.13+ required. We recommend using pyenv and venv.
1. Install system dependencies:

   sudo apt update && sudo apt install -y \
     build-essential cmake git curl wget \
     nvidia-cuda-toolkit pipewire libportaudio2 libxcb-cursor0 \
     espeak espeak-ng-espeak qt6-qpa-plugins qt6-wayland \
     mecab libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert

2. Create data directory:

   mkdir -p ~/.local/share/airunner

3. Install AI Runner:

   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
   pip install airunner[all_dev]

4. Install llama-cpp-python with CUDA (Python 3.13, RTX 5080):

   CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUDA_ARCHITECTURES=90" FORCE_CMAKE=1 \
   pip install --no-binary=:all: --no-cache-dir "llama-cpp-python==0.3.16"

   - Uses GGML_CUDA (the CUBLAS flag is deprecated).
   - 90 matches RTX 5080 class GPUs; drop -DGGML_CUDA_ARCHITECTURES if you are unsure and let it auto-detect.
   - On Python 3.12 you may instead use the prebuilt wheel: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 "llama-cpp-python==0.3.16+cu121"

5. Run:

   airunner
For detailed instructions, see the Installation Wiki.
🤖 Models
AI Runner downloads essential TTS/STT models automatically. LLM and image models must be configured:
| Category | Model | Size |
|----------|-------|------|
| LLM (default) | Llama 3.1 8B Instruct (4bit) | ~4 GB |
| Image | Stable Diffusion 1.5 | ~2 GB |
| Image | SDXL 1.0 | ~6 GB |
| Image | FLUX.1 Dev/Schnell (GGUF) | 8-12 GB |
| TTS | OpenVoice | 654 MB |
| STT | Whisper Tiny | 155 MB |
LLM Providers: Local (HuggingFace), Ollama, OpenRouter, OpenAI
Art Models: Place your models in ~/.local/share/airunner/art/models/
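To sanity-check what ends up in that directory, a minimal sketch (the extension list here is an assumption for illustration, not AI Runner's official loader logic):

```python
from pathlib import Path

# Common checkpoint extensions; assumed, not an official AI Runner list.
MODEL_EXTENSIONS = {".safetensors", ".ckpt", ".gguf"}

def list_art_models(models_dir: str) -> list[str]:
    """Return sorted file names in models_dir that look like model checkpoints."""
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS)

print(list_art_models("~/.local/share/airunner/art/models"))
```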
🛠️ CLI Commands
| Command | Description |
|---------|-------------|
| airunner | Launch GUI |
| airunner-headless | Start headless API server |
| airunner-hf-download | Download/manage models from HuggingFace |
| airunner-civitai-download | Download models from CivitAI |
| airunner-build-ui | Rebuild UI from .ui files |
| airunner-tests | Run test suite |
| airunner-generate-cert | Generate SSL certificate |
Note: To download models, use Tools → Download Models from the main application menu, or use airunner-hf-download / airunner-civitai-download from the command line.
🖥️ Headless Server
AI Runner can run as a headless HTTP API server, enabling remote access to LLM, image generation, TTS, and STT capabilities. This is useful for:
- Running AI services on a remote server
- Integration with other applications via REST API
- VS Code integration as an Ollama/OpenAI replacement
- Automated pipelines and scripting
Quick Start
# Start with defaults (port 8080, LLM only)
airunner-headless
# Start with a specific LLM model
airunner-headless --model /path/to/Qwen2.5-7B-Instruct-4bit
# Run as Ollama replacement for VS Code (port 11434)
airunner-headless --ollama-mode
# Don't preload models - load on first request
airunner-headless --no-preload
Command Line Options
| Option | Description |
|--------|-------------|
| --host HOST | Host address to bind to (default: 0.0.0.0) |
| --port PORT | Port to listen on (default: 8080, or 11434 in ollama-mode) |
| --ollama-mode | Run as Ollama replacement on port 11434 |
| --model, -m PATH | Path to LLM model to load |
| --art-model PATH | Path to Stable Diffusion model to load |
| --tts-model PATH | Path to TTS model to load |
| --stt-model PATH | Path to STT model to load |
| --enable-llm | Enable LLM service |
| --enable-art | Enable Stable Diffusion/art service |
| --enable-tts | Enable TTS service |
| --enable-stt | Enable STT service |
| --no-preload | Don't preload models at startup |
Environment Variables
| Variable | Description |
|----------|-------------|
| AIRUNNER_LLM_MODEL_PATH | Path to LLM model |
| AIRUNNER_ART_MODEL_PATH | Path to art model |
| AIRUNNER_TTS_MODEL_PATH | Path to TTS model |
| AIRUNNER_STT_MODEL_PATH | Path to STT model |
| AIRUNNER_NO_PRELOAD | Set to 1 to disable model preloading |
| AIRUNNER_LLM_ON | Enable LLM service (1 or 0) |
| AIRUNNER_SD_ON | Enable Stable Diffusion (1 or 0) |
| AIRUNNER_TTS_ON | Enable TTS service (1 or 0) |
| AIRUNNER_STT_ON | Enable STT service (1 or 0) |
API Endpoints
Native AIRunner Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check and service status |
| POST | /llm | LLM text generation (streaming) |
| POST | /llm/generate | LLM text generation |
| POST | /art | Image generation |
| POST | /tts | Text-to-speech |
| POST | /stt | Speech-to-text |
Ollama-Compatible Endpoints (port 11434)
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/tags | List available models |
| GET | /api/version | Get version info |
| GET | /api/ps | List running models |
| POST | /api/generate | Text generation |
| POST | /api/chat | Chat completion |
| POST | /api/show | Show model info |
OpenAI-Compatible Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/models | List models |
| POST | /v1/chat/completions | Chat completion with tool support |
Example: LLM Request
curl -X POST http://localhost:8080/llm \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is the capital of France?",
"stream": true,
"temperature": 0.7,
"max_tokens": 100
}'
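The same request can be issued from Python's standard library. `llm_payload` is a hypothetical helper, and the [0, 2] temperature clamp is an illustrative assumption, not a documented server limit:

```python
import json
import urllib.request

def llm_payload(prompt: str, stream: bool = True,
                temperature: float = 0.7, max_tokens: int = 100) -> bytes:
    """Encode an /llm request body; clamps temperature to an assumed [0, 2] range."""
    temperature = min(max(temperature, 0.0), 2.0)
    return json.dumps({"prompt": prompt, "stream": stream,
                       "temperature": temperature,
                       "max_tokens": max_tokens}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/llm",
    data=llm_payload("What is the capital of France?"),
    headers={"Content-Type": "application/json"},
)
# With the server running, stream the response line by line:
# with urllib.request.urlopen(req) as resp:
#     for line in resp:
#         print(line.decode("utf-8"), end="")
```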
Example: Image Generation (Art)
# Requires: airunner-headless --enable-art
curl -X POST http://localhost:8080/art \
-H "Content-Type: application/json" \
-d '{
"prompt": "A beautiful sunset over mountains",
"negative_prompt": "blurry, low quality",
"width": 512,
"height": 512,
"steps": 20,
"seed": 42
}'
# Returns: {"images": ["base64_png_data..."], "count": 1, "seed": 42}
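Given the documented response shape, the base64 images can be decoded and written to disk with the standard library (`save_images` is an illustrative helper, not part of AI Runner):

```python
import base64
from pathlib import Path

def save_images(response: dict, out_dir: str = ".") -> list[Path]:
    """Decode the base64 PNGs in an /art response and write them to disk."""
    paths = []
    seed = response.get("seed", 0)
    for i, b64 in enumerate(response.get("images", [])):
        path = Path(out_dir) / f"art_{seed}_{i}.png"
        path.write_bytes(base64.b64decode(b64))
        paths.append(path)
    return paths
```

Pass it the parsed JSON body from the curl call above to get one PNG per generated image.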
Example: Text-to-Speech (TTS)
# Requires: airunner-headless --enable-tts
curl -X POST http://localhost:8080/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, world!"}'
# Returns: {"status": "queued", "message": "Text queued for speech synthesis"}
# Audio plays through system speakers
Example: Speech-to-Text (STT)
# Requires: airunner-headless --enable-stt
# Audio must be base64-encoded WAV (16kHz mono recommended)
curl -X POST http://localhost:8080/stt \
-H "Content-Type: application/json" \
-d '{"audio": "UklGRi4AAABXQVZFZm10IBAAAAABAAEA..."}'
# Returns: {"transcription": "Hello world", "status": "success"}
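The base64 WAV body can be produced entirely with Python's standard library. `wav_to_base64` is an illustrative helper that wraps raw 16-bit mono PCM at the recommended 16 kHz:

```python
import base64
import io
import wave

def wav_to_base64(samples: bytes, sample_rate: int = 16000) -> str:
    """Wrap raw 16-bit mono PCM samples in a WAV container and base64-encode it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(samples)
    return base64.b64encode(buf.getvalue()).decode("ascii")

# One second of silence; a real request would use recorded audio.
audio_b64 = wav_to_base64(b"\x00\x00" * 16000)
print(audio_b64[:5])  # WAV files start with "RIFF", so this prints "UklGR"
```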
Example: Ollama Mode with VS Code
1. Start the headless server in Ollama mode:

   airunner-headless --ollama-mode --model /path/to/your/model

2. Configure the VS Code Continue extension to use http://localhost:11434

3. The server will respond to Ollama API calls, allowing seamless integration.
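As a sketch of what such an Ollama-style call looks like from code (`ollama_chat_request` is a hypothetical helper; the `model` value is a placeholder, since real names come from /api/tags):

```python
import json

def ollama_chat_request(content: str, model: str = "default",
                        stream: bool = False) -> bytes:
    """Body for POST /api/chat in the Ollama-compatible schema."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
    }).encode("utf-8")

# Equivalent to:
# curl -X POST http://localhost:11434/api/chat -d @body.json
```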
Auto-Loading Models
When --no-preload is used, models are automatically loaded on the first request to the corresponding endpoint. This is useful for:
- Reducing startup time
- Running multiple services without loading all models upfront