
AI Runner

Support development. Send crypto: 0x02030569e866e22C9991f55Db0445eeAd2d646c8

Your new favorite local AI platform

AI Runner is an all-in-one, offline-first desktop application, headless server, and Python library for local LLMs, TTS, STT, and image generation.

<img src="./images/art_interface.png" alt="AI Runner art interface" />

🐞 Report Bug · ✨ Request Feature · 🛡️ Report Vulnerability · 📖 Wiki


✨ Key Features

| Feature | Description |
|---------|-------------|
| 🗣️ Voice Chat | Real-time conversations with LLMs using espeak or OpenVoice |
| 🤖 Custom AI Agents | Configurable personalities, moods, and RAG-enhanced knowledge |
| 🎨 Visual Workflows | Drag-and-drop LangGraph workflow builder with runtime execution |
| 🖼️ Image Generation | Stable Diffusion (SD 1.5, SDXL) and FLUX models with drawing tools, LoRA, inpainting, and filters |
| 🔒 Privacy First | Runs locally with no external APIs by default, configurable guardrails |
| ⚡ Fast Generation | Uses GGUF and quantization for faster inference and lower VRAM usage |

🌍 Language Support

| Language | TTS | LLM | STT | GUI |
|----------|-----|-----|-----|-----|
| English | ✅ | ✅ | ✅ | ✅ |
| Japanese | ✅ | ✅ | ❌ | ✅ |
| Spanish/French/Chinese/Korean | ✅ | ✅ | ❌ | ❌ |


⚙️ System Requirements

| | Minimum | Recommended |
|---|---------|-------------|
| OS | Ubuntu 22.04, Windows 10 | Ubuntu 22.04 (Wayland) |
| CPU | Ryzen 2700K / i7-8700K | Ryzen 5800X / i7-11700K |
| RAM | 16 GB | 32 GB |
| GPU | NVIDIA RTX 3060 | NVIDIA RTX 5080 |
| Storage | 22 GB - 100 GB+ (actual usage varies, SSD recommended) | 100 GB+ |


💾 Installation

Docker (Recommended)

GUI Mode:

xhost +local:docker && docker compose run --rm airunner

Headless API Server:

docker compose run --rm --service-ports airunner --headless

Note: --service-ports is required to expose port 8080 for the API.

The headless server exposes an HTTP API on port 8080 with endpoints:

  • GET /health - Health check and service status
  • POST /llm - LLM inference
  • POST /art - Image generation
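
As a quick smoke test, the `/health` endpoint can be queried from Python with only the standard library. This is an illustrative sketch; the exact fields in the response depend on which services are enabled:

```python
import json
import urllib.request

def check_health(base_url: str = "http://localhost:8080") -> dict:
    """GET /health on a running AI Runner headless server and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        return json.load(resp)

# With a headless server running: print(check_health())
```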

Manual Installation (Ubuntu/Debian)

Python 3.13+ required. We recommend using pyenv and venv.

  1. Install system dependencies:

    sudo apt update && sudo apt install -y \
      build-essential cmake git curl wget \
      nvidia-cuda-toolkit pipewire libportaudio2 libxcb-cursor0 \
      espeak espeak-ng-espeak qt6-qpa-plugins qt6-wayland \
      mecab libmecab-dev mecab-ipadic-utf8 libxslt-dev mkcert
    
  2. Create data directory:

    mkdir -p ~/.local/share/airunner
    
  3. Install AI Runner:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
    pip install airunner[all_dev]
    
  4. Install llama-cpp-python with CUDA (Python 3.13 example):

    CMAKE_ARGS="-DGGML_CUDA=on -DGGML_CUDA_ARCHITECTURES=90" FORCE_CMAKE=1 \
      pip install --no-binary=:all: --no-cache-dir "llama-cpp-python==0.3.16"

  • Uses GGML_CUDA (the CUBLAS flag is deprecated).
  • 90 targets compute capability 9.0 (Hopper-class) GPUs; an RTX 5080 (Blackwell) reports compute capability 12.0, so drop -DGGML_CUDA_ARCHITECTURES if you are unsure and let the build auto-detect your GPU.
  • On Python 3.12 you may instead use the prebuilt wheel: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 "llama-cpp-python==0.3.16+cu121".

  5. Run:

    airunner
    

For detailed instructions, see the Installation Wiki.


🤖 Models

AI Runner downloads essential TTS/STT models automatically. LLM and image models must be configured:

| Category | Model | Size |
|----------|-------|------|
| LLM (default) | Llama 3.1 8B Instruct (4bit) | ~4 GB |
| Image | Stable Diffusion 1.5 | ~2 GB |
| Image | SDXL 1.0 | ~6 GB |
| Image | FLUX.1 Dev/Schnell (GGUF) | 8-12 GB |
| TTS | OpenVoice | 654 MB |
| STT | Whisper Tiny | 155 MB |

LLM Providers: Local (HuggingFace), Ollama, OpenRouter, OpenAI

Art Models: Place your models in ~/.local/share/airunner/art/models/
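
To verify what you have placed in that directory, a short stdlib sketch can list candidate model files. The extension set below is an assumption for illustration; AI Runner's actual loader may accept a different set:

```python
from pathlib import Path

def list_art_models(base: str) -> list[str]:
    """List likely model files in an art models directory.
    The extensions here (.safetensors/.ckpt/.gguf) are illustrative assumptions."""
    exts = {".safetensors", ".ckpt", ".gguf"}
    return sorted(p.name for p in Path(base).iterdir() if p.suffix in exts)

# e.g. list_art_models(str(Path.home() / ".local/share/airunner/art/models"))
```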


🛠️ CLI Commands

| Command | Description |
|---------|-------------|
| airunner | Launch GUI |
| airunner-headless | Start headless API server |
| airunner-hf-download | Download/manage models from HuggingFace |
| airunner-civitai-download | Download models from CivitAI |
| airunner-build-ui | Rebuild UI from .ui files |
| airunner-tests | Run test suite |
| airunner-generate-cert | Generate SSL certificate |

Note: To download models, use Tools → Download Models from the main application menu, or use airunner-hf-download / airunner-civitai-download from the command line.


🖥️ Headless Server

AI Runner can run as a headless HTTP API server, enabling remote access to LLM, image generation, TTS, and STT capabilities. This is useful for:

  • Running AI services on a remote server
  • Integration with other applications via REST API
  • VS Code integration as an Ollama/OpenAI replacement
  • Automated pipelines and scripting

Quick Start

# Start with defaults (port 8080, LLM only)
airunner-headless

# Start with a specific LLM model
airunner-headless --model /path/to/Qwen2.5-7B-Instruct-4bit

# Run as Ollama replacement for VS Code (port 11434)
airunner-headless --ollama-mode

# Don't preload models - load on first request
airunner-headless --no-preload

Command Line Options

| Option | Description |
|--------|-------------|
| --host HOST | Host address to bind to (default: 0.0.0.0) |
| --port PORT | Port to listen on (default: 8080, or 11434 in ollama-mode) |
| --ollama-mode | Run as Ollama replacement on port 11434 |
| --model, -m PATH | Path to LLM model to load |
| --art-model PATH | Path to Stable Diffusion model to load |
| --tts-model PATH | Path to TTS model to load |
| --stt-model PATH | Path to STT model to load |
| --enable-llm | Enable LLM service |
| --enable-art | Enable Stable Diffusion/art service |
| --enable-tts | Enable TTS service |
| --enable-stt | Enable STT service |
| --no-preload | Don't preload models at startup |

Environment Variables

| Variable | Description |
|----------|-------------|
| AIRUNNER_LLM_MODEL_PATH | Path to LLM model |
| AIRUNNER_ART_MODEL_PATH | Path to art model |
| AIRUNNER_TTS_MODEL_PATH | Path to TTS model |
| AIRUNNER_STT_MODEL_PATH | Path to STT model |
| AIRUNNER_NO_PRELOAD | Set to 1 to disable model preloading |
| AIRUNNER_LLM_ON | Enable LLM service (1 or 0) |
| AIRUNNER_SD_ON | Enable Stable Diffusion (1 or 0) |
| AIRUNNER_TTS_ON | Enable TTS service (1 or 0) |
| AIRUNNER_STT_ON | Enable STT service (1 or 0) |
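
The 1/0 toggles can be parsed as in the sketch below. This is an illustration of the convention, not AI Runner's internal code:

```python
import os

def service_enabled(var: str, default: bool = False) -> bool:
    """Interpret an AIRUNNER_*_ON style variable: "1" enables, "0" disables,
    and an unset variable falls back to the given default."""
    val = os.environ.get(var)
    if val is None:
        return default
    return val == "1"
```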

API Endpoints

Native AIRunner Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check and service status |
| POST | /llm | LLM text generation (streaming) |
| POST | /llm/generate | LLM text generation |
| POST | /art | Image generation |
| POST | /tts | Text-to-speech |
| POST | /stt | Speech-to-text |

Ollama-Compatible Endpoints (port 11434)

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/tags | List available models |
| GET | /api/version | Get version info |
| GET | /api/ps | List running models |
| POST | /api/generate | Text generation |
| POST | /api/chat | Chat completion |
| POST | /api/show | Show model info |

OpenAI-Compatible Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /v1/models | List models |
| POST | /v1/chat/completions | Chat completion with tool support |

Example: LLM Request

curl -X POST http://localhost:8080/llm \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the capital of France?",
    "stream": true,
    "temperature": 0.7,
    "max_tokens": 100
  }'
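
The same request can be issued from Python with only the standard library. A sketch mirroring the curl call above (the payload fields come from that example; nothing else is assumed about the API):

```python
import json
import urllib.request

def build_llm_request(base_url, prompt, stream=True, temperature=0.7, max_tokens=100):
    """Build a POST request to /llm with the same JSON body as the curl example."""
    payload = {"prompt": prompt, "stream": stream,
               "temperature": temperature, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/llm",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_llm_request("http://localhost:8080", "What is the capital of France?")
# urllib.request.urlopen(req) streams the response once the server is running
```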

Example: Image Generation (Art)

# Requires: airunner-headless --enable-art
curl -X POST http://localhost:8080/art \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A beautiful sunset over mountains",
    "negative_prompt": "blurry, low quality",
    "width": 512,
    "height": 512,
    "steps": 20,
    "seed": 42
  }'
# Returns: {"images": ["base64_png_data..."], "count": 1, "seed": 42}

Example: Text-to-Speech (TTS)

# Requires: airunner-headless --enable-tts
curl -X POST http://localhost:8080/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
# Returns: {"status": "queued", "message": "Text queued for speech synthesis"}
# Audio plays through system speakers

Example: Speech-to-Text (STT)

# Requires: airunner-headless --enable-stt
# Audio must be base64-encoded WAV (16kHz mono recommended)
curl -X POST http://localhost:8080/stt \
  -H "Content-Type: application/json" \
  -d '{"audio": "UklGRi4AAABXQVZFZm10IBAAAAABAAEA..."}'
# Returns: {"transcription": "Hello world", "status": "success"}

Example: Ollama Mode with VS Code

  1. Start the headless server in Ollama mode:

    airunner-headless --ollama-mode --model /path/to/your/model
    
  2. Configure VS Code Continue extension to use http://localhost:11434

  3. The server will respond to Ollama API calls, allowing seamless integration.

Auto-Loading Models

When --no-preload is used, models are automatically loaded on the first request to the corresponding endpoint. This is useful for:

  • Reducing startup time
  • Running multiple services without loading all models upfront
