# RESTai

RESTai is an open-source AIaaS (AI as a Service) platform. It supports many public and local LLMs served by Ollama, vLLM, and others, with precise embeddings usage and tuning. It also offers built-in image generation (DALL-E, Stable Diffusion, Flux) with dynamically loaded generators, an internal block-based graphical language, prompt versioning, and much more.
## Quick Start

```shell
git clone https://github.com/apocas/restai && cd restai
make install
make dev  # → http://localhost:9000/admin (admin / admin)
```
Or with Docker:

```shell
docker compose --env-file .env up --build
```
## Why RESTai?
- Multi-project AI platform — RAG (with optional SQL-to-NL), Agents, Block (visual logic), and Inference in one place
- Full Web UI included — React dashboard with analytics, not just an API
- Any LLM — OpenAI, Anthropic, Ollama, Gemini, Groq, LiteLLM, vLLM, Azure, and more
- Feature complete — Teams, RBAC, OAuth/LDAP, token tracking, per-project rate limiting, Kubernetes-native
- Extensible tools — MCP (Model Context Protocol) for unlimited agent integrations
- Token tracking, cost & latency analytics — built-in dashboard with daily usage, per-project costs, latency monitoring, and top LLM charts
## Features

### Dashboard & Analytics
Track token usage, costs, latency, and project activity from a centralized dashboard. Daily charts for tokens, costs, and response latency per project — identify performance regressions at a glance.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/home.png" width="750" alt="RESTai Dashboard"/> </div>

### Projects & Chat Playground
Create and manage AI projects. Each project has its own LLM, system prompt, tools, and configuration. Test instantly in the built-in chat playground.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/projects.png" width="750" alt="RESTai Projects"/> </div> <div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/playground.png" width="750" alt="RESTai Playground"/> </div>

### RAG (Retrieval-Augmented Generation)
Upload documents and query them with LLM-powered retrieval. Supports multiple vector stores, reranking (ColBERT / LLM-based), sandboxed mode to reduce hallucination, and evaluation via deepeval. Optionally connect a MySQL or PostgreSQL database to translate natural language questions into SQL queries automatically.
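The sandboxed mode described above can be illustrated with a minimal sketch. This is not RESTai's actual code; the function names, threshold value, and fallback message are assumptions chosen for illustration. The idea is simply that when no retrieved chunk scores above a similarity cutoff, the project refuses rather than letting the LLM guess.

```python
# Minimal sketch of sandboxed RAG: answer only when retrieval is confident.
# Names and threshold are illustrative, not RESTai internals.

def sandboxed_answer(question, retrieved, threshold=0.5,
                     fallback="I don't know based on the provided documents."):
    """retrieved: list of (chunk_text, similarity_score) pairs."""
    relevant = [(text, score) for text, score in retrieved if score >= threshold]
    if not relevant:
        return fallback  # sandboxed: refuse rather than hallucinate
    context = "\n".join(text for text, _ in relevant)
    # In a real pipeline the context would be sent to the project's LLM here;
    # this sketch just returns it for inspection.
    return f"Answering from context:\n{context}"

print(sandboxed_answer("What is X?", [("X is a tool.", 0.82), ("Unrelated.", 0.1)]))
print(sandboxed_answer("What is Y?", [("Unrelated.", 0.1)]))
```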
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rag.png" width="750" alt="RESTai RAG"/> </div>

### Agents + MCP
Zero-shot ReAct agents with built-in tools and MCP (Model Context Protocol) server support for extensible tool access. Connect any MCP-compatible server via HTTP/SSE or stdio.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent1.png" width="40%" alt="RESTai Agent"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent2.png" width="40%" alt="RESTai Agent Tools"/> </div>

### Inference (Multimodal)
Direct LLM chat and completion. Supports sending images alongside text using any vision-capable model (LLaVA, Gemini, GPT-4o, etc.).
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/inference.png" width="750" alt="RESTai Inference"/> </div>

### Block (Visual Logic Builder)
Build processing logic visually using a Blockly-based IDE — no LLM required. Drag-and-drop blocks to define how input is transformed into output. Use the "Call Project" block to invoke other RESTai projects, enabling composition of AI pipelines without writing code.
Supported blocks: text operations, math, logic, variables, loops, and custom RESTai blocks (Get Input, Set Output, Call Project, Log).
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/block.png" width="750" alt="RESTai Block IDE"/> </div>

### MCP Server
RESTai includes an optional built-in MCP (Model Context Protocol) server that exposes your projects as tools consumable by any MCP client — Claude Desktop, Cursor, or custom agents. Each user authenticates with a Bearer API key and can only access their assigned projects.
Enable via the `MCP_SERVER=true` environment variable or the admin settings page (requires restart). Clients connect to `http://your-host:9000/mcp/sse`.
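As an illustration, an MCP client that supports remote SSE servers might be pointed at RESTai with a config like the following. The exact keys vary by client; this JSON shape follows the common `mcpServers` convention and is an assumption, not RESTai documentation:

```json
{
  "mcpServers": {
    "restai": {
      "url": "http://your-host:9000/mcp/sse",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}
```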
Available tools:
- `list_projects` — Discover which AI projects you have access to
- `query_project` — Send a question (with optional image) to any accessible project
### Evaluation Framework
Built-in evaluation system to measure and track AI project quality over time. Create test datasets with question/expected-answer pairs, run evaluations with multiple metrics, and visualize score trends.
Metrics (powered by DeepEval):
- Answer Relevancy — Is the answer relevant to the question?
- Faithfulness — Is the answer grounded in the retrieved context? (RAG projects)
- Correctness — Does the answer match the expected output?
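The dataset-and-metrics flow can be sketched with a toy correctness check. This is illustrative only: RESTai's real metrics are computed by DeepEval (typically LLM-judged), and the function names and token-overlap scoring here are invented stand-ins.

```python
# Toy evaluation run: score each (question, expected, actual) case.
# Illustrative only -- RESTai's real metrics come from DeepEval.

def correctness(expected: str, actual: str) -> float:
    """Crude token-overlap score standing in for an LLM-judged metric."""
    exp, act = set(expected.lower().split()), set(actual.lower().split())
    return len(exp & act) / len(exp) if exp else 0.0

def run_eval(dataset, answer_fn, threshold=0.5):
    """dataset: list of (question, expected_answer) pairs."""
    results = []
    for question, expected in dataset:
        actual = answer_fn(question)
        score = correctness(expected, actual)
        results.append({"question": question, "score": score,
                        "passed": score >= threshold})
    return results

dataset = [("What port does RESTai use?", "port 9000")]
results = run_eval(dataset, lambda q: "It listens on port 9000")
print(results[0]["passed"])  # → True
```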
### Prompt Versioning
Every system prompt change is automatically versioned. Browse the full history, compare versions, and restore any previous prompt with one click. Eval runs are linked to prompt versions, enabling A/B comparison — see exactly how a prompt change affected quality scores.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/prompts.png" width="750" alt="RESTai Prompt Versioning"/> </div>

### Image Generation
Local and remote image generators loaded dynamically. Supports Stable Diffusion, Flux, DALL-E, RMBG2, and more.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/flux1.png" width="45%" alt="Flux1"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/vision_sd.png" width="22%" alt="Stable Diffusion"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rmbg2.png" width="22%" alt="RMBG2"/> </div>

### GPU Auto-Detection & Management
RESTai automatically detects NVIDIA GPUs at startup and displays detailed hardware information in the admin settings — model name, VRAM, temperature, utilization, power draw, driver and CUDA versions. GPU support is auto-enabled when hardware is detected, or can be toggled manually.
`make install` also detects GPUs automatically and installs GPU dependencies when available.
### Direct Access (OpenAI-Compatible)
Use LLMs, image generators, and audio transcription directly via OpenAI-compatible API endpoints — no project required. Team-level permissions control which models each user can access, and all usage counts toward team budgets.
Supported endpoints:
- `POST /v1/chat/completions` — Chat with any LLM (streaming supported)
- `POST /v1/images/generations` — Generate images via DALL-E, Flux, Stable Diffusion, etc.
- `POST /v1/audio/transcriptions` — Transcribe audio files
Works with any OpenAI-compatible SDK:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
<div align="center">
<img src="https://github.com/apocas/restai/blob/master/readme/assets/directaccess.png" width="750" alt="RESTai Direct Access"/>
</div>
### Teams & Multi-tenancy
Each team has its own users, admins, projects, and LLM/embedding access controls — including image and audio generator permissions. Users can belong to multiple teams.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/teams.png" width="750" alt="RESTai Teams"/> </div>

### Guardrails
Protect your AI projects with input and output guards. Guards are regular RESTai projects — define safety rules via system prompts, and they'll evaluate every request and response automatically.
- Input Guard — Checks user questions before inference
- Output Guard — Checks LLM responses after inference
- Block or Warn mode — Hard-block unsafe content or flag it while passing through
- Analytics dashboard — Track block rates, view blocked requests, and monitor guard effectiveness over time
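The guard pipeline above can be sketched as follows. This is a simplified stand-in, not RESTai's implementation: the function names are invented, and the guard verdicts are passed in as plain callables where RESTai would query a guard project's LLM.

```python
# Sketch of the guard pipeline: input guard before inference, output guard after.
# Modes and names are illustrative, not RESTai's implementation.

def apply_guard(verdict_unsafe: bool, mode: str, payload: str):
    """Return (text, flagged). In 'block' mode unsafe content is replaced;
    in 'warn' mode it passes through but is flagged for analytics."""
    if not verdict_unsafe:
        return payload, False
    if mode == "block":
        return "Request blocked by guardrail.", True
    return payload, True  # warn: pass through, but record the hit

def guarded_inference(question, llm, input_unsafe, output_unsafe, mode="block"):
    text, flagged = apply_guard(input_unsafe(question), mode, question)
    if flagged and mode == "block":
        return text  # hard-blocked before inference
    answer = llm(text)
    answer, _ = apply_guard(output_unsafe(answer), mode, answer)
    return answer

# A benign question flows through; an unsafe one is hard-blocked in block mode.
safe = guarded_inference("hello", lambda q: q.upper(), lambda q: False, lambda a: False)
print(safe)  # → HELLO
blocked = guarded_inference("bad", lambda q: q, lambda q: True, lambda a: False)
print(blocked)  # → Request blocked by guardrail.
```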
### Response Cache
Enable per-project response caching to speed up repeated or similar questions. Uses ChromaDB vector similarity to match incoming questions against cached answers — if a question is similar enough (above the configurable threshold), the cached answer is returned instantly without calling the LLM.
Works across all project types. The cache is automatically invalidated when the knowledge base changes (document ingestion or deletion), and the similarity threshold is configurable.
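The similarity-based lookup can be sketched as follows. This is a simplified stand-in: RESTai matches questions via ChromaDB embeddings, while this example uses a toy bag-of-words cosine similarity, and the threshold value is illustrative.

```python
import math

# Toy semantic cache: return a stored answer when a new question is
# similar enough to a cached one. Stands in for ChromaDB embedding lookup.

def vectorize(text):
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cache_lookup(question, cache, threshold=0.8):
    """cache: list of (cached_question, cached_answer) pairs."""
    qv = vectorize(question)
    best = max(cache, key=lambda e: cosine(qv, vectorize(e[0])), default=None)
    if best and cosine(qv, vectorize(best[0])) >= threshold:
        return best[1]  # cache hit: skip the LLM call
    return None         # cache miss: fall through to inference

cache = [("what is restai", "RESTai is an AIaaS platform.")]
print(cache_lookup("what is restai", cache))      # → RESTai is an AIaaS platform.
print(cache_lookup("unrelated question", cache))  # → None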