# RESTai

RESTai is an open-source AIaaS (AI as a Service) platform. It supports many public and local LLMs served by Ollama, vLLM, and others, with precise embeddings usage and tuning. It also offers built-in image generation (DALL-E, Stable Diffusion, Flux) with dynamically loaded generators, an internal block-based graphical language, prompt versioning, and much more.
## Quick Start

```shell
git clone https://github.com/apocas/restai && cd restai
make install
make dev  # → http://localhost:9000/admin (admin / admin)
```
Or with Docker:

```shell
docker compose --env-file .env up --build
```
## Why RESTai?
- Multi-project AI platform — RAG (with optional SQL-to-NL), Agents, Block (visual logic), and Inference in one place
- Full Web UI included — React dashboard with analytics, not just an API
- Any LLM — OpenAI, Anthropic, Ollama, Gemini, Groq, LiteLLM, vLLM, Azure, and more
- Feature complete — Teams, RBAC, OAuth/LDAP, token tracking, per-project rate limiting, Kubernetes-native
- Extensible tools — MCP (Model Context Protocol) for unlimited agent integrations
- Token tracking, cost & latency analytics — built-in dashboard with daily usage, per-project costs, latency monitoring, and top LLM charts
## Features

### Dashboard & Analytics
Track token usage, costs, latency, and project activity from a centralized dashboard. Daily charts for tokens, costs, and response latency per project — identify performance regressions at a glance.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/home.png" width="750" alt="RESTai Dashboard"/> </div>

### Projects & Chat Playground
Create and manage AI projects. Each project has its own LLM, system prompt, tools, and configuration. Test instantly in the built-in chat playground.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/projects.png" width="750" alt="RESTai Projects"/> </div> <div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/playground.png" width="750" alt="RESTai Playground"/> </div>

### RAG (Retrieval-Augmented Generation)
Upload documents and query them with LLM-powered retrieval. Supports multiple vector stores, reranking (ColBERT / LLM-based), sandboxed mode to reduce hallucination, and evaluation via deepeval. Optionally connect a MySQL or PostgreSQL database to translate natural language questions into SQL queries automatically.
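The sandboxed mode described above can be illustrated with a minimal sketch. This is not RESTai's actual code; the function names, threshold value, and fallback message are assumptions chosen for illustration. The idea is simply that when no retrieved chunk scores above a similarity cutoff, the project refuses rather than letting the LLM guess.

```python
# Minimal sketch of sandboxed RAG: answer only when retrieval is confident.
# Names and threshold are illustrative, not RESTai internals.

def sandboxed_answer(question, retrieved, threshold=0.5,
                     fallback="I don't know based on the provided documents."):
    """retrieved: list of (chunk_text, similarity_score) pairs."""
    relevant = [(text, score) for text, score in retrieved if score >= threshold]
    if not relevant:
        return fallback  # sandboxed: refuse rather than hallucinate
    context = "\n".join(text for text, _ in relevant)
    # In a real pipeline the context would be sent to the project's LLM here;
    # this sketch just returns it for inspection.
    return f"Answering from context:\n{context}"

print(sandboxed_answer("What is X?", [("X is a tool.", 0.82), ("Unrelated.", 0.1)]))
print(sandboxed_answer("What is Y?", [("Unrelated.", 0.1)]))
```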
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rag.png" width="750" alt="RESTai RAG"/> </div>

### Agents + MCP
Zero-shot ReAct agents with built-in tools and MCP (Model Context Protocol) server support for extensible tool access. Connect any MCP-compatible server via HTTP/SSE or stdio.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent1.png" width="40%" alt="RESTai Agent"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent2.png" width="40%" alt="RESTai Agent Tools"/> </div>

### Inference (Multimodal)
Direct LLM chat and completion. Supports sending images alongside text using any vision-capable model (LLaVA, Gemini, GPT-4o, etc.).
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/inference.png" width="750" alt="RESTai Inference"/> </div>

### Block (Visual Logic Builder)
Build processing logic visually using a Blockly-based IDE — no LLM required. Drag-and-drop blocks to define how input is transformed into output. Use the "Call Project" block to invoke other RESTai projects, enabling composition of AI pipelines without writing code.
Supported blocks: text operations, math, logic, variables, loops, and custom RESTai blocks (Get Input, Set Output, Call Project, Log).
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/block.png" width="750" alt="RESTai Block IDE"/> </div>

### MCP Server
RESTai includes an optional built-in MCP (Model Context Protocol) server that exposes your projects as tools consumable by any MCP client — Claude Desktop, Cursor, or custom agents. Each user authenticates with a Bearer API key and can only access their assigned projects.
Enable via the `MCP_SERVER=true` environment variable or the admin settings page (requires restart). Clients connect to `http://your-host:9000/mcp/sse`.
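As an illustration, an MCP client that supports remote SSE servers might be pointed at RESTai with a config like the following. The exact keys vary by client; this JSON shape follows the common `mcpServers` convention and is an assumption, not RESTai documentation:

```json
{
  "mcpServers": {
    "restai": {
      "url": "http://your-host:9000/mcp/sse",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}
```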
Available tools:
- `list_projects` — Discover which AI projects you have access to
- `query_project` — Send a question (with optional image) to any accessible project
### Evaluation Framework
Built-in evaluation system to measure and track AI project quality over time. Create test datasets with question/expected-answer pairs, run evaluations with multiple metrics, and visualize score trends.
Metrics (powered by DeepEval):
- Answer Relevancy — Is the answer relevant to the question?
- Faithfulness — Is the answer grounded in the retrieved context? (RAG projects)
- Correctness — Does the answer match the expected output?
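The dataset-and-metrics flow can be sketched with a toy correctness check. This is illustrative only: RESTai's real metrics are computed by DeepEval (typically LLM-judged), and the function names and token-overlap scoring here are invented stand-ins.

```python
# Toy evaluation run: score each (question, expected, actual) case.
# Illustrative only -- RESTai's real metrics come from DeepEval.

def correctness(expected: str, actual: str) -> float:
    """Crude token-overlap score standing in for an LLM-judged metric."""
    exp, act = set(expected.lower().split()), set(actual.lower().split())
    return len(exp & act) / len(exp) if exp else 0.0

def run_eval(dataset, answer_fn, threshold=0.5):
    """dataset: list of (question, expected_answer) pairs."""
    results = []
    for question, expected in dataset:
        actual = answer_fn(question)
        score = correctness(expected, actual)
        results.append({"question": question, "score": score,
                        "passed": score >= threshold})
    return results

dataset = [("What port does RESTai use?", "port 9000")]
results = run_eval(dataset, lambda q: "It listens on port 9000")
print(results[0]["passed"])  # → True
```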
### Prompt Versioning
Every system prompt change is automatically versioned. Browse the full history, compare versions, and restore any previous prompt with one click. Eval runs are linked to prompt versions, enabling A/B comparison — see exactly how a prompt change affected quality scores.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/prompts.png" width="750" alt="RESTai Prompt Versioning"/> </div>

### Image Generation
Local and remote image generators loaded dynamically. Supports Stable Diffusion, Flux, DALL-E, RMBG2, and more.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/flux1.png" width="45%" alt="Flux1"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/vision_sd.png" width="22%" alt="Stable Diffusion"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rmbg2.png" width="22%" alt="RMBG2"/> </div>

### GPU Auto-Detection & Management
RESTai automatically detects NVIDIA GPUs at startup and displays detailed hardware information in the admin settings — model name, VRAM, temperature, utilization, power draw, driver and CUDA versions. GPU support is auto-enabled when hardware is detected, or can be toggled manually.
`make install` also detects GPUs automatically and installs GPU dependencies when available.
### Direct Access (OpenAI-Compatible)
Use LLMs, image generators, and audio transcription directly via OpenAI-compatible API endpoints — no project required. Team-level permissions control which models each user can access, and all usage counts toward team budgets.
Supported endpoints:
- `POST /v1/chat/completions` — Chat with any LLM (streaming supported)
- `POST /v1/images/generations` — Generate images via DALL-E, Flux, Stable Diffusion, etc.
- `POST /v1/audio/transcriptions` — Transcribe audio files
Works with any OpenAI-compatible SDK:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
<div align="center">
<img src="https://github.com/apocas/restai/blob/master/readme/assets/directaccess.png" width="750" alt="RESTai Direct Access"/>
</div>
### Teams & Multi-tenancy
Each team has its own users, admins, projects, and LLM/embedding access controls — including image and audio generator permissions. Users can belong to multiple teams.
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/teams.png" width="750" alt="RESTai Teams"/> </div>

### Guardrails
Protect your AI projects with input and output guards. Guards are regular RESTai projects — define safety rules via system prompts, and they'll evaluate every request and response automatically.
- Input Guard — Checks user questions before inference
- Output Guard — Checks LLM responses after inference
- Block or Warn mode — Hard-block unsafe content or flag it while passing through
- Analytics dashboard — Track block rates, view blocked requests, and monitor guard effectiveness over time
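The guard pipeline above can be sketched as follows. This is a simplified stand-in, not RESTai's implementation: the function names are invented, and the guard verdicts are passed in as plain callables where RESTai would query a guard project's LLM.

```python
# Sketch of the guard pipeline: input guard before inference, output guard after.
# Modes and names are illustrative, not RESTai's implementation.

def apply_guard(verdict_unsafe: bool, mode: str, payload: str):
    """Return (text, flagged). In 'block' mode unsafe content is replaced;
    in 'warn' mode it passes through but is flagged for analytics."""
    if not verdict_unsafe:
        return payload, False
    if mode == "block":
        return "Request blocked by guardrail.", True
    return payload, True  # warn: pass through, but record the hit

def guarded_inference(question, llm, input_unsafe, output_unsafe, mode="block"):
    text, flagged = apply_guard(input_unsafe(question), mode, question)
    if flagged and mode == "block":
        return text  # hard-blocked before inference
    answer = llm(text)
    answer, _ = apply_guard(output_unsafe(answer), mode, answer)
    return answer

# A benign question flows through; an unsafe one is hard-blocked in block mode.
safe = guarded_inference("hello", lambda q: q.upper(), lambda q: False, lambda a: False)
print(safe)  # → HELLO
blocked = guarded_inference("bad", lambda q: q, lambda q: True, lambda a: False)
print(blocked)  # → Request blocked by guardrail.
```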
### Response Cache
Enable per-project response caching to speed up repeated or similar questions. Uses ChromaDB vector similarity to match incoming questions against cached answers — if a question is similar enough (above the configurable threshold), the cached answer is returned instantly without calling the LLM.
Works across all project types. The cache is automatically invalidated when the knowledge base changes (document ingestion or deletion), and the similarity threshold is configurable.
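The similarity-based lookup can be sketched as follows. This is a simplified stand-in: RESTai matches questions via ChromaDB embeddings, while this example uses a toy bag-of-words cosine similarity, and the threshold value is illustrative.

```python
import math

# Toy semantic cache: return a stored answer when a new question is
# similar enough to a cached one. Stands in for ChromaDB embedding lookup.

def vectorize(text):
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cache_lookup(question, cache, threshold=0.8):
    """cache: list of (cached_question, cached_answer) pairs."""
    qv = vectorize(question)
    best = max(cache, key=lambda e: cosine(qv, vectorize(e[0])), default=None)
    if best and cosine(qv, vectorize(best[0])) >= threshold:
        return best[1]  # cache hit: skip the LLM call
    return None         # cache miss: fall through to inference

cache = [("what is restai", "RESTai is an AIaaS platform.")]
print(cache_lookup("what is restai", cache))      # → RESTai is an AIaaS platform.
print(cache_lookup("unrelated question", cache))  # → None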