
RESTai

RESTai is an open-source AIaaS (AI as a Service) platform. It supports many public and local LLMs served by Ollama, vLLM, and others, with precise control over embedding usage and tuning. Built-in image generation (DALL-E, Stable Diffusion, Flux) with dynamically loaded generators, an internal block-based graphical language, prompt versioning, and much more...

Install / Use

/learn @apocas/Restai

README

<!-- markdownlint-disable MD033 --> <h1 align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/restai-logo.png" alt="RESTai Logo" width="120"/> <br/>RESTai </h1> <p align="center"> <strong>AIaaS (AI as a Service) — Create AI projects and consume them via a simple REST API.</strong> </p> <p align="center"> <a href="https://github.com/apocas/restai/actions/workflows/tests.yml"><img src="https://github.com/apocas/restai/actions/workflows/tests.yml/badge.svg" alt="Tests"/></a> <img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"/> <a href="https://github.com/apocas/restai/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green.svg" alt="License"/></a> <img src="https://img.shields.io/badge/docker-ready-2496ED?logo=docker&logoColor=white" alt="Docker"/> <img src="https://img.shields.io/badge/kubernetes-ready-326CE5?logo=kubernetes&logoColor=white" alt="Kubernetes"/> </p> <div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/templates.png" width="800" alt="RESTai Dashboard"/> </div>

Quick Start

git clone https://github.com/apocas/restai && cd restai
make install
make dev  # → http://localhost:9000/admin (admin / admin)

Or with Docker:

docker compose --env-file .env up --build

Why RESTai?

  • Multi-project AI platform — RAG (with optional SQL-to-NL), Agents, Block (visual logic), and Inference in one place
  • Full Web UI included — React dashboard with analytics, not just an API
  • Any LLM — OpenAI, Anthropic, Ollama, Gemini, Groq, LiteLLM, vLLM, Azure, and more
  • Feature complete — Teams, RBAC, OAuth/LDAP, token tracking, per-project rate limiting, Kubernetes-native
  • Extensible tools — MCP (Model Context Protocol) for unlimited agent integrations
  • Token tracking, cost & latency analytics — built-in dashboard with daily usage, per-project costs, latency monitoring, and top LLM charts

Features

Dashboard & Analytics

Track token usage, costs, latency, and project activity from a centralized dashboard. Daily charts for tokens, costs, and response latency per project — identify performance regressions at a glance.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/home.png" width="750" alt="RESTai Dashboard"/> </div>

Projects & Chat Playground

Create and manage AI projects. Each project has its own LLM, system prompt, tools, and configuration. Test instantly in the built-in chat playground.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/projects.png" width="750" alt="RESTai Projects"/> </div> <div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/playground.png" width="750" alt="RESTai Playground"/> </div>

RAG (Retrieval-Augmented Generation)

Upload documents and query them with LLM-powered retrieval. Supports multiple vector stores, reranking (ColBERT / LLM-based), sandboxed mode to reduce hallucination, and evaluation via deepeval. Optionally connect a MySQL or PostgreSQL database to translate natural language questions into SQL queries automatically.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rag.png" width="750" alt="RESTai RAG"/> </div>
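Reranking is a second scoring pass over the passages that vector search initially returns. The sketch below is illustrative only, not RESTai's internal code: a toy word-overlap score stands in for a real ColBERT or LLM-based reranker.

```python
def rerank(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Re-order retrieved passages by a toy relevance score.

    A real reranker (ColBERT or an LLM judge) would replace word_overlap.
    """
    q_words = set(question.lower().split())

    def word_overlap(passage: str) -> int:
        return len(q_words & set(passage.lower().split()))

    return sorted(passages, key=word_overlap, reverse=True)[:top_k]

passages = [
    "Invoices are archived after 90 days.",
    "The refund policy allows returns within 30 days.",
    "Refunds are issued to the original payment method.",
]
best = rerank("refund policy for returns", passages)
```

The same shape generalizes: retrieve broadly with cheap vector similarity, then spend a more expensive model only on the short list.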

Agents + MCP

Zero-shot ReAct agents with built-in tools and MCP (Model Context Protocol) server support for extensible tool access. Connect any MCP-compatible server via HTTP/SSE or stdio.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent1.png" width="40%" alt="RESTai Agent"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/agent2.png" width="40%" alt="RESTai Agent Tools"/> </div>

Inference (Multimodal)

Direct LLM chat and completion. Supports sending images alongside text using any vision-capable model (LLaVA, Gemini, GPT-4o, etc.).

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/inference.png" width="750" alt="RESTai Inference"/> </div>
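In the OpenAI-style chat format that vision-capable models consume, an image travels alongside the text as one entry in the message's `content` list. A minimal sketch of building such a payload (the image bytes here are a placeholder, not a real PNG):

```python
import base64

def vision_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one OpenAI-style chat message carrying text plus an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = vision_message("What is in this picture?", b"placeholder-image-bytes")
```

A message shaped like this can be passed in the `messages` list of a chat completion request against any vision-capable model.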

Block (Visual Logic Builder)

Build processing logic visually using a Blockly-based IDE — no LLM required. Drag-and-drop blocks to define how input is transformed into output. Use the "Call Project" block to invoke other RESTai projects, enabling composition of AI pipelines without writing code.

Supported blocks: text operations, math, logic, variables, loops, and custom RESTai blocks (Get Input, Set Output, Call Project, Log).

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/block.png" width="750" alt="RESTai Block IDE"/> </div>
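Conceptually, each block is a small input-to-output transformation, and the "Call Project" block lets one pipeline invoke another. A rough functional analogy in Python (this is not RESTai's actual block runtime):

```python
def run_pipeline(blocks, value):
    """Run a chain of blocks, each transforming the previous output."""
    for block in blocks:
        value = block(value)
    return value

# Leaf "blocks": plain text operations.
uppercase = str.upper
exclaim = lambda s: s + "!"

# A "Call Project" analogue: one pipeline reused as a block inside another.
shout = lambda s: run_pipeline([uppercase, exclaim], s)

result = run_pipeline([str.strip, shout], "  hello  ")  # → "HELLO!"
```

Because a pipeline is itself just a block, pipelines compose arbitrarily, which is what makes "Call Project" enough to build multi-stage AI flows without code.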

MCP Server

RESTai includes an optional built-in MCP (Model Context Protocol) server that exposes your projects as tools consumable by any MCP client — Claude Desktop, Cursor, or custom agents. Each user authenticates with a Bearer API key and can only access their assigned projects.

Enable it via the MCP_SERVER=true environment variable or from the admin settings page (restart required). Clients connect to http://your-host:9000/mcp/sse.

Available tools:

  • list_projects — Discover which AI projects you have access to
  • query_project — Send a question (with optional image) to any accessible project

Evaluation Framework

Built-in evaluation system to measure and track AI project quality over time. Create test datasets with question/expected-answer pairs, run evaluations with multiple metrics, and visualize score trends.

Metrics (powered by DeepEval):

  • Answer Relevancy — Is the answer relevant to the question?
  • Faithfulness — Is the answer grounded in the retrieved context? (RAG projects)
  • Correctness — Does the answer match the expected output?

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/eval.png" width="750" alt="RESTai Evaluation"/> </div>
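The dataset-and-metric loop can be illustrated with a toy exact-match "Correctness" check. The real metrics above use DeepEval's LLM-based scoring, so this is only a shape sketch with a faked project:

```python
dataset = [
    {"question": "Capital of France?", "expected": "Paris"},
    {"question": "2 + 2?", "expected": "4"},
]

def fake_project(question: str) -> str:
    # Stand-in for a real RESTai project call; one answer is wrong on purpose.
    return {"Capital of France?": "Paris", "2 + 2?": "5"}[question]

def evaluate(project, dataset) -> float:
    """Fraction of answers that exactly match the expected output."""
    hits = sum(project(row["question"]) == row["expected"] for row in dataset)
    return hits / len(dataset)

score = evaluate(fake_project, dataset)  # → 0.5
```

Running the same dataset against successive prompt versions is what makes score trends comparable over time.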

Prompt Versioning

Every system prompt change is automatically versioned. Browse the full history, compare versions, and restore any previous prompt with one click. Eval runs are linked to prompt versions, enabling A/B comparison — see exactly how a prompt change affected quality scores.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/prompts.png" width="750" alt="RESTai Prompt Versioning"/> </div>
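An append-only history makes versioning and restore straightforward: restoring simply stores the old text as a new version, so history is never rewritten. A minimal sketch (not RESTai's storage model):

```python
class PromptHistory:
    """Append-only system-prompt history with one-click restore."""

    def __init__(self):
        self.versions: list[str] = []

    def save(self, prompt: str) -> int:
        self.versions.append(prompt)
        return len(self.versions)  # 1-based version number

    def restore(self, version: int) -> int:
        # Restoring appends a new version rather than rewriting history,
        # so eval runs linked to older versions stay valid.
        return self.save(self.versions[version - 1])

h = PromptHistory()
h.save("You are a helpful assistant.")
h.save("You are a terse assistant.")
new_version = h.restore(1)  # v1's text comes back as v3
```

Linking eval runs to version numbers in a scheme like this is what enables the A/B comparison described above.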

Image Generation

Local and remote image generators loaded dynamically. Supports Stable Diffusion, Flux, DALL-E, RMBG2, and more.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/flux1.png" width="45%" alt="Flux1"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/vision_sd.png" width="22%" alt="Stable Diffusion"/> <img src="https://github.com/apocas/restai/blob/master/readme/assets/rmbg2.png" width="22%" alt="RMBG2"/> </div>

GPU Auto-Detection & Management

RESTai automatically detects NVIDIA GPUs at startup and displays detailed hardware information in the admin settings — model name, VRAM, temperature, utilization, power draw, driver and CUDA versions. GPU support is auto-enabled when hardware is detected, or can be toggled manually.

make install also detects GPUs automatically and installs GPU dependencies when available.
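GPU details of this kind are typically obtained from `nvidia-smi`'s CSV query mode (e.g. `nvidia-smi --query-gpu=name,memory.total,temperature.gpu --format=csv,noheader`). A hedged sketch of parsing that output, which is not RESTai's actual detection code:

```python
def parse_gpu_csv(output: str) -> list[dict]:
    """Parse `nvidia-smi ... --format=csv,noheader` lines into dicts."""
    gpus = []
    for line in output.strip().splitlines():
        name, mem, temp = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "memory": mem, "temperature_c": int(temp)})
    return gpus

# Example line in the shape nvidia-smi prints for the query above.
sample = "NVIDIA GeForce RTX 4090, 24564 MiB, 41\n"
gpus = parse_gpu_csv(sample)
```

An empty result (no lines, or `nvidia-smi` absent) is the natural signal to leave GPU support disabled.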

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/gpus.png" width="750" alt="RESTai GPU Detection"/> </div>

Direct Access (OpenAI-Compatible)

Use LLMs, image generators, and audio transcription directly via OpenAI-compatible API endpoints — no project required. Team-level permissions control which models each user can access, and all usage counts toward team budgets.

Supported endpoints:

  • POST /v1/chat/completions — Chat with any LLM (streaming supported)
  • POST /v1/images/generations — Generate images via DALL-E, Flux, Stable Diffusion, etc.
  • POST /v1/audio/transcriptions — Transcribe audio files

Works with any OpenAI-compatible SDK:

from openai import OpenAI

# Point the official OpenAI SDK at RESTai's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="YOUR_API_KEY")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/directaccess.png" width="750" alt="RESTai Direct Access"/> </div>

Teams & Multi-tenancy

Each team has its own users, admins, projects, and LLM/embedding access controls — including image and audio generator permissions. Users can belong to multiple teams.

<div align="center"> <img src="https://github.com/apocas/restai/blob/master/readme/assets/teams.png" width="750" alt="RESTai Teams"/> </div>

Guardrails

Protect your AI projects with input and output guards. Guards are regular RESTai projects — define safety rules via system prompts, and they'll evaluate every request and response automatically.

  • Input Guard — Checks user questions before inference
  • Output Guard — Checks LLM responses after inference
  • Block or Warn mode — Hard-block unsafe content or flag it while passing through
  • Analytics dashboard — Track block rates, view blocked requests, and monitor guard effectiveness over time
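The block-vs-warn distinction can be sketched as a thin wrapper around inference. In reality the guard would be another project's LLM call; here a toy keyword check stands in, purely for illustration:

```python
def guarded_query(question: str, infer, guard, mode: str = "block") -> dict:
    """Run an input guard before inference.

    guard(text) returns True when the text is unsafe. In "block" mode the
    request is rejected; in "warn" mode it is flagged but still answered.
    """
    if guard(question):
        if mode == "block":
            return {"blocked": True, "flagged": True, "answer": None}
        return {"blocked": False, "flagged": True, "answer": infer(question)}
    return {"blocked": False, "flagged": False, "answer": infer(question)}

unsafe = lambda text: "password" in text.lower()  # toy guard, not a real one
echo = lambda q: f"answer to: {q}"                # stand-in for inference

blocked = guarded_query("What is the admin password?", echo, unsafe)
allowed = guarded_query("What is RAG?", echo, unsafe)
```

An output guard is the mirror image: the same wrapper applied to the response after inference rather than the question before it.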

Response Cache

Enable per-project response caching to speed up repeated or similar questions. Uses ChromaDB vector similarity to match incoming questions against cached answers — if a question is similar enough (above the configurable threshold), the cached answer is returned instantly without calling the LLM.

Works across all project types. The cache is automatically invalidated when the knowledge base changes (document ingestion or deletion), and the similarity threshold is configurable.
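Similarity caching boils down to: embed the incoming question, compare against cached embeddings, and return the stored answer when the best match clears the threshold. A toy sketch with hand-rolled cosine similarity standing in for ChromaDB and real embeddings:

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ResponseCache:
    """Toy semantic cache; short vectors stand in for real embeddings."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def put(self, vector, answer):
        self.entries.append((vector, answer))

    def get(self, vector):
        best = max(self.entries, key=lambda e: cosine(e[0], vector), default=None)
        if best and cosine(best[0], vector) >= self.threshold:
            return best[1]  # cache hit: the LLM call is skipped
        return None         # cache miss: fall through to inference

cache = ResponseCache(threshold=0.9)
cache.put([1.0, 0.0], "Refunds take 5 days.")
hit = cache.get([0.99, 0.05])  # nearly identical question vector
miss = cache.get([0.0, 1.0])   # unrelated question
```

Raising the threshold trades fewer false cache hits for more LLM calls; invalidation on ingestion simply clears the stored entries.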

Repository Stats

  • GitHub stars: 481
  • Forks: 94
  • Language: Python
  • Security score: 100/100 (audited Mar 31, 2026; no findings)