# Observal
Observal is an observability platform and local registry for MCPs, hooks, skills, graphRAGs and more!
Eval & observability for agentic coding — trace every tool call, score every session, improve every workflow.
<p> <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square" alt="License"></a> <img src="https://img.shields.io/badge/python-3.11+-3776ab?style=flat-square&logo=python&logoColor=white" alt="Python"> <img src="https://img.shields.io/badge/status-alpha-orange?style=flat-square" alt="Status"> </p>

Observal is a self-hosted platform that traces every tool call, skill activation, hook execution, sandbox run, and RAG query across your team's AI-assisted coding sessions, then tells you exactly what's helping and what isn't.
It works with Cursor, Kiro, Claude Code, Gemini CLI, VS Code, Windsurf, Codex CLI, and GitHub Copilot.
## Quick Start

```bash
git clone https://github.com/BlazeUp-AI/Observal.git
cd Observal
cp .env.example .env   # edit with your values
cd docker && docker compose up --build -d && cd ..
uv tool install --editable .
observal init          # create admin account
```
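What goes in `.env` depends on your deployment; as a rough sketch with hypothetical variable names (check `.env.example` for the actual keys):

```ini
# Hypothetical keys for illustration only -- .env.example has the real ones.
POSTGRES_PASSWORD=change-me
CLICKHOUSE_PASSWORD=change-me
OBSERVAL_API_URL=http://localhost:8000
```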
Already have MCP servers in your IDE? Instrument them in one command:
```bash
observal scan   # auto-detect, register, and instrument everything
```
This detects MCP servers from your IDE config files, registers them with Observal, and wraps them with observal-shim for telemetry — without breaking your existing setup. A timestamped backup is created automatically.
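Conceptually, the instrumentation step rewrites each MCP server entry so the process is launched through the shim. A minimal sketch of that rewrite, assuming a Cursor/Claude-style `mcpServers` config and an `observal-shim` invocation whose exact arguments are an assumption here:

```python
import json

def wrap_with_shim(config: dict) -> dict:
    """Rewrap each MCP server entry so traffic flows through observal-shim.

    Hypothetical sketch: the real scan also registers servers and writes a
    timestamped backup; the shim's argument layout is an assumption.
    """
    for server in config.get("mcpServers", {}).values():
        if server.get("command") != "observal-shim":  # skip already-wrapped entries
            # The original command becomes the first argument to the shim.
            server["args"] = [server["command"], *server.get("args", [])]
            server["command"] = "observal-shim"
    return config

cfg = {"mcpServers": {"files": {"command": "npx", "args": ["mcp-files"]}}}
print(json.dumps(wrap_with_shim(cfg), indent=2))
```

Because already-wrapped entries are skipped, re-running the rewrite is a no-op, which is what lets `observal scan` run safely against an already-instrumented config.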
## The Problem
Engineering teams using Cursor, Kiro, Claude Code, Gemini CLI, and similar agentic IDEs have no visibility into what actually happens during AI-assisted development. Agents call tools, activate skills, execute code in sandboxes, query knowledge graphs, and fire lifecycle hooks, but none of this is measured. Teams can't answer basic questions:
- Which tools speed up development and which ones waste time?
- Are prompts producing good results or causing rework?
- Do skills actually improve code quality when they activate?
- Which hooks are blocking legitimate actions vs catching real issues?
- Is the RAG system returning relevant context or noise?
- How do two versions of an agent compare on real developer workflows?
Without answers, teams can't improve their tooling. They guess, ship changes, and hope for the best.
## How It Works
Observal sits between your IDE and your tools. A transparent shim (observal-shim for stdio, observal-proxy for HTTP) intercepts traffic without modifying it, pairs requests with responses into spans, and streams them to ClickHouse. The shim is injected automatically when you install a tool through Observal - no code changes required. You can also run observal scan to automatically detect and instrument your existing IDE setup; no manual registration required.
```
IDE <--> observal-shim <--> MCP Server / Tool / Sandbox / GraphRAG
               |
               v  (fire-and-forget)
         Observal API --> ClickHouse (traces, spans, scores)
               |
               v
         Eval Engine (LLM-as-judge) --> Scorecards
```
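The core of the shim is pairing each request with its response to form a span. A minimal sketch of that pairing, assuming JSON-RPC framing; the span field names here are illustrative, not Observal's actual schema:

```python
import json

def pair_spans(messages):
    """Pair JSON-RPC requests with responses by id, yielding latency spans.

    messages: list of (timestamp_seconds, direction, raw_json) tuples as the
    shim would observe them on stdio.
    """
    pending = {}  # request id -> (method, start timestamp)
    spans = []
    for ts, direction, raw in messages:
        msg = json.loads(raw)
        if direction == "request" and "id" in msg:
            pending[msg["id"]] = (msg.get("method"), ts)
        elif direction == "response" and msg.get("id") in pending:
            method, start = pending.pop(msg["id"])
            spans.append({
                "method": method,
                "duration_ms": (ts - start) * 1000,
                "ok": "error" not in msg,
            })
    return spans

spans = pair_spans([
    (0.00, "request", '{"jsonrpc":"2.0","id":1,"method":"tools/call"}'),
    (0.05, "response", '{"jsonrpc":"2.0","id":1,"result":{}}'),
])
print(spans)  # one span for tools/call, ~50 ms, ok=True
```

Notifications (messages without an `id`) never get a response in JSON-RPC, so they fall through the pairing and would be recorded separately.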
The eval engine runs on traces after the fact. It scores agent sessions across dimensions like tool selection quality, prompt effectiveness, RAG relevance, and code correctness. Scorecards let you compare versions, identify bottlenecks, and track improvements over time. For GraphRAG endpoints, Observal runs RAGAS evaluation, computing faithfulness, answer relevancy, context precision, and context recall using LLM-as-judge on retrieval spans.
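Of the RAGAS metrics, context precision is the easiest to see in miniature: mean precision@k taken at each rank that holds a relevant chunk. In Observal the relevance judgments come from the LLM-as-judge; in this simplified sketch they are supplied directly as booleans:

```python
def context_precision(relevance_flags):
    """RAGAS-style context precision over ranked retrieval results.

    relevance_flags[k] is True when the chunk at rank k+1 was judged relevant.
    """
    hits, total, relevant = 0, 0.0, 0
    for k, rel in enumerate(relevance_flags, start=1):
        if rel:
            hits += 1
            total += hits / k   # precision@k at this relevant position
            relevant += 1
    return total / relevant if relevant else 0.0

# Relevant chunks at ranks 1 and 3 out of three retrieved:
print(context_precision([True, False, True]))  # (1/1 + 2/3) / 2 = 0.8333...
```

Pushing relevant chunks toward the top of the retrieval raises the score, which is why the metric rewards ranking quality and not just recall.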
## What It Covers
Observal manages 8 registry types that cover the full surface area of modern AI-assisted development:
| Registry Type | What It Is | What Observal Measures |
|---------------|------------|------------------------|
| MCP Servers | Model Context Protocol servers that expose tools to agents | Call volume, latency percentiles, error rates, schema compliance |
| Agents | AI agent configurations with system prompts, model settings, and linked tools | Interaction count, acceptance rate, tool call efficiency, version-over-version comparison |
| Tool Calls | Standalone tools (non-MCP) exposed directly to agents | Invocation count, success rate, retry rate, schema validation |
| Skills | Portable instruction packages (SKILL.md) that agents load on demand | Activation frequency (auto vs manual), error rate correlation, session duration impact |
| Hooks | Lifecycle callbacks that fire at specific points during agent sessions | Execution count per event type, block rate, latency overhead |
| Prompts | Managed prompt templates with variable substitution | Render count, token expansion ratio, downstream LLM success rate |
| Sandbox Exec | Docker/LXC execution environments for code running and testing | CPU/memory/disk/network usage, exit codes, OOM rate, timeout rate |
| GraphRAGs | Knowledge graph and RAG system endpoints | Entities retrieved, relationships traversed, relevance scores, embedding latency, RAGAS evaluation (faithfulness, answer relevancy, context precision, context recall) |
Every type emits telemetry into ClickHouse. Every type gets metrics, feedback, and eval scores. Admin review controls visibility in the public registry, but you can use your own items and collect telemetry immediately; no approval needed.
## IDE Support
Config generation and telemetry collection work across all major agentic IDEs:
| IDE | MCP | Agents | Skills | Hooks | Sandbox Exec | GraphRAGs | Prompts | Native OTel |
|-----|:---:|:------:|:------:|:-----:|:------------:|:---------:|:-------:|:-----------:|
| Claude Code | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Codex CLI | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes |
| Gemini CLI | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes |
| GitHub Copilot | - | - | Yes | - | - | - | Yes | Yes |
| Kiro IDE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| Kiro CLI | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| Cursor | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| VS Code | Yes | Yes | - | - | Yes | Yes | Yes | - |
| Windsurf | Yes | Yes | - | - | Yes | Yes | Yes | - |
IDEs with Native OTel support send full distributed traces, user prompts, LLM token usage, and tool execution telemetry directly to Observal via OpenTelemetry. This is configured automatically when you run `observal install`. IDEs without native OTel support use the `observal-shim` transparent proxy for MCP tool call telemetry.
## Tech Stack
| Component | Technology |
|-----------|------------|
| Frontend | Next.js 16, React 19, Tailwind CSS 4, shadcn/ui, Recharts |
| Backend API | Python, FastAPI, Uvicorn |
| Database | PostgreSQL 16 (primary), ClickHouse (telemetry) |
| ORM | SQLAlchemy (async) + AsyncPG |
| CLI | Python, Typer, Rich |
| Eval Engine | AWS Bedrock / OpenAI-compatible LLMs |
| Background Jobs | arq + Redis |
| Real-time | GraphQL subscriptions (Strawberry + WebSocket) |
| Dependency Management | uv |
| Deployment | Docker Compose |
## Setup & Configuration
For detailed setup, eval engine configuration, environment variables, and troubleshooting, see SETUP.md.
<details>
<summary><strong>CLI Usage</strong></summary>

### Authentication

```bash
observal init     # first-time admin setup
observal login    # login with API key
observal whoami   # check current user
```
### Quick Start with Existing Setup

```bash
observal scan                        # detect and instrument all IDE configs in current directory
observal scan --ide cursor           # target specific IDE
observal scan --dry-run              # preview changes without modifying files
observal scan /path/to/project --yes # non-interactive
```
### Registry Operations

All registry types follow the same pattern: submit, list, show, install, delete. All commands accept either an ID or a name.

```bash
# MCP Servers (ID or name works for all commands)
observal submit <git-url>
observal list [--category <cat>] [--search <term>]
observal show <id-or-name>
observal install <id-or-name> --ide <ide>

# Agents
observal agent create
observal agent list [--search <term>]
observal agent show <id>
observal agent install <id> --ide <ide>

# Skills
observal skill submit <git-url-or-path>
observal skill list [--task-type <type>] [--target-agent <agent>]
observal skill install <id> --ide <ide>

# Hooks
observal hook submit
observal hook list [--event <event>] [--scope <scope>]
observal hook install <id> --ide <ide>

# Tools
observal tool submit
observal tool list [--category <cat>]
observal tool install <id> --ide <ide>

# Prompts
observal prompt submit [--from-file <path>]
observal prompt list [--category <cat>]
observal prompt render <id> --var key=value

# Sandboxes
observal sandbox submit
observal sandbox list [--runtime docker|lxc]
observal sandbox install <id> --ide <ide>

# GraphRAGs
observal graphrag submit
observal graphrag list [--query-interface graphql|rest|cypher|sparql]
observal graphrag install <id> --ide <ide>
```
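To make the `prompt render` semantics concrete, here is a minimal sketch of template rendering with variable substitution. Observal's actual placeholder syntax isn't specified here, so the `$`-style placeholders (Python's `string.Template`) are an assumption:

```python
from string import Template

def render_prompt(template_text: str, variables: dict) -> str:
    """Substitute --var key=value pairs into a prompt template.

    Illustrative only: raises KeyError if a placeholder is missing a value,
    which is one reasonable way a CLI could surface incomplete --var flags.
    """
    return Template(template_text).substitute(variables)

tpl = "Review the $language code in $path and flag style issues."
print(render_prompt(tpl, {"language": "Python", "path": "src/app.py"}))
# Review the Python code in src/app.py and flag style issues.
```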
### Admin Review

All registry types go through a single review workflow:

```bash
observal review list [--type mcp|agent|skill|hook|tool|prompt|sandbox|graphrag]
observal review show <id>
observal review approve <id>
observal review reject <id> --reason "Missing documentation"
```
### Observability

```bash
# Telemetry status
observal telemetry status

# Metrics for any registry type
observal metrics <id> --type mcp
observal metrics <id> --type agent
observal metrics <id> --type tool

# Enterprise overview
observal overview
```
### Evaluation

```bash
# Run eval on agent traces
observal eval run <agent-id>

# List and inspect scorecards
observal eval scorecards <agent-id> [--version "1.0.0"]
observal eval show <scorecard-id>

# Compare versions
observal eval compare <agent-id> --a "1.0.0" --b "2.0.0"
```
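A toy illustration of what a version comparison surfaces: per-dimension score deltas between two scorecards. The dimension names and scorecard shape here are hypothetical, not Observal's actual output:

```python
def compare_scorecards(a: dict, b: dict) -> dict:
    """Score delta per dimension between scorecard a (old) and b (new)."""
    return {dim: round(b[dim] - a[dim], 3) for dim in a if dim in b}

v1 = {"tool_selection": 0.72, "prompt_effectiveness": 0.65}
v2 = {"tool_selection": 0.81, "prompt_effectiveness": 0.60}
print(compare_scorecards(v1, v2))
# {'tool_selection': 0.09, 'prompt_effectiveness': -0.05}
```

A mixed result like this (tool selection up, prompt effectiveness down) is exactly the kind of regression a raw "overall score" would hide.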
</details>