# OBaI

OBaI is an open-source agentic quant platform for stock research, strategy analysis, and backtesting.
## Quick Demo
<div align="center"> <sub><i>"I want to fade the opening spike on NVDA. Backtest a 5-min mean-reversion on the first hour — enter when RSI(14) drops below 25 after 9:45, exit on RSI crossing back above 50, flat by close. Tight stop, 1.5% max. Last year of data."</i></sub> <br/><br/> <a href="https://youtu.be/62E0JBasyCQ"> <img src="https://img.youtube.com/vi/62E0JBasyCQ/maxresdefault.jpg" alt="OBaI Demo" width="720" /> </a> </div>

The Central Hub understands your intent, dispatches to the right specialists simultaneously (agents-as-tools pattern, not handoffs), and merges everything into one coherent answer.
## Architecture
The Hub receives a query, runs input guardrails, then dispatches to multiple specialists in parallel (agents-as-tools pattern, not handoffs). Each agent calls its MCP server over streamable-http. Results flow back to the synthesizer. Opik (self-hosted) traces every span end-to-end and scores the final output. Strategy Agent uses gpt-5.1 for stronger reasoning; all others use gpt-5-mini. The Research Agent adds deep qualitative analysis via Exa semantic search — company profiles, leadership, product sentiment, and competitive landscape.
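The concurrent dispatch described above can be sketched with plain `asyncio`. This is an illustration of the agents-as-tools pattern only, not the actual Agents SDK code; the specialist functions and result strings below are hypothetical stand-ins:

```python
import asyncio

# Hypothetical specialists; in OBaI each would call an LLM plus its MCP server.
async def market_data_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real tool call
    return f"[market-data] price context for: {query}"

async def fundamentals_agent(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"[fundamentals] ratios for: {query}"

async def hub(query: str) -> str:
    # Agents-as-tools: the Hub awaits its specialists concurrently
    # instead of handing the whole conversation off to one of them.
    results = await asyncio.gather(
        market_data_agent(query),
        fundamentals_agent(query),
    )
    # A synthesizer step would merge these; here we just join them.
    return "\n".join(results)

answer = asyncio.run(hub("What is AAPL trading at?"))
print(answer)
```

The key point is that both specialists run inside one event loop turn, so total latency is roughly the slowest specialist rather than the sum.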
## Why These Data Providers
| Provider | Cost | Coverage |
|----------|------|----------|
| FMP (Financial Modeling Prep) | ~$19/mo | Fundamentals, market data, screening, portfolio, earnings, dividends, backtest OHLCV. One API covers 6 of 8 servers. |
| Massive.com | Free tier available | Options chain data, Greeks, implied volatility, open interest. |
| Tavily | Free tier available | AI-optimized news search. Purpose-built for LLM consumption. |
| Exa | Free tier available | Semantic search for qualitative research — company profiles, leadership, product sentiment, competitive landscape. |
FMP is the backbone -- it is not free, but a single subscription powers almost the entire system.
## Prerequisites
- Python 3.12+
- uv -- install uv
- Docker + Docker Compose v2 -- Docker Engine (Linux) or Docker Desktop (macOS/Windows)
## API Keys Required
| Key | Provider | Cost | Used By |
|-----|----------|------|---------|
| OPENAI_API_KEY | OpenAI | Pay-per-use | All agents (Agent SDK) |
| FMP_API_KEY | Financial Modeling Prep | ~$19/mo | fundamentals, market-data, events-news, screening, portfolio, backtest servers |
| MASSIVE_API_KEY | Massive.com | Free tier | options-server only |
| TAVILY_API_KEY | Tavily | Free tier | events-news-server (AI search) |
| EXA_API_KEY | Exa | Free tier | research-server (semantic search) |
| ANTHROPIC_API_KEY | Anthropic | Pay-per-use | Optional -- LLM-judge cross-family evaluation only |
## Quick Start
```bash
git clone https://github.com/sixteen-dev/obai.git
cd obai

# Set your API keys (add to ~/.bashrc or ~/.zshrc for persistence)
export OPENAI_API_KEY=sk-proj-...
export FMP_API_KEY=...
export MASSIVE_API_KEY=...   # optional
export TAVILY_API_KEY=...    # optional
export EXA_API_KEY=...       # optional

# One-shot setup: checks prereqs, starts Docker services, installs CLI
./setup.sh

# Start chatting
obai chat
```
The setup script:
- Checks prerequisites (Docker, Python 3.12+, uv, git)
- Validates required API keys from your shell environment
- Creates the `~/.obai/config` directory with default preferences
- Starts the Opik tracing stack (self-hosted, Docker Compose)
- Builds and starts all 8 MCP servers (Docker Compose)
- Installs the `obai` CLI globally via `uv tool install`
- Configures the Opik SDK for local tracing
Use `./setup.sh --skip-opik` to skip the tracing stack, or `./setup.sh --skip-mcp` to skip the MCP servers.
## CLI Usage
```bash
# Single query (streams response to stdout)
obai query "What is AAPL trading at?"

# JSON output (for piping to other tools)
obai query "AAPL fundamentals" --json

# Named session for multi-turn conversation
obai query "What is AAPL's P/E ratio?" --session research1
obai query "How does that compare to MSFT?" --session research1

# Interactive REPL
obai chat

# Check MCP server connectivity
obai status
```
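The `--json` output can be consumed with standard JSON tooling. The payload below is a made-up example and its field names (`answer`, `agents`) are illustrative only; inspect the actual CLI output for the real schema:

```python
import json

# Hypothetical payload; the real `obai query --json` schema may differ.
raw = '{"answer": "AAPL is trading at 189.43", "agents": ["market-data"]}'
payload = json.loads(raw)

print(payload["answer"])
print(", ".join(payload["agents"]))
```

The same shape is easy to filter from the shell with `jq` if you prefer a pure pipeline.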
## MCP Servers
| Server | Port | Data Source | Key Capabilities |
|--------|------|-------------|------------------|
| fundamentals-server | 8001 | FMP + Qdrant | Company financials, ratios, SEC filings, insider trades, vector search over financial education PDFs |
| market-data-server | 8002 | FMP | Real-time/historical prices, intraday data (5min/15min/1hr), technical indicators |
| events-news-server | 8003 | FMP + Tavily | Earnings calendar, dividends, AI-powered news search |
| options-server | 8004 | Massive.com | Options chains, Greeks, implied volatility, open interest |
| screening-server | 8005 | FMP | Stock screening with financial filters, ticker discovery |
| portfolio-server | 8006 | FMP | Portfolio parsing, risk analysis, ETF holdings, treasury rates |
| backtest-server | 8007 | FMP | Strategy backtesting with Polars + polars-talib, DuckDB storage, daily + intraday (5min/15min/1hr), train/test split |
| research-server | 8008 | Exa | Deep qualitative research — company profiles, leadership, product sentiment, competitive landscape, general research |
All servers use FastMCP with streamable-http transport, running inside Docker containers on a shared bridge network (`obai-mcp-network`).
## Strategy Agent
The Strategy Agent is OBaI's quantitative researcher. Unlike other specialists that answer questions, the Strategy Agent builds, tests, and iterates on trading strategies autonomously.
How it works: You describe a hypothesis ("momentum strategy for AAPL and MSFT") and the agent:
- Converts your idea into a structured strategy JSON (indicators, entry/exit rules, position sizing, risk management)
- Runs a backtest via the backtest-server (Polars + polars-talib engine, DuckDB storage)
- Analyzes results (Sharpe, Sortino, CAGR, max drawdown, win rate, profit factor)
- Iterates 3-5 times — adding filters, tuning parameters, refining exits
- Validates the final candidate on out-of-sample data (train/test split)
- Returns a verdict (`accept`, `paper_trade`, `needs_more_research`, `reject`) with the executable strategy JSON
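Two of the headline metrics from the analysis step, Sharpe ratio and max drawdown, can be sketched over a series of per-period returns. This is a simplified illustration (no risk-free rate, no annualization), not the backtest-server's implementation:

```python
import math

def sharpe(returns: list[float]) -> float:
    """Mean return over sample standard deviation (risk-free rate assumed 0)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var)

def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough drop of the compounded equity curve (negative)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst

rets = [0.01, -0.02, 0.015, 0.005, -0.01]
print(sharpe(rets), max_drawdown(rets))
```

A real engine would annualize the Sharpe by the bar frequency and report the drawdown alongside its start/end dates.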
> Design a mean-reversion strategy for AAPL, MSFT, and GOOGL

```text
Strategy Agent workflow:
  Iteration 1: RSI oversold baseline → Sharpe 0.82
  Iteration 2: Add Bollinger Band filter → Sharpe 1.14
  Iteration 3: Tighten stop-loss from 5%→3% → Sharpe 1.21, drawdown -8.2%
  Iteration 4: Parameter sensitivity check → stable across ±10% range
  Iteration 5: Full-period validation → Sharpe 1.08 (minor degradation, acceptable)

Verdict: paper_trade
Final strategy JSON: { ... }
```
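To make the "structured strategy JSON" concrete, here is a hypothetical example of the kind of document the agent produces: indicators, entry/exit rules, sizing, and risk. The field names are illustrative assumptions; the real schema is defined by the backtest-server:

```python
import json

# Hypothetical schema for illustration only; see the backtest-server
# for the actual strategy document format.
strategy = {
    "symbols": ["AAPL", "MSFT", "GOOGL"],
    "indicators": [
        {"name": "rsi", "period": 14},
        {"name": "bbands", "period": 20, "std": 2.0},
    ],
    "entry": {"rule": "rsi < 30 and close < bb_lower"},
    "exit": {"rule": "rsi > 50"},
    "position_sizing": {"type": "fixed_fraction", "fraction": 0.1},
    "risk": {"stop_loss_pct": 3.0},
}

print(json.dumps(strategy, indent=2))
```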
The agent uses gpt-5.1 by default (not gpt-5-mini like other specialists) because strategy design requires strong reasoning — metric interpretation, overfitting detection, and parameter sensitivity analysis.
Backtest server tools: `run_strategy`, `get_job_status`, `get_supported_indicators`, `download_data`, `list_available_data`, `get_trade_log`, `compare_strategies`, `clear_cache`
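For reference, the RSI that the demo strategy keys on can be sketched in pure Python using Wilder's smoothing, which is how most TA libraries compute it. This is an educational sketch; the backtest-server itself computes indicators with polars-talib:

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder's RSI of the last bar; requires at least period + 1 closes."""
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with simple averages over the first `period` changes...
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # ...then apply Wilder's exponential smoothing for the rest.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

# Sixteen monotonically rising closes pin RSI(14) at its ceiling of 100.
print(rsi([float(i) for i in range(1, 17)]))
```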
## TUI
OBaI includes a Textual-based Terminal UI with:
- Collapsible conversation history
- Hierarchical tool call display (see which agents were invoked)
- Streaming markdown responses
- Toggle-able debug panel
```bash
# From repo root
cd src/obai
uv run python -m clients.cli.tui
```
## Observability & Evaluation
OBaI uses Opik (self-hosted, open source) for end-to-end tracing and evaluation. Every query generates a full trace you can inspect in the Opik UI at http://localhost:5173.
### What Opik Shows You
Each trace captures the complete execution graph:
- Agent routing — which specialists the Hub dispatched to and why
- Tool calls — every MCP tool invoked (function name, arguments, response), nested under the agent that called it
- Timing — latency breakdown per agent and per tool call, so you can spot bottlenecks
- Token usage — input/output tokens per LLM call across the entire query
- Span hierarchy — Hub → Agent → MCP Tool, fully nested and expandable
### Custom Evaluation Metrics
OBaI registers custom scorers with Opik that run on every traced query:
| Scorer | What it measures | How it works |
|--------|-----------------|--------------|
| Faithfulness | Is the response grounded in tool outputs? | Extracts numbers from the final response and cross-checks against raw MCP tool data. Reports numeric_accuracy (0-1) and a pass/fail verdict. |
| Completeness | Does the response address the full query? | Checks coverage of available data points from tool outputs that should appear in the answer. Reports coverage_score (0-1). |
| LLM Judge | Overall quality assessment | Cross-family evaluation using Anthropic Claude as judge (requires ANTHROPIC_API_KEY). Scores task completion, tool correctness, hallucination, and answer relevance. |
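The Faithfulness idea above can be sketched in a few lines: pull numbers out of the final response and check each one appears in the raw tool output. The real scorer is more careful about number formatting and tolerances; here `numeric_accuracy` is simply matched/total:

```python
import re

def numeric_accuracy(response: str, tool_output: str) -> float:
    """Fraction of numbers in the response that also appear in the tool output."""
    nums = re.findall(r"\d+(?:\.\d+)?", response)
    if not nums:
        return 1.0  # nothing numeric to verify
    tool_nums = set(re.findall(r"\d+(?:\.\d+)?", tool_output))
    matched = sum(1 for n in nums if n in tool_nums)
    return matched / len(nums)

score = numeric_accuracy(
    "AAPL trades at 189.43, up 1.2% today.",
    '{"price": 189.43, "change_pct": 1.2}',
)
print(score)
```

A pass/fail verdict then falls out of a threshold on this score (e.g. fail anything below 1.0 for price queries).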
### Running Evaluations
```bash
# From repo root
cd src/obai

# Trace a single query (inspect in Opik UI afterward)
uv run python -m evaluation query "What is AAPL trading at?" --verbose

# Run evaluation with all scorers
uv run python -m evaluation evaluate "What is AAPL trading at?"

# Run the full test suite (139 cases, categories: A/B/C/D/E/G)
uv run python -m evaluation evaluate --suite

# Fast mode — skip LLM judge, just faithfulness
```