# OBaI

OBaI is an open-source agentic quant platform for stock research, strategy analysis, and backtesting.
## Quick Demo
<div align="center"> <sub><i>"I want to fade the opening spike on NVDA. Backtest a 5-min mean-reversion on the first hour — enter when RSI(14) drops below 25 after 9:45, exit on RSI crossing back above 50, flat by close. Tight stop, 1.5% max. Last year of data."</i></sub> <br/><br/> <a href="https://youtu.be/62E0JBasyCQ"> <img src="https://img.youtube.com/vi/62E0JBasyCQ/maxresdefault.jpg" alt="OBaI Demo" width="720" /> </a> </div>

The Central Hub understands your intent, dispatches to the right specialists simultaneously (agents-as-tools pattern, not handoffs), and merges everything into one coherent answer.
## Architecture
The Hub receives a query, runs input guardrails, then dispatches to multiple specialists in parallel (agents-as-tools pattern, not handoffs). Each agent calls its MCP server over streamable-http. Results flow back to the synthesizer. Opik (self-hosted) traces every span end-to-end and scores the final output. Strategy Agent uses gpt-5.1 for stronger reasoning; all others use gpt-5-mini. The Research Agent adds deep qualitative analysis via Exa semantic search — company profiles, leadership, product sentiment, and competitive landscape.
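The concurrent dispatch described above can be sketched with plain `asyncio`. This is an illustration of the agents-as-tools pattern only, not the actual Agents SDK code; the specialist functions and result strings below are hypothetical stand-ins:

```python
import asyncio

# Hypothetical specialists; in OBaI each would call an LLM plus its MCP server.
async def market_data_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real tool call
    return f"[market-data] price context for: {query}"

async def fundamentals_agent(query: str) -> str:
    await asyncio.sleep(0.01)
    return f"[fundamentals] ratios for: {query}"

async def hub(query: str) -> str:
    # Agents-as-tools: the Hub awaits its specialists concurrently
    # instead of handing the whole conversation off to one of them.
    results = await asyncio.gather(
        market_data_agent(query),
        fundamentals_agent(query),
    )
    # A synthesizer step would merge these; here we just join them.
    return "\n".join(results)

answer = asyncio.run(hub("What is AAPL trading at?"))
print(answer)
```

The key point is that both specialists run inside one event loop turn, so total latency is roughly the slowest specialist rather than the sum.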
## Why These Data Providers
| Provider | Cost | Coverage |
|----------|------|----------|
| FMP (Financial Modeling Prep) | ~$19/mo | Fundamentals, market data, screening, portfolio, earnings, dividends, backtest OHLCV. One API covers 6 of 8 servers. |
| Massive.com | Free tier available | Options chain data, Greeks, implied volatility, open interest. |
| Tavily | Free tier available | AI-optimized news search. Purpose-built for LLM consumption. |
| Exa | Free tier available | Semantic search for qualitative research — company profiles, leadership, product sentiment, competitive landscape. |
FMP is the backbone -- it is not free, but a single subscription powers almost the entire system.
## Prerequisites
- Python 3.12+
- uv -- install uv
- Docker + Docker Compose v2 -- Docker Engine (Linux) or Docker Desktop (macOS/Windows)
## API Keys Required
| Key | Provider | Cost | Used By |
|-----|----------|------|---------|
| OPENAI_API_KEY | OpenAI | Pay-per-use | All agents (Agent SDK) |
| FMP_API_KEY | Financial Modeling Prep | ~$19/mo | fundamentals, market-data, events-news, screening, portfolio, backtest servers |
| MASSIVE_API_KEY | Massive.com | Free tier | options-server only |
| TAVILY_API_KEY | Tavily | Free tier | events-news-server (AI search) |
| EXA_API_KEY | Exa | Free tier | research-server (semantic search) |
| ANTHROPIC_API_KEY | Anthropic | Pay-per-use | Optional -- LLM-judge cross-family evaluation only |
## Quick Start
```bash
git clone https://github.com/sixteen-dev/obai.git
cd obai

# Set your API keys (add to ~/.bashrc or ~/.zshrc for persistence)
export OPENAI_API_KEY=sk-proj-...
export FMP_API_KEY=...
export MASSIVE_API_KEY=...   # optional
export TAVILY_API_KEY=...    # optional
export EXA_API_KEY=...       # optional

# One-shot setup: checks prereqs, starts Docker services, installs CLI
./setup.sh

# Start chatting
obai chat
```
The setup script:
- Checks prerequisites (Docker, Python 3.12+, uv, git)
- Validates required API keys from your shell environment
- Creates the `~/.obai/config` directory with default preferences
- Starts the Opik tracing stack (self-hosted, Docker Compose)
- Builds and starts all 8 MCP servers (Docker Compose)
- Installs the `obai` CLI globally via `uv tool install`
- Configures the Opik SDK for local tracing
Use `./setup.sh --skip-opik` to skip the tracing stack, or `./setup.sh --skip-mcp` to skip the MCP servers.
## CLI Usage
```bash
# Single query (streams response to stdout)
obai query "What is AAPL trading at?"

# JSON output (for piping to other tools)
obai query "AAPL fundamentals" --json

# Named session for multi-turn conversation
obai query "What is AAPL's P/E ratio?" --session research1
obai query "How does that compare to MSFT?" --session research1

# Interactive REPL
obai chat

# Check MCP server connectivity
obai status
```
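The `--json` output can be consumed with standard JSON tooling. The payload below is a made-up example and its field names (`answer`, `agents`) are illustrative only; inspect the actual CLI output for the real schema:

```python
import json

# Hypothetical payload; the real `obai query --json` schema may differ.
raw = '{"answer": "AAPL is trading at 189.43", "agents": ["market-data"]}'
payload = json.loads(raw)

print(payload["answer"])
print(", ".join(payload["agents"]))
```

The same shape is easy to filter from the shell with `jq` if you prefer a pure pipeline.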
## MCP Servers
| Server | Port | Data Source | Key Capabilities |
|--------|------|-------------|------------------|
| fundamentals-server | 8001 | FMP + Qdrant | Company financials, ratios, SEC filings, insider trades, vector search over financial education PDFs |
| market-data-server | 8002 | FMP | Real-time/historical prices, intraday data (5min/15min/1hr), technical indicators |
| events-news-server | 8003 | FMP + Tavily | Earnings calendar, dividends, AI-powered news search |
| options-server | 8004 | Massive.com | Options chains, Greeks, implied volatility, open interest |
| screening-server | 8005 | FMP | Stock screening with financial filters, ticker discovery |
| portfolio-server | 8006 | FMP | Portfolio parsing, risk analysis, ETF holdings, treasury rates |
| backtest-server | 8007 | FMP | Strategy backtesting with Polars + polars-talib, DuckDB storage, daily + intraday (5min/15min/1hr), train/test split |
| research-server | 8008 | Exa | Deep qualitative research — company profiles, leadership, product sentiment, competitive landscape, general research |
All servers use FastMCP with streamable-http transport, running inside Docker containers on a shared bridge network (`obai-mcp-network`).
## Strategy Agent
The Strategy Agent is OBaI's quantitative researcher. Unlike other specialists that answer questions, the Strategy Agent builds, tests, and iterates on trading strategies autonomously.
How it works: You describe a hypothesis ("momentum strategy for AAPL and MSFT") and the agent:
- Converts your idea into a structured strategy JSON (indicators, entry/exit rules, position sizing, risk management)
- Runs a backtest via the backtest-server (Polars + polars-talib engine, DuckDB storage)
- Analyzes results (Sharpe, Sortino, CAGR, max drawdown, win rate, profit factor)
- Iterates 3-5 times — adding filters, tuning parameters, refining exits
- Validates the final candidate on out-of-sample data (train/test split)
- Returns a verdict (`accept`, `paper_trade`, `needs_more_research`, `reject`) with the executable strategy JSON
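Two of the headline metrics from the analysis step, Sharpe ratio and max drawdown, can be sketched over a series of per-period returns. This is a simplified illustration (no risk-free rate, no annualization), not the backtest-server's implementation:

```python
import math

def sharpe(returns: list[float]) -> float:
    """Mean return over sample standard deviation (risk-free rate assumed 0)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var)

def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough drop of the compounded equity curve (negative)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst

rets = [0.01, -0.02, 0.015, 0.005, -0.01]
print(sharpe(rets), max_drawdown(rets))
```

A real engine would annualize the Sharpe by the bar frequency and report the drawdown alongside its start/end dates.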
> Design a mean-reversion strategy for AAPL, MSFT, and GOOGL

```text
Strategy Agent workflow:
  Iteration 1: RSI oversold baseline → Sharpe 0.82
  Iteration 2: Add Bollinger Band filter → Sharpe 1.14
  Iteration 3: Tighten stop-loss from 5%→3% → Sharpe 1.21, drawdown -8.2%
  Iteration 4: Parameter sensitivity check → stable across ±10% range
  Iteration 5: Full-period validation → Sharpe 1.08 (minor degradation, acceptable)

Verdict: paper_trade
Final strategy JSON: { ... }
```
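To make the "structured strategy JSON" concrete, here is a hypothetical example of the kind of document the agent produces: indicators, entry/exit rules, sizing, and risk. The field names are illustrative assumptions; the real schema is defined by the backtest-server:

```python
import json

# Hypothetical schema for illustration only; see the backtest-server
# for the actual strategy document format.
strategy = {
    "symbols": ["AAPL", "MSFT", "GOOGL"],
    "indicators": [
        {"name": "rsi", "period": 14},
        {"name": "bbands", "period": 20, "std": 2.0},
    ],
    "entry": {"rule": "rsi < 30 and close < bb_lower"},
    "exit": {"rule": "rsi > 50"},
    "position_sizing": {"type": "fixed_fraction", "fraction": 0.1},
    "risk": {"stop_loss_pct": 3.0},
}

print(json.dumps(strategy, indent=2))
```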
The agent uses gpt-5.1 by default (not gpt-5-mini like other specialists) because strategy design requires strong reasoning — metric interpretation, overfitting detection, and parameter sensitivity analysis.
Backtest server tools: `run_strategy`, `get_job_status`, `get_supported_indicators`, `download_data`, `list_available_data`, `get_trade_log`, `compare_strategies`, `clear_cache`
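For reference, the RSI that the demo strategy keys on can be sketched in pure Python using Wilder's smoothing, which is how most TA libraries compute it. This is an educational sketch; the backtest-server itself computes indicators with polars-talib:

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder's RSI of the last bar; requires at least period + 1 closes."""
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with simple averages over the first `period` changes...
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # ...then apply Wilder's exponential smoothing for the rest.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

# Sixteen monotonically rising closes pin RSI(14) at its ceiling of 100.
print(rsi([float(i) for i in range(1, 17)]))
```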
## TUI
OBaI includes a Textual-based Terminal UI with:
- Collapsible conversation history
- Hierarchical tool call display (see which agents were invoked)
- Streaming markdown responses
- Toggle-able debug panel
```bash
# From repo root
cd src/obai
uv run python -m clients.cli.tui
```
## Observability & Evaluation
OBaI uses Opik (self-hosted, open source) for end-to-end tracing and evaluation. Every query generates a full trace you can inspect in the Opik UI at http://localhost:5173.
### What Opik Shows You
Each trace captures the complete execution graph:
- Agent routing — which specialists the Hub dispatched to and why
- Tool calls — every MCP tool invoked (function name, arguments, response), nested under the agent that called it
- Timing — latency breakdown per agent and per tool call, so you can spot bottlenecks
- Token usage — input/output tokens per LLM call across the entire query
- Span hierarchy — Hub → Agent → MCP Tool, fully nested and expandable
### Custom Evaluation Metrics
OBaI registers custom scorers with Opik that run on every traced query:
| Scorer | What it measures | How it works |
|--------|-----------------|--------------|
| Faithfulness | Is the response grounded in tool outputs? | Extracts numbers from the final response and cross-checks against raw MCP tool data. Reports numeric_accuracy (0-1) and a pass/fail verdict. |
| Completeness | Does the response address the full query? | Checks coverage of available data points from tool outputs that should appear in the answer. Reports coverage_score (0-1). |
| LLM Judge | Overall quality assessment | Cross-family evaluation using Anthropic Claude as judge (requires ANTHROPIC_API_KEY). Scores task completion, tool correctness, hallucination, and answer relevance. |
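The Faithfulness idea above can be sketched in a few lines: pull numbers out of the final response and check each one appears in the raw tool output. The real scorer is more careful about number formatting and tolerances; here `numeric_accuracy` is simply matched/total:

```python
import re

def numeric_accuracy(response: str, tool_output: str) -> float:
    """Fraction of numbers in the response that also appear in the tool output."""
    nums = re.findall(r"\d+(?:\.\d+)?", response)
    if not nums:
        return 1.0  # nothing numeric to verify
    tool_nums = set(re.findall(r"\d+(?:\.\d+)?", tool_output))
    matched = sum(1 for n in nums if n in tool_nums)
    return matched / len(nums)

score = numeric_accuracy(
    "AAPL trades at 189.43, up 1.2% today.",
    '{"price": 189.43, "change_pct": 1.2}',
)
print(score)
```

A pass/fail verdict then falls out of a threshold on this score (e.g. fail anything below 1.0 for price queries).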
### Running Evaluations
```bash
# From repo root
cd src/obai

# Trace a single query (inspect in Opik UI afterward)
uv run python -m evaluation query "What is AAPL trading at?" --verbose

# Run evaluation with all scorers
uv run python -m evaluation evaluate "What is AAPL trading at?"

# Run the full test suite (139 cases, categories: A/B/C/D/E/G)
uv run python -m evaluation evaluate --suite

# Fast mode — skip LLM judge, just faithfulness
```