RoboSystems

RoboSystems is a financial intelligence platform that connects disparate data sources, builds domain ontologies as knowledge graphs, and provides AI-powered tools for accounting, financial reporting, investment management, and analysis. It powers RoboLedger and RoboInvestor.

LadybugDB Graph Database: Embedded columnar graph database with native DuckDB staging, LanceDB vector search, and tiered infrastructure
Extensions: Domain schemas that drive OLTP tables, API routes, data pipelines, and dedicated frontend apps. Extensions share a single database with schema-per-tenant isolation and materialize to the graph
Document Search: Full-text and semantic search across SEC filings, uploaded documents, and connected sources via OpenSearch
AI-Native Architecture: Context graphs with embeddings, semantic enrichment, and confidence scoring for LLM-powered analytics
Model Context Protocol (MCP): Standardized server and client for LLM integration with schema-aware tools
Multi-Source Data Integration: SEC XBRL filings, QuickBooks accounting data via dbt pipelines, and custom financial datasets
Enterprise-Ready Infrastructure: Multi-tenant architecture with tiered scaling and production-grade query management
Developer-First API: RESTful API designed for integration with financial applications

Platform

The platform provides the core infrastructure that all extensions build on:

Dedicated Infrastructure: Tiered graph infrastructure with dedicated instances and configurable memory allocation
AI Agent System: Autonomous financial operations — graph queries, taxonomy mapping, report generation — with automatic credit tracking and SSE progress streaming
Shared Repositories: SEC XBRL filings knowledge graph for context mining and benchmarking
Document Management: Upload, index, and search documents with full-text and semantic search via OpenSearch
DuckDB Staging System: High-performance data validation and bulk ingestion pipeline
Dagster Orchestration: Data pipeline orchestration for SEC filings, QuickBooks sync, backups, billing, and scheduled jobs
Credit-Based Billing: Flexible credits for AI operations based on token usage
Subgraphs (Workspaces): AI memory graphs, data workspaces with fork & publish, and isolated environments for development and team collaboration
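
The AI Agent System above reports progress over Server-Sent Events (SSE). As a minimal sketch of what one such progress message looks like on the wire, the helper below formats a payload in the standard SSE text format. The event name and payload fields (`step`, `percent`) are hypothetical and are not taken from the RoboSystems API.

```python
import json
from typing import Optional


def sse_event(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Format one Server-Sent Event frame.

    A frame is a set of "field: value" lines terminated by a blank line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON never contains raw newlines, so a single data line is enough.
    lines.append(f"data: {json.dumps(data, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


# Hypothetical progress update for a long-running agent task.
frame = sse_event("progress", {"step": "taxonomy_mapping", "percent": 40}, event_id="7")
```

A client reading the stream would receive the `id`, `event`, and `data` lines followed by an empty line that marks the end of the event.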

Extensions

Each extension defines a domain schema and provides OLTP tables, API routes, data pipelines, and a dedicated frontend app. All extensions share a single PostgreSQL database with schema-per-tenant isolation and materialize to the graph. See Schema Extensions for details.
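
As a rough illustration of schema-per-tenant isolation, the sketch below derives a PostgreSQL schema name from a tenant identifier and builds the statements that would scope a session to that schema. The `tenant_` prefix, the validation rule, and both function names are assumptions made for this example; they do not describe the actual RoboSystems implementation.

```python
import re

# Hypothetical rule: tenant ids are lowercase letters, digits, and underscores.
_TENANT_ID = re.compile(r"[a-z0-9_]{1,48}")


def tenant_schema(tenant_id: str) -> str:
    """Return the schema name for a tenant, rejecting unsafe identifiers."""
    normalized = tenant_id.strip().lower()
    if not _TENANT_ID.fullmatch(normalized):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{normalized}"


def session_setup_sql(tenant_id: str) -> list[str]:
    """Build statements that create the tenant schema if needed and select it."""
    schema = tenant_schema(tenant_id)
    return [
        f'CREATE SCHEMA IF NOT EXISTS "{schema}"',
        f'SET search_path TO "{schema}"',
    ]
```

Because every tenant's tables live under its own schema, the same table names (for example `accounts` or `transactions`) can exist once per tenant inside a single database.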

RoboLedger

Accounting and financial reporting extension. OLTP general ledger with schema-per-tenant PostgreSQL (accounts, transactions, journal entries, line items, dimensions), QuickBooks ELT pipeline via dbt/Dagster, SEC XBRL financial reporting, and chart of accounts.

RoboInvestor

Portfolio management and investment tracking extension covering securities, positions, trades, benchmarks, market data, and risk. It has a dedicated frontend app; the OLTP database and API routes are planned.

Quick Start

Docker Development Environment

# Install uv and just
brew install uv just

# Start the RoboSystems backend API
just start

# Start frontend apps - robosystems-app, roboledger-app, roboinvestor-app
just start apps

This initializes the .env file and starts the complete RoboSystems stack with:

Graph API with LadybugDB and DuckDB backends
Dagster for data pipeline orchestration
PostgreSQL for graph metadata, IAM, and Dagster
Valkey for caching, SSE messaging, and rate limiting
OpenSearch for full-text and semantic document search
Localstack for S3 and DynamoDB emulation

Service URLs:

| Service    | URL                   |
| ---------- | --------------------- |
| Main API   | http://localhost:8000 |
| Graph API  | http://localhost:8001 |
| Dagster UI | http://localhost:8002 |

With just start apps (frontend apps):

| App              | URL                   |
| ---------------- | --------------------- |
| RoboSystems App  | http://localhost:3000 |
| RoboLedger App   | http://localhost:3001 |
| RoboInvestor App | http://localhost:3002 |
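
To confirm that the stack came up, a small script can probe the URLs listed above. The sketch below uses only the Python standard library; the URLs come from the tables, while the function names and the "any HTTP response counts as reachable" rule are assumptions for illustration.

```python
from http.client import HTTPException
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URLs from the service tables; the frontend apps only run after `just start apps`.
SERVICES = {
    "Main API": "http://localhost:8000",
    "Graph API": "http://localhost:8001",
    "Dagster UI": "http://localhost:8002",
    "RoboSystems App": "http://localhost:3000",
    "RoboLedger App": "http://localhost:3001",
    "RoboInvestor App": "http://localhost:3002",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at the URL, even with an error status."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        return True  # the server responded, just not with a 2xx status
    except (URLError, OSError, HTTPException):
        return False  # refused, timed out, or not speaking HTTP


def status_report(probe=is_reachable) -> dict[str, bool]:
    """Probe every known service and map its name to reachability."""
    return {name: probe(url) for name, url in SERVICES.items()}
```

Calling `status_report()` after `just start` returns a dictionary such as `{"Main API": True, ...}`; the `probe` parameter exists so the check can be swapped out or stubbed.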

Local Development

# Setup Python environment (uv automatically handles Python versions)
just init

Examples

See RoboSystems in action with runnable demos that create graphs, load data, and execute queries with the robosystems-client:

just demo-sec               # Loads NVIDIA's SEC XBRL data via Dagster pipeline
just demo-custom-graph      # Builds custom graph schema with relationship networks

Each demo has a corresponding Wiki article with a detailed guide.

Development Commands

Testing

just test-all               # Tests with code quality
just test                   # Default test suite
just test adapters          # Test specific module
just test-cov               # Tests with coverage

Log Monitoring

just logs api                 # View API logs (last 100 lines)
just logs graph-api           # View Graph API logs (last 100 lines)
just logs dagster-webserver   # View Dagster Webserver logs
just logs dagster-daemon      # View Dagster Daemon logs

See justfile for 50+ development commands including database migrations, CloudFormation linting, graph operations, administration, and more.

Prerequisites

System Requirements

Docker & Docker Compose
8GB RAM minimum
20GB free disk space

Required Tools

uv for Python package and version management
just as the project command runner

Deployment Requirements

Fork this repo
AWS account with IAM Identity Center (SSO)
Run just bootstrap to configure OIDC and GitHub variables

See the Bootstrap Guide for complete instructions.

Architecture

The RoboSystems architecture has four parts:

Application Layer:

FastAPI REST API with versioned endpoints
Extension API routes feature-flagged per module
MCP Server for AI-powered graph database access with schema-aware tools
AI Agent System for autonomous financial operations with automatic credit tracking
Dagster for data pipeline orchestration and background jobs

LadybugDB Graph Database: (configuration)

Embedded columnar graph database purpose-built for financial analytics
Base + extension schema architecture — extensions define domain models
Native DuckDB integration for high-performance staging and ingestion
LanceDB vector search for semantic element resolution (IVF-PQ indexes, 384-dim embeddings)
Tiered infrastructure with configurable memory, rate limits, and subgraph allocations
Shared tier hosts public repositories with read replicas

Data Layer:

PostgreSQL for IAM, graph metadata, Dagster, and extension OLTP databases (schema-per-tenant)
OpenSearch for full-text and semantic document search (BM25 + KNN)
Valkey for caching, SSE messaging, and rate limiting
AWS S3 for data lake storage and static assets
DynamoDB for instance/graph/volume registry

Infrastructure:

ECS Fargate for API and Dagster
EC2 ASG for LadybugDB writer clusters
EC2 ALB + ASG for LadybugDB shared replica clusters
RDS PostgreSQL + ElastiCache Valkey
OpenSearch for full-text and semantic document search
CloudFormation infrastructure deployed via GitHub Actions with OIDC

For detailed architecture documentation, see the Architecture Overview in the Wiki.

SEC Shared Repository

A curated knowledge graph of US public company financial data from SEC EDGAR XBRL filings. Runs on the shared LadybugDB tier, accessible via MCP tools, Cypher queries, and the AI agent.

Pipeline: EDGAR → Download → Process (Parquet) → Stage (DuckDB) → Enrich (fastembed) → Materialize (LadybugDB) → Index + Embed (OpenSearch)
Graph: 14 node types and 24 relationship types modeling the full XBRL reporting hierarchy
Search: Hybrid BM25 + KNN vector search across XBRL text blocks, narrative sections, and iXBRL disclosures
Enrichment: Semantic element mapping, statement classification, and disclosure tagging via the Seattle Method taxonomy
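
This README does not say how the BM25 and KNN result lists are combined. One common technique for hybrid search is reciprocal rank fusion (RRF), sketched below on two hypothetical ranked lists of document IDs. It illustrates the general idea only and should not be read as the RoboSystems or OpenSearch implementation; the document IDs and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each hit contributes 1 / (k + rank) to its document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to the id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the lexical and vector searches.
bm25_hits = ["10-K-risk", "10-K-mdna", "10-Q-notes"]
knn_hits = ["10-K-mdna", "8-K-earnings", "10-K-risk"]

fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
```

Documents that rank well in both lists, such as `10-K-mdna` here, rise to the top, while documents found by only one method still appear further down.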

just sec-load NVDA 2025  # Load NVIDIA filings for 2025
just sec-health          # Check SEC database health

See SEC Adapter and SEC Pipeline for detailed documentation.

AI

Model Context Protocol (MCP)

Financial Analysis: Natural language queries across enterprise data and public benchmark data
Cross-Database Queries: Compare user graph data against SEC shared repository data
Tools: Rich toolkit for graph queries, schema introspection, fact discovery, financial analysis, document search, and AI memory operations
Handler Pool: Managed MCP handler instances with resource limits

Agent System

Unified architecture: stateless agents with protocol-based service injection
Dual execution: API (sync/SSE) and background worker (Valkey queue + SSE progress)
Automatic credit tracking per AI call — agents cannot forget billing
Extensible: new agents implement run(ctx) and register with a decorator
See Agent README for details
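
The registration pattern described above, where an agent implements `run(ctx)` and registers with a decorator, can be sketched as follows. All names here (`AgentContext`, `register_agent`, `AGENT_REGISTRY`, `run_agent`, `EchoAgent`) are hypothetical stand-ins, and the word count is a placeholder for real token usage; see the Agent README for the actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AgentContext:
    """Hypothetical per-call context; a real one would carry injected services."""
    query: str
    tokens_used: int = 0


class Agent(Protocol):
    def run(self, ctx: AgentContext) -> str: ...


AGENT_REGISTRY: dict[str, type] = {}


def register_agent(name: str):
    """Class decorator that records an agent class under a unique name."""
    def decorator(cls: type) -> type:
        if name in AGENT_REGISTRY:
            raise ValueError(f"agent already registered: {name}")
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


@register_agent("echo")
class EchoAgent:
    def run(self, ctx: AgentContext) -> str:
        # Placeholder for token usage: count whitespace-separated words.
        ctx.tokens_used += len(ctx.query.split())
        return f"echo: {ctx.query}"


def run_agent(name: str, ctx: AgentContext) -> tuple[str, int]:
    """Run a registered agent and return its result with the usage it recorded.

    Usage is read from the context by the runner, so billing does not
    depend on each agent remembering to report it.
    """
    agent: Agent = AGENT_REGISTRY[name]()  # stateless: a fresh instance per call
    result = agent.run(ctx)
    return result, ctx.tokens_used
```

For example, `run_agent("echo", AgentContext(query="total revenue 2025"))` looks up the class by name, runs it, and hands the recorded usage back to the caller for credit tracking.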

Credit System

AI Operations Only: Credits are consumed exclusively by AI agent calls (Anthropic Claude via AWS Bedrock)
**Token-Bas
