RoboSystems
RoboSystems is a financial intelligence platform that unifies structured data, document search, and AI memory to transform complex financial data into actionable intelligence. Fork-ready with full GitHub Actions CI/CD for deploying CloudFormation infrastructure to your AWS account.
RoboSystems is a financial intelligence platform that connects disparate data sources, builds domain ontologies as knowledge graphs, and provides AI-powered tools for accounting, financial reporting, investment management, and analysis. It powers RoboLedger and RoboInvestor.
- LadybugDB Graph Database: Embedded columnar graph database with native DuckDB staging, LanceDB vector search, and tiered infrastructure
- Extensions: Domain schemas that drive OLTP tables, API routes, data pipelines, and dedicated frontend apps. Extensions share a single database with schema-per-tenant isolation and materialize to the graph
- Document Search: Full-text and semantic search across SEC filings, uploaded documents, and connected sources via OpenSearch
- AI-Native Architecture: Context graphs with embeddings, semantic enrichment, and confidence scoring for LLM-powered analytics
- Model Context Protocol (MCP): Standardized server and client for LLM integration with schema-aware tools
- Multi-Source Data Integration: SEC XBRL filings, QuickBooks accounting data via dbt pipelines, and custom financial datasets
- Enterprise-Ready Infrastructure: Multi-tenant architecture with tiered scaling and production-grade query management
- Developer-First API: RESTful API designed for integration with financial applications
Platform
The platform provides the core infrastructure that all extensions build on:
- Dedicated Infrastructure: Tiered graph infrastructure with dedicated instances and configurable memory allocation
- AI Agent System: Autonomous financial operations (graph queries, taxonomy mapping, report generation) with automatic credit tracking and SSE progress streaming
- Shared Repositories: SEC XBRL filings knowledge graph for context mining and benchmarking
- Document Management: Upload, index, and search documents with full-text and semantic search via OpenSearch
- DuckDB Staging System: High-performance data validation and bulk ingestion pipeline
- Dagster Orchestration: Data pipeline orchestration for SEC filings, QuickBooks sync, backups, billing, and scheduled jobs
- Credit-Based Billing: Flexible credits for AI operations based on token usage
- Subgraphs (Workspaces): AI memory graphs, data workspaces with fork & publish, and isolated environments for development and team collaboration
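The SSE progress streaming used by the AI Agent System rides on the standard `text/event-stream` wire format: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line ending each frame. The sketch below shows how a progress update might be serialized into one frame. The event name `progress` and the payload fields (`step`, `percent`) are hypothetical; the README does not document RoboSystems' actual event schema.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame (text/event-stream format)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so a single data line is enough
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse({"step": "query", "percent": 40}, event="progress", event_id="1")` yields a three-line frame that a browser `EventSource` would deliver to listeners of the `progress` event. Encoding the payload as single-line JSON keeps each frame easy to parse on the client.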
Extensions
Each extension defines a domain schema and provides OLTP tables, API routes, data pipelines, and a dedicated frontend app. All extensions share a single PostgreSQL database with schema-per-tenant isolation and materialize to the graph. See Schema Extensions for details.
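Schema-per-tenant isolation generally means each tenant's tables live in their own PostgreSQL schema, and each connection's `search_path` points at that schema so that unqualified table names resolve to the right tenant. The sketch below shows that idea under a hypothetical `tenant_<id>` naming convention. The README does not describe RoboSystems' actual schema naming or connection handling, so treat `tenant_schema` and `search_path_sql` as illustrative helpers only.

```python
import re

# Hypothetical convention: tenant "acme-corp" -> PostgreSQL schema "tenant_acme_corp".
# 48 chars plus the "tenant_" prefix stays under PostgreSQL's 63-byte identifier limit.
_SAFE_ID = re.compile(r"[a-z0-9_]{1,48}")


def tenant_schema(tenant_id: str) -> str:
    """Map a tenant ID to its schema name, rejecting anything unsafe as an identifier."""
    normalized = tenant_id.strip().lower().replace("-", "_")
    if not _SAFE_ID.fullmatch(normalized):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{normalized}"


def search_path_sql(tenant_id: str) -> str:
    """Statement that scopes a connection's unqualified table names to one tenant.

    The name is whitelisted above and double-quoted, so it cannot carry SQL.
    """
    return f'SET search_path TO "{tenant_schema(tenant_id)}"'
```

Running the resulting statement when a connection is checked out means a query such as `SELECT * FROM accounts` reads only that tenant's ledger tables. Validating against a strict whitelist is used here because schema names cannot be passed as bound query parameters.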
RoboLedger
Accounting and financial reporting extension. OLTP general ledger with schema-per-tenant PostgreSQL (accounts, transactions, journal entries, line items, dimensions), QuickBooks ELT pipeline via dbt/Dagster, SEC XBRL financial reporting, and chart of accounts.
RoboInvestor
Portfolio management and investment tracking extension with securities, positions, trades, benchmarks, market data, and risk. Dedicated frontend app. OLTP database and API routes are planned.
Quick Start
Docker Development Environment
# Install uv and just
brew install uv just
# Start robosystems backend api
just start
# Start frontend apps - robosystems-app, roboledger-app, roboinvestor-app
just start apps
This initializes the .env file and starts the complete RoboSystems stack with:
- Graph API with LadybugDB and DuckDB backends
- Dagster for data pipeline orchestration
- PostgreSQL for graph metadata, IAM and Dagster
- Valkey for caching, SSE messaging, and rate limiting
- OpenSearch for full-text and semantic document search
- LocalStack for S3 and DynamoDB emulation
Service URLs:
| Service    | URL                   |
| ---------- | --------------------- |
| Main API   | http://localhost:8000 |
| Graph API  | http://localhost:8001 |
| Dagster UI | http://localhost:8002 |

After running just start apps, the frontend apps are available at:

| App              | URL                   |
| ---------------- | --------------------- |
| RoboSystems App  | http://localhost:3000 |
| RoboLedger App   | http://localhost:3001 |
| RoboInvestor App | http://localhost:3002 |
Local Development
# Setup Python environment (uv automatically handles Python versions)
just init
Examples
See RoboSystems in action with runnable demos that create graphs, load data, and execute queries with the robosystems-client:
just demo-sec # Loads NVIDIA's SEC XBRL data via Dagster pipeline
just demo-custom-graph # Builds custom graph schema with relationship networks
Each demo has a corresponding Wiki article with detailed guides.
Development Commands
Testing
just test-all # Tests with code quality
just test # Default test suite
just test adapters # Test specific module
just test-cov # Tests with coverage
Log Monitoring
just logs api # View API logs (last 100 lines)
just logs graph-api # View Graph API logs (last 100 lines)
just logs dagster-webserver # View Dagster Webserver logs
just logs dagster-daemon # View Dagster Daemon logs
See the justfile for 50+ development commands, including database migrations, CloudFormation linting, graph operations, administration, and more.
Prerequisites
System Requirements
- Docker & Docker Compose
- 8GB RAM minimum
- 20GB free disk space
Required Tools
- uv for Python package and version management
- just as the project command runner
Deployment Requirements
- Fork this repo
- AWS account with IAM Identity Center (SSO)
- Run just bootstrap to configure OIDC and GitHub variables
See the Bootstrap Guide for complete instructions.
Architecture
RoboSystems is organized into the following layers:
Application Layer:
- FastAPI REST API with versioned endpoints
- Extension API routes feature-flagged per module
- MCP Server for AI-powered graph database access with schema-aware tools
- AI Agent System for autonomous financial operations with automatic credit tracking
- Dagster for data pipeline orchestration and background jobs
LadybugDB Graph Database: (configuration)
- Embedded columnar graph database purpose-built for financial analytics
- Base + extension schema architecture — extensions define domain models
- Native DuckDB integration for high-performance staging and ingestion
- LanceDB vector search for semantic element resolution (IVF-PQ indexes, 384-dim embeddings)
- Tiered infrastructure with configurable memory, rate limits, and subgraph allocations
- Shared tier hosts public repositories with read replicas
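LanceDB's IVF-PQ indexes perform approximate nearest-neighbor search over the 384-dimensional embeddings. For intuition, the sketch below shows the exact brute-force baseline that such an index approximates: cosine similarity against every stored vector, then a top-k sort. It is plain Python, not LanceDB's API, and the element names and toy 3-dimensional vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], items: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Exact k-nearest neighbors by cosine similarity, the result an ANN index approximates."""
    scored = [(name, cosine(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With `items = {"Revenue": [1.0, 0.0, 0.0], "NetIncome": [0.7, 0.7, 0.0], "Assets": [0.0, 0.0, 1.0]}`, the call `top_k([1.0, 0.1, 0.0], items, k=2)` ranks `Revenue` first and `NetIncome` second. This exhaustive scan costs O(n·d) per query. IVF-PQ avoids that cost by searching only a few clusters of compressed vectors, trading a little recall for speed.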
Data Layer:
- PostgreSQL for IAM, graph metadata, Dagster, and extension OLTP databases (schema-per-tenant)
- OpenSearch for full-text and semantic document search (BM25 + KNN)
- Valkey for caching, SSE messaging, and rate limiting
- AWS S3 for data lake storage and static assets
- DynamoDB for instance/graph/volume registry
Infrastructure:
- ECS Fargate for API and Dagster
- EC2 ASG for LadybugDB writer clusters
- EC2 ALB + ASG for LadybugDB shared replica clusters
- RDS PostgreSQL + ElastiCache Valkey
- OpenSearch for full-text and semantic document search
- CloudFormation infrastructure deployed via GitHub Actions with OIDC
For detailed architecture documentation, see the Architecture Overview in the Wiki.
SEC Shared Repository
A curated knowledge graph of US public company financial data from SEC EDGAR XBRL filings. Runs on the shared LadybugDB tier, accessible via MCP tools, Cypher queries, and the AI agent.
- Pipeline: EDGAR → Download → Process (Parquet) → Stage (DuckDB) → Enrich (fastembed) → Materialize (LadybugDB) → Index + Embed (OpenSearch)
- Graph: 14 node types and 24 relationship types modeling the full XBRL reporting hierarchy
- Search: Hybrid BM25 + KNN vector search across XBRL text blocks, narrative sections, and iXBRL disclosures
- Enrichment: Semantic element mapping, statement classification, and disclosure tagging via the Seattle Method taxonomy
just sec-load NVDA 2025 # Load NVIDIA filings for 2025
just sec-health # Check SEC database health
See SEC Adapter and SEC Pipeline for detailed documentation.
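The Search bullet says results combine BM25 and KNN, but the README does not say how the two rankings are merged. Reciprocal rank fusion (RRF) is one common technique for this: each document scores the sum of 1 / (k + rank) over every list it appears in. Whether RoboSystems or its OpenSearch configuration actually uses RRF, and the conventional constant k = 60, are assumptions here, and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Merge ranked result lists: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing `bm25 = ["risk-factors", "mdna", "revenue-note"]` with `knn = ["revenue-note", "risk-factors", "segment-note"]` puts `risk-factors` first because it ranks near the top of both lists. RRF uses only ranks, so BM25 scores and vector distances never need to be normalized onto a common scale.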
AI
Model Context Protocol (MCP)
- Financial Analysis: Natural language queries across enterprise data and public benchmark data
- Cross-Database Queries: Compare user graph data against SEC shared repository data
- Tools: Rich toolkit for graph queries, schema introspection, fact discovery, financial analysis, document search, and AI memory operations
- Handler Pool: Managed MCP handler instances with resource limits
Agent System
- Unified architecture: stateless agents with protocol-based service injection
- Dual execution: API (sync/SSE) and background worker (Valkey queue + SSE progress)
- Automatic credit tracking per AI call — agents cannot forget billing
- Extensible: new agents implement run(ctx) and register with a decorator
- See the Agent README for details
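The list above describes agents that implement `run(ctx)`, register with a decorator, and are billed automatically by token usage. The sketch below shows one way that pattern can fit together: the dispatcher, not the agent, charges credits in a `finally` block, so billing happens even when an agent raises. Every name here (`register_agent`, `AgentContext`, `CreditLedger`, `execute`, `EchoAgent`) and the per-1,000-token rate are hypothetical; the real interfaces are described in the Agent README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol, Type

# Hypothetical rate: credits charged per 1,000 tokens consumed by AI calls
CREDITS_PER_1K_TOKENS = 1.0


@dataclass
class AgentContext:
    """Per-request state handed to an agent; the agent itself keeps none."""
    graph_id: str
    prompt: str
    tokens_used: int = 0

    def record_tokens(self, n: int) -> None:
        # In a real system the AI client wrapper would call this, not agent code
        self.tokens_used += n


@dataclass
class CreditLedger:
    balances: Dict[str, float] = field(default_factory=dict)

    def charge(self, graph_id: str, tokens: int) -> float:
        cost = tokens / 1000 * CREDITS_PER_1K_TOKENS
        self.balances[graph_id] = self.balances.get(graph_id, 0.0) - cost
        return cost


class Agent(Protocol):
    def run(self, ctx: AgentContext) -> str: ...


AGENTS: Dict[str, Type[Agent]] = {}


def register_agent(name: str) -> Callable[[Type[Agent]], Type[Agent]]:
    """Class decorator that adds an agent to the registry under `name`."""
    def decorator(cls: Type[Agent]) -> Type[Agent]:
        AGENTS[name] = cls
        return cls
    return decorator


def execute(name: str, ctx: AgentContext, ledger: CreditLedger) -> str:
    """Run a registered agent, then bill its token usage even if it raised."""
    agent = AGENTS[name]()  # fresh instance per call, so no state leaks between requests
    try:
        return agent.run(ctx)
    finally:
        # Billing lives in the dispatcher, so an agent cannot skip it
        ledger.charge(ctx.graph_id, ctx.tokens_used)


@register_agent("echo")
class EchoAgent:
    def run(self, ctx: AgentContext) -> str:
        ctx.record_tokens(1500)  # stand-in for an LLM call's reported usage
        return ctx.prompt.upper()
```

For example, `execute("echo", AgentContext("g1", "hello"), CreditLedger({"g1": 10.0}))` returns `"HELLO"` and lowers the `g1` balance by 1.5 credits. Keeping billing in the dispatcher is what lets individual agents stay simple and stateless.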
Credit System
- AI Operations Only: Credits are consumed exclusively by AI agent calls (Anthropic Claude via AWS Bedrock)
- **Token-Bas
