Orionbelt Analytics
Ontology-based MCP server that analyzes database schemas (PostgreSQL, Snowflake, ClickHouse, Dremio) and generates RDF/OWL ontologies with SQL mappings for fan-trap-free Text-to-SQL.
Install / Use
/learn @ralfbecher/Orionbelt AnalyticsQuality Score
Category
Development & EngineeringSupported Platforms
README
This project provides a production-ready Python-based MCP (Model Context Protocol) server that analyzes relational database schemas (PostgreSQL, MySQL, Snowflake, ClickHouse, Dremio, BigQuery, DuckDB/MotherDuck, and Databricks SQL) and automatically generates comprehensive ontologies in RDF/Turtle format with direct SQL mappings.
Better Together: Combine with OrionBelt Semantic Layer for a complete AI-powered analytics stack. The Semantic Layer compiles declarative YAML models into dialect-specific, optimized SQL — ensuring correct joins, aggregations, and fan-trap-free queries across PostgreSQL, MySQL, Snowflake, ClickHouse, Dremio, BigQuery, DuckDB, and Databricks. Run both MCP servers side-by-side in Claude Desktop for schema-aware ontology generation and guaranteed-correct SQL compilation.
GraphRAG: The Foundation for OBML Model Creation
OrionBelt Analytics provides the schema intelligence that powers semantic modeling:
When creating OBML models (OrionBelt Modeling Language) for the Semantic Layer, users and LLMs need to understand database structure, relationships, and join paths. GraphRAG serves as the foundational discovery engine that makes intelligent semantic modeling possible.
The Workflow:
-
Schema Discovery (Analytics + GraphRAG)
analyze_schema()automatically initializes GraphRAG- Graph structure built from foreign keys (up to 12 hops)
- Vector embeddings created for semantic similarity search
-
OBML Model Creation (Analytics + GraphRAG + LLM)
- LLM queries GraphRAG: "What tables contain revenue data?"
- LLM queries GraphRAG: "How do customers connect to orders?"
- LLM creates OBML model with metrics, dimensions, and join paths
- User/LLM loads OBML into Semantic Layer
-
Business Queries (Semantic Layer)
- Natural language queries via pre-built OBML models
- Guaranteed-correct SQL compilation with fan-trap prevention
-
Ad-hoc Technical Queries (Analytics + GraphRAG)
- Direct SQL execution for queries not in OBML model
- GraphRAG-assisted discovery for new relationships
Why GraphRAG is Essential:
- ✅ Zero-setup intelligence - No manual modeling required for schema discovery
- ✅ Intelligent relationship discovery - Graph algorithms find complex join paths automatically
- ✅ Semantic similarity - Vector search identifies related concepts across tables
- ✅ Foundation for OBML - Provides the raw intelligence needed to create accurate semantic models
Without GraphRAG: Creating OBML models requires manual schema inspection, tedious relationship discovery, and extensive user guidance.
With GraphRAG: LLMs can intelligently discover schema structure and create accurate OBML models automatically, while still supporting ad-hoc technical queries that fall outside the semantic layer.
Key Philosophy: Automatic Ontology Integration
Our main analysis tool get_analysis_context() automatically includes ontology generation, making semantic context readily available for every query.
Architecture Overview
<p align="center"> <img src="assets/architecture.png" alt="OrionBelt Analytics Architecture" width="900"> </p>OrionBelt Analytics combines multiple AI-powered technologies to provide intelligent database understanding:
- GraphRAG - Graph-based relationship discovery with 12-hop traversal
- Vector Search - Semantic similarity via ChromaDB embeddings
- RDF Store - Persistent ontology storage with SPARQL query interface
- Connection Scoping - Isolated storage per database connection
- FastMCP - Model Context Protocol server for Claude Desktop integration
🌟 Key Features
🔗 Database Connectivity
- PostgreSQL, MySQL, Snowflake, ClickHouse, Dremio, BigQuery, DuckDB/MotherDuck, and Databricks SQL support with connection pooling
- Environment variable fallback - parameters optional, uses .env when not provided
- Enhanced connection management with retry logic and timeout handling
- Automatic dependency management for all database connectors
- Cloud-native support - BigQuery, MotherDuck, and Databricks SQL for cloud data warehouses
- Local analytics - DuckDB for fast local OLAP workloads
🎯 13 Essential Tools
- Streamlined workflow with focused, purpose-built tools
- Interactive charting (
generate_chart) with direct image rendering - Comprehensive schema analysis with automatic ontology generation
- Semantic name resolution for business-friendly ontologies
- Custom ontology loading from external files
- Built-in workflow guidance via FastMCP Context integration
- Focus on results - maximum effectiveness with minimum complexity
🧠 Automatic Ontology Generation
- Self-sufficient ontologies with direct database references (
db:sqlReference,db:sqlJoinCondition) - Business context inference from table and column naming patterns
- Complete SQL mappings embedded directly in ontology
- Fan-trap detection and query safety validation
🗺️ R2RML Mapping Generation
- W3C-compliant R2RML mappings auto-generated alongside schema analysis
- SQL query templates with
rr:sqlQueryandrr:sqlVersion rr:SQL2008 - XSD datatype mapping from SQL types to RDF datatypes
- Configurable base IRI via environment variable (
R2RML_BASE_IRI)
🔍 SPARQL Query Support
- Persistent RDF store via Oxigraph with SPARQL 1.1 support
- Semantic schema queries using the
db:namespace ontology annotations - Connection-scoped storage - each database gets isolated RDF store
- 7 SPARQL tools for semantic exploration:
query_sparql()- Execute custom SPARQL SELECT queriesquery_sparql_ask()- Execute SPARQL ASK queries (true/false)list_tables_sparql()- List tables via semantic queryfind_columns_by_type_sparql()- Find columns by data typeadd_rdf_knowledge()- Add custom metadata triplesstore_ontology_in_rdf()- Persist ontologies for SPARQL accessget_rdf_store_stats()- RDF store statistics
Key ontology properties for SPARQL queries:
db:sqlReference- Maps properties to SQL columns (e.g.,"customers.customer_id")db:sqlJoinCondition- Stores JOIN conditions (e.g.,"orders.customer_id = customers.customer_id")db:hasColumn,db:columnName,db:dataType- Column metadatadb:tableName,db:relationshipType- Schema structure
Example SPARQL query:
PREFIX db: <http://example.com/db#>
SELECT ?table ?column ?dataType
WHERE {
?table db:hasColumn ?column .
?column db:columnName ?columnName .
?column db:dataType ?dataType .
FILTER(CONTAINS(LCASE(?columnName), "revenue"))
}
🛡️ Advanced SQL Safety
- Fan-trap prevention protocols with mandatory relationship analysis
- Query pattern validation to prevent data multiplication errors
- Safe aggregation patterns (UNION, separate CTEs, window functions)
- Comprehensive SQL validation before execution
⚡ Performance & Reliability
- Concurrent processing with thread pool management
- Connection pooling and resource optimization
- Comprehensive error handling with structured responses
- Production-ready logging and monitoring
Installation
# Install all dependencies (recommended)
uv sync
Key dependencies: FastMCP, SQLAlchemy, database connectors (PostgreSQL, MySQL, Snowflake, ClickHouse, Dremio, BigQuery, DuckDB, Databricks), RDFLib, Pydantic
Project Structure
orionbelt-analytics/
├── src/
│ ├── __init__.py # Package initialization
│ ├── main.py # FastMCP server entry point (13 tools)
│ ├── database_manager.py # Database connection and analysis
│ ├── ontology_generator.py # RDF ontology generation with SQL mappings
│ ├── r2rml_generator.py # W3C R2RML mapping generation
│ ├── dremio_client.py # Dremio database client
│ ├── security.py # Security and validation utilities
│ ├── chart_u
