OpenChatBI
OpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. Built on LangGraph and LangChain ecosystem, it provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.
Join the Slack channel to discuss: https://join.slack.com/t/openchatbicommunity/shared_invite/zt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ
<img src="https://github.com/zhongyu09/openchatbi/raw/main/example/demo.gif" alt="Demo" width="800">

Core Features
- Natural Language Interaction: Get data analysis results by asking questions in natural language
- Automatic SQL Generation: Convert natural language queries into SQL statements using advanced text2sql workflows with schema linking and well-organized prompt engineering
- Data Visualization: Generate intuitive data visualizations (via plotly)
- Data Catalog Management: Automatically discovers and indexes database table structures, supports flexible catalog storage backends with vector-based or BM25-based retrieval, and makes it easy to maintain business explanations for tables and columns and to optimize prompts
- Time Series Forecasting: In-house forecasting models that can be called as tools
- Code Execution: Execute Python code for data analysis and visualization
- Interactive Problem-Solving: Proactively ask users for more context when information is incomplete
- Persistent Memory: Conversation management and user characteristic memory based on LangGraph checkpointing
- MCP Support: Integration with MCP tools by configuration
- Knowledge Base Integration: Answer complex questions by combining catalog-based knowledge retrieval with external knowledge base retrieval (via MCP tools)
- Web UI Interface: Provides two sample UIs, a simple and a streaming web interface built with Gradio and Streamlit, both easy to integrate with other web applications
Roadmap
- Anomaly Detection Algorithm: Time series anomaly detection
- Root Cause Analysis Algorithm: Multi-dimensional drill-down capabilities for anomaly investigation
Getting Started
Installation & Setup
Prerequisites
- Python 3.11 or higher
- Access to a supported LLM provider (OpenAI, Anthropic, etc.)
- Data Warehouse (Database) credentials (like Presto, PostgreSQL, MySQL, etc.)
- (Optional) Embedding model for vector-based retrieval - if not available, BM25-based retrieval will be used
- (Optional) Docker - required only for `docker` executor mode
Note on Chinese Text Segmentation: For better Chinese text retrieval, jieba is used for word segmentation. However, jieba is not compatible with Python 3.12+. On Python 3.12 and higher, the system automatically falls back to simple punctuation-based segmentation for Chinese text.
Installation
- Using uv (recommended):
git clone git@github.com:zhongyu09/openchatbi
uv sync
- Using pip:
pip install openchatbi
- For development:
git clone git@github.com:zhongyu09/openchatbi
uv sync --group dev
Optional: If you want to use pysqlite3 (for a newer SQLite build), you can install it manually. If the build fails, install the SQLite development headers first:
On macOS, install sqlite using Homebrew:
brew install sqlite
brew info sqlite
export LDFLAGS="-L/opt/homebrew/opt/sqlite/lib"
export CPPFLAGS="-I/opt/homebrew/opt/sqlite/include"
On Amazon Linux / RHEL / CentOS:
sudo yum install sqlite-devel
On Ubuntu / Debian:
sudo apt-get update
sudo apt-get install libsqlite3-dev
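To confirm which SQLite build your Python environment will actually use, a quick check (not part of OpenChatBI itself) is:

```python
import sqlite3

# Prefer pysqlite3 (which bundles a newer SQLite) when it is installed;
# otherwise keep the stdlib sqlite3 module.
try:
    import pysqlite3 as sqlite3  # type: ignore[no-redef]
except ImportError:
    pass

# Version of the linked SQLite library, e.g. "3.45.1"
print(sqlite3.sqlite_version)
```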
Run Demo
Run the demo using an example dataset from the Spider dataset. You need to provide your OpenAI API key, or change the config to use another LLM provider.
Note: The demo example includes embedding model configuration. If you want to run without an embedding model, you can remove the embedding_model section in the config - BM25 retrieval will be used automatically.
cp example/config.yaml openchatbi/config.yaml
sed -i 's/YOUR_API_KEY_HERE/[YOUR OPENAI API KEY]/g' openchatbi/config.yaml
python run_streamlit_ui.py
Configuration
- Create configuration file
Copy the configuration template:
cp openchatbi/config.yaml.template openchatbi/config.yaml
Or create an empty YAML file.
- Configure your LLMs:
```yaml
# Select which provider to use
default_llm: openai

# Define one or more providers
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.02
        max_tokens: 8192

    # Optional: Embedding model for vector-based retrieval and memory tools
    # If not configured, BM25-based retrieval will be used, and the memory tools will not work
    embedding_model:
      class: langchain_openai.OpenAIEmbeddings
      params:
        api_key: YOUR_API_KEY_HERE
        model: text-embedding-3-large
        chunk_size: 1024
```
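Each model entry pairs a `class` path with `params`, so a loader for this layout presumably resolves the class dynamically and instantiates it with the parameters. Here is a minimal sketch of that pattern (not OpenChatBI's actual code; `build_from_spec` is an illustrative name), demonstrated with a stdlib class standing in for `ChatOpenAI`:

```python
import importlib
from typing import Any


def build_from_spec(spec: dict[str, Any]) -> Any:
    """Instantiate the configured `class` with `params` (illustrative loader)."""
    module_path, _, class_name = spec["class"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**spec.get("params", {}))


# A real spec would be e.g. {"class": "langchain_openai.ChatOpenAI", "params": {...}};
# a stdlib stand-in keeps this sketch self-contained and runnable:
obj = build_from_spec({"class": "datetime.timedelta", "params": {"days": 7}})
print(obj)  # 7 days, 0:00:00
```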
- Configure your data warehouse:
```yaml
organization: Your Company
dialect: presto
data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables:
    - your_table_name
  database_name: "catalog.schema"
```
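The `uri` follows the standard `dialect://user@host:port/path` shape, so each credential and connection detail is encoded in one string. A quick stdlib check (illustrative only) shows how it decomposes:

```python
from urllib.parse import urlparse

uri = "presto://user@host:8080/catalog/schema"
parts = urlparse(uri)

print(parts.scheme)    # "presto"  (the SQL dialect)
print(parts.username)  # "user"
print(parts.hostname)  # "host"
print(parts.port)      # 8080
print(parts.path)      # "/catalog/schema"
```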
Running the Application
- Invoking LangGraph:

```shell
export CONFIG_FILE=YOUR_CONFIG_FILE_PATH
```

```python
from openchatbi import get_default_graph

graph = get_default_graph()
graph.invoke(
    {"messages": [{"role": "user", "content": "Show me ctr trends for the past 7 days"}]},
    config={"configurable": {"thread_id": "1"}},
)
```
System-generated SQL:

```sql
SELECT date, SUM(clicks)/SUM(impression) AS ctr
FROM ad_performance
WHERE date >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY date
ORDER BY date;
```
- Sample Web UI:
Streamlit-based UI:
streamlit run sample_ui/streamlit_ui.py
Gradio-based UI:
python sample_ui/streaming_ui.py
Configuration Instructions
The configuration template is provided at config.yaml.template. Key configuration sections include:
Basic Settings
- organization: Organization name (e.g., "Your Company")
- dialect: Database dialect (e.g., "presto")
- bi_config_file: Path to BI configuration file (e.g., "example/bi.yaml")
Catalog Store Configuration
- catalog_store: Configuration for data catalog storage
  - store_type: Storage type (e.g., "file_system")
  - data_path: Path to catalog data on the file system (e.g., "./example")
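Assuming the options nest under `catalog_store` as in the template, a file-system catalog store would look roughly like this:

```yaml
catalog_store:
  store_type: file_system
  data_path: ./example
```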
Data Warehouse Configuration
- data_warehouse_config: Database connection settings
  - uri: Connection string for your database
  - include_tables: List of tables to include in the catalog; leave empty to include all tables
  - database_name: Database name for the catalog
  - token_service: Token service URL (for data warehouses that need token authentication, like Presto)
  - user_name/password: Token service credentials
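For a token-authenticated Presto warehouse, these keys would combine roughly as follows (the token service URL and credentials below are placeholders, not defaults shipped with OpenChatBI):

```yaml
data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables: []            # empty: include all tables
  database_name: "catalog.schema"
  token_service: "https://token-service.example.com"   # placeholder URL
  user_name: YOUR_USER_NAME
  password: YOUR_PASSWORD
```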
LLM Configuration
Various LLMs are supported via LangChain; see the LangChain API Reference (https://python.langchain.com/api_reference/reference.html#integrations) for the full list of integrations that support chat_models. You can configure different LLMs for different tasks:
- default_llm: Primary language model for general tasks
- embedding_model: (Optional) Model for embedding generation. If not configured, BM25-based text retrieval will be used as a fallback, and the memory tools will not work
- text2sql_llm: (Optional) Specialized model for SQL generation. If not configured, uses default_llm
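For example, to route SQL generation to a dedicated model while keeping a general-purpose default, the config could look like this (model names are placeholders; the layout follows the class/params pattern used elsewhere in the template):

```yaml
default_llm:
  class: langchain_openai.ChatOpenAI
  params:
    api_key: YOUR_API_KEY_HERE
    model: gpt-4.1

# Optional: used only for SQL generation; falls back to default_llm if omitted
text2sql_llm:
  class: langchain_openai.ChatOpenAI
  params:
    api_key: YOUR_API_KEY_HERE
    model: YOUR_SQL_MODEL
```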
Multiple providers (optional):
- Configure multiple providers under `llm_providers` and select one with `default_llm: <provider_name>`.
- In `sample_ui/streamlit_ui.py`, a provider dropdown appears when `llm_providers` is configured.
- In `sample_api/async_api.py`, pass `provider` in the `/chat/stream` request body.
Commonly used LLM providers and their corresponding classes and installation commands:
- Anthropic: `langchain_anthropic.ChatAnthropic`, `pip install langchain-anthropic`
- OpenAI: `langchain_openai.ChatOpenAI`, `pip install langchain-openai`
- Azure OpenAI: `langchain_openai.AzureChatOpenAI`, `pip install langchain-openai`
- Google Vertex AI: `langchain_google_vertexai.ChatVertexAI`, `pip install langchain-google-vertexai`
- Bedrock: `langchain_aws.ChatBedrock`, `pip install langchain-aws`
- Hugging Face: `langchain_huggingface.ChatHuggingFace`, `pip install langchain-huggingface`
- DeepSeek: `langchain_deepseek.ChatDeepSeek`, `pip install langchain-deepseek`
- Ollama: `langchain_ollama.ChatOllama`, `pip install langchain-ollama`
Advanced Configuration
OpenChatBI supports sophisticated customization through prompt engineering and catalog management features:
- Prompt Engineering Configuration: Customize system prompts, business glossaries, and data warehouse introductions
- Data Catalog Management: Configure table metadata, column descriptions, and SQL generation rules
- Business Rules: Define table selection criteria and domain-specific SQL constraints
- Forecasting Service: Configure the forecasting service URL and prompt based on your own deployment
For detailed configuration options and examples, see the Advanced Features section.
Architecture Overview
OpenChatBI is built using a modular architecture with clear separation of concerns:
- LangGraph Workflows: Core orchestration using state machines for complex multi-step processes
- Catalog Management: Flexible data catalog system with intelligent retrieval (vector-based or BM25 fallback)
- Text2SQL Pipeline: Advanced natural language to SQL conversion with schema linking
- Code Execution: Sandboxed Python execution environment for data analysis
- Tool Integration: Extensible tool system for human interaction and knowledge search
- Persistent Memory: Conversation management and user memory based on LangGraph checkpointing
