
OpenChatBI

OpenChatBI is an intelligent chat-based BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. It uses LangGraph and LangChain to build chat agents and workflows that support natural language to SQL conversion and data analysis.

Install / Use

/learn @zhongyu09/Openchatbi
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

OpenChatBI

OpenChatBI is an open-source, chat-based intelligent BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. Built on the LangGraph and LangChain ecosystem, it provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.

Join the Slack channel to discuss: https://join.slack.com/t/openchatbicommunity/shared_invite/zt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ

<img src="https://github.com/zhongyu09/openchatbi/raw/main/example/demo.gif" alt="Demo" width="800">

Core Features

  1. Natural Language Interaction: Get data analysis results by asking questions in natural language
  2. Automatic SQL Generation: Convert natural language queries into SQL statements using advanced text2sql workflows with schema linking and well-organized prompt engineering
  3. Data Visualization: Generate intuitive data visualizations (via plotly)
  4. Data Catalog Management: Automatically discovers and indexes database table structures, supports flexible catalog storage backends with vector-based or BM25-based retrieval, and makes it easy to maintain business explanations for tables and columns and to optimize prompts.
  5. Time Series Forecasting: Forecasting models deployed in-house that can be called as tools
  6. Code Execution: Execute Python code for data analysis and visualization
  7. Interactive Problem-Solving: Proactively ask users for more context when information is incomplete
  8. Persistent Memory: Conversation management and user characteristic memory based on LangGraph checkpointing
  9. MCP Support: Integration with MCP tools by configuration
  10. Knowledge Base Integration: Answer complex questions by combining catalog-based knowledge retrieval and external knowledge base retrieval (via MCP tools)
  11. Web UI Interface: Provides two sample UIs, simple and streaming web interfaces built with Streamlit and Gradio, that are easy to integrate with other web applications
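The BM25 fallback mentioned in the Data Catalog Management feature can be illustrated with a self-contained sketch. This is the textbook BM25 scoring formula applied to toy table descriptions; it is not OpenChatBI's actual retriever, and all names and data below are hypothetical.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank documents against a query with the classic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    # Document frequency per term
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    # Return document indices, best match first
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

tables = [
    "ad_performance daily clicks impressions ctr by campaign",
    "user_profile demographic attributes of registered users",
    "orders transactional purchase records with revenue",
]
print(bm25_rank("ctr clicks trend", tables))  # [0, 1, 2]: ad_performance ranks first
```

In a real catalog retriever the "documents" would be table and column descriptions, and the top-ranked tables would feed the schema-linking step of the text2sql pipeline.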

Roadmap

  1. Anomaly Detection Algorithm: Time series anomaly detection
  2. Root Cause Analysis Algorithm: Multi-dimensional drill-down capabilities for anomaly investigation

Getting started

Installation & Setup

Prerequisites

  • Python 3.11 or higher
  • Access to a supported LLM provider (OpenAI, Anthropic, etc.)
  • Data Warehouse (Database) credentials (like Presto, PostgreSQL, MySQL, etc.)
  • (Optional) Embedding model for vector-based retrieval - if not available, BM25-based retrieval will be used
  • (Optional) Docker - required only for docker executor mode

Note on Chinese Text Segmentation: For better Chinese text retrieval, jieba is used for word segmentation. However, jieba is not compatible with Python 3.12+. On Python 3.12 and higher, the system automatically falls back to simple punctuation-based segmentation for Chinese text.

Installation

  1. Using uv (recommended):
git clone git@github.com:zhongyu09/openchatbi
cd openchatbi
uv sync
  2. Using pip:
pip install openchatbi
  3. For development:
git clone git@github.com:zhongyu09/openchatbi
cd openchatbi
uv sync --group dev

Optional: If you want to use pysqlite3 (newer SQLite builds), you can install it manually. If the build fails, install the SQLite development files first:

On macOS, install SQLite using Homebrew:

brew install sqlite
brew info sqlite
export LDFLAGS="-L/opt/homebrew/opt/sqlite/lib"
export CPPFLAGS="-I/opt/homebrew/opt/sqlite/include"

On Amazon Linux / RHEL / CentOS:

sudo yum install sqlite-devel

On Ubuntu / Debian:

sudo apt-get update
sudo apt-get install libsqlite3-dev
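To confirm which SQLite build Python is linked against, and to substitute pysqlite3 when it is present, a common pattern is the module-swap shown below. This is a general technique, an assumption here, not necessarily what OpenChatBI does internally.

```python
import sqlite3
import sys

# Show the SQLite build the stdlib module is linked against
print("stdlib sqlite3:", sqlite3.sqlite_version)

# If pysqlite3 is installed, other libraries can be pointed at its newer
# build by registering it under the "sqlite3" name. For the swap to take
# effect, this must run before those libraries import sqlite3.
try:
    import pysqlite3
    sys.modules["sqlite3"] = pysqlite3
    print("using pysqlite3:", pysqlite3.sqlite_version)
except ImportError:
    pass  # pysqlite3 not installed; the stdlib build stays in use
```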

Run Demo

Run the demo using an example dataset from the Spider dataset. You need to provide your OpenAI API key, or change the config to use another LLM provider.

Note: The demo example includes embedding model configuration. If you want to run without an embedding model, you can remove the embedding_model section in the config - BM25 retrieval will be used automatically.

cp example/config.yaml openchatbi/config.yaml
sed -i 's/YOUR_API_KEY_HERE/[YOUR OPENAI API KEY]/g' openchatbi/config.yaml
python run_streamlit_ui.py

Configuration

  1. Create configuration file

Copy the configuration template:

cp openchatbi/config.yaml.template openchatbi/config.yaml

Or create an empty YAML file.

  2. Configure your LLMs:
# Select which provider to use
default_llm: openai

# Define one or more providers
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.02
        max_tokens: 8192

    # Optional: Embedding model for vector-based retrieval and memory tools
    # If not configured, BM25-based retrieval will be used, and the memory tools will not work
    embedding_model:
      class: langchain_openai.OpenAIEmbeddings
      params:
        api_key: YOUR_API_KEY_HERE
        model: text-embedding-3-large
        chunk_size: 1024
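The `class` path plus `params` mapping above lends itself to dynamic instantiation. Below is a minimal sketch of how such an entry could be resolved; `build_from_config` is a hypothetical helper, not OpenChatBI's actual loader, and it is demonstrated with a stdlib class so it runs without LangChain installed.

```python
from importlib import import_module

def build_from_config(entry):
    """Instantiate a class from a config entry like
    {"class": "langchain_openai.ChatOpenAI", "params": {...}}."""
    module_path, _, class_name = entry["class"].rpartition(".")
    cls = getattr(import_module(module_path), class_name)
    return cls(**entry.get("params", {}))

# Demonstrated with a stdlib class so the sketch runs anywhere
entry = {"class": "datetime.timedelta", "params": {"days": 7}}
delta = build_from_config(entry)
print(delta.days)  # 7
```

With a real config, the same call would resolve `langchain_openai.ChatOpenAI` and pass `api_key`, `model`, and the other parameters as keyword arguments.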
  3. Configure your data warehouse:
organization: Your Company
dialect: presto
data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables:
    - your_table_name
  database_name: "catalog.schema"
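For reference, the connection URI above decomposes with the standard library as follows. The mapping of the URI path to `database_name` is an assumption based on the example values, not documented behavior.

```python
from urllib.parse import urlparse

# Decompose a warehouse URI of the form used in the config above
uri = "presto://user@host:8080/catalog/schema"
parts = urlparse(uri)
print(parts.scheme)                # presto  (matches the `dialect` setting)
print(parts.username)              # user
print(parts.hostname, parts.port)  # host 8080
print(parts.path.lstrip("/"))      # catalog/schema (cf. `database_name`)
```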

Running the Application

  1. Invoking LangGraph:

Set the config file path in the environment:

export CONFIG_FILE=YOUR_CONFIG_FILE_PATH

Then invoke the graph from Python:

from openchatbi import get_default_graph

graph = get_default_graph()
graph.invoke({"messages": [{"role": "user", "content": "Show me ctr trends for the past 7 days"}]},
    config={"configurable": {"thread_id": "1"}})

Example of the system-generated SQL:

SELECT date, SUM(clicks)/SUM(impression) AS ctr
FROM ad_performance
WHERE date >= CURRENT_DATE - 7 DAYS
GROUP BY date
ORDER BY date;
  2. Sample Web UI:

Streamlit-based UI:

streamlit run sample_ui/streamlit_ui.py

Gradio-based UI:

python sample_ui/streaming_ui.py

Configuration Instructions

The configuration template is provided at config.yaml.template. Key configuration sections include:

Basic Settings

  • organization: Organization name (e.g., "Your Company")
  • dialect: Database dialect (e.g., "presto")
  • bi_config_file: Path to BI configuration file (e.g., "example/bi.yaml")

Catalog Store Configuration

  • catalog_store: Configuration for data catalog storage
    • store_type: Storage type (e.g., "file_system")
    • data_path: Path to catalog data stored by file system (e.g., "./example")

Data Warehouse Configuration

  • data_warehouse_config: Database connection settings
    • uri: Connection string for your database
    • include_tables: List of tables to include in the catalog; leave empty to include all tables
    • database_name: Database name for catalog
    • token_service: Token service URL (for data warehouses that need token authentication, such as Presto)
    • user_name / password: Token service credentials

LLM Configuration

Various LLMs are supported through LangChain; see the LangChain API reference (https://python.langchain.com/api_reference/reference.html#integrations) for the full list of integrations that support chat models. You can configure different LLMs for different tasks:

  • default_llm: Primary language model for general tasks
  • embedding_model: (Optional) Model for embedding generation. If not configured, BM25-based text retrieval will be used as fallback, and the memory tools will not work
  • text2sql_llm: (Optional) Specialized model for SQL generation. If not configured, uses default_llm

Multiple providers (optional):

  • Configure multiple providers under llm_providers and select with default_llm: <provider_name>.
  • In sample_ui/streamlit_ui.py, a provider dropdown appears when llm_providers is configured.
  • In sample_api/async_api.py, pass provider in the /chat/stream request body.

Commonly used LLM providers and their corresponding classes and installation commands:

  • Anthropic: langchain_anthropic.ChatAnthropic, pip install langchain-anthropic
  • OpenAI: langchain_openai.ChatOpenAI, pip install langchain-openai
  • Azure OpenAI: langchain_openai.AzureChatOpenAI, pip install langchain-openai
  • Google Vertex AI: langchain_google_vertexai.ChatVertexAI, pip install langchain-google-vertexai
  • Bedrock: langchain_aws.ChatBedrock, pip install langchain-aws
  • Huggingface: langchain_huggingface.ChatHuggingFace, pip install langchain-huggingface
  • Deepseek: langchain_deepseek.ChatDeepSeek, pip install langchain-deepseek
  • Ollama: langchain_ollama.ChatOllama, pip install langchain-ollama

Advanced Configuration

OpenChatBI supports sophisticated customization through prompt engineering and catalog management features:

  • Prompt Engineering Configuration: Customize system prompts, business glossaries, and data warehouse introductions
  • Data Catalog Management: Configure table metadata, column descriptions, and SQL generation rules
  • Business Rules: Define table selection criteria and domain-specific SQL constraints
  • Forecasting Service: Configure the forecasting service URL and prompt for your own deployment

For detailed configuration options and examples, see the Advanced Features section.

Architecture Overview

OpenChatBI is built using a modular architecture with clear separation of concerns:

  1. LangGraph Workflows: Core orchestration using state machines for complex multi-step processes
  2. Catalog Management: Flexible data catalog system with intelligent retrieval (vector-based or BM25 fallback)
  3. Text2SQL Pipeline: Advanced natural language to SQL conversion with schema linking
  4. Code Execution: Sandboxed Python execution environment for data analysis
  5. Tool Integration: Extensible tool system for human interaction and knowledge search
  6. Persistent Memory: Conversation management and user memory persisted via LangGraph checkpointing
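The LangGraph-style orchestration in point 1 can be mimicked with a toy state machine in plain Python. Node names and routing below are illustrative, not OpenChatBI's actual graph.

```python
# A toy state-machine workflow: each node mutates a shared state dict and
# returns the name of the next node (or None for a terminal node).

def retrieve_schema(state):
    state["tables"] = ["ad_performance"]  # pretend catalog retrieval
    return "generate_sql"

def generate_sql(state):
    state["sql"] = f"SELECT * FROM {state['tables'][0]} LIMIT 10"
    return "execute"

def execute(state):
    state["result"] = f"rows from: {state['sql']}"
    return None  # terminal node

NODES = {
    "retrieve_schema": retrieve_schema,
    "generate_sql": generate_sql,
    "execute": execute,
}

def run(question):
    state = {"question": question}
    node = "retrieve_schema"
    while node is not None:  # follow edges until a terminal node
        node = NODES[node](state)
    return state

print(run("show me ad performance")["sql"])  # SELECT * FROM ad_performance LIMIT 10
```

LangGraph adds conditional edges, checkpointing, and interrupts on top of this basic pattern, which is what enables the persistent memory and interactive problem-solving features above.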
View on GitHub
GitHub Stars: 535
Category: Data
Updated: 7h ago
Forks: 67

Languages

Python

Security Score

100/100

Audited on Mar 23, 2026

No findings