Dataneuron
Data Neuron is a framework for building text-to-SQL applications on top of an easily maintainable semantic layer. Whether you're creating customer-facing chatbots, internal Slack bots for analytics, or other data-driven applications, Data Neuron provides the tools to make your data accessible through natural language.
Data Neuron
Data Neuron is a simple framework for chatting with your data in natural language through the Python SDK, the REST API, or the CLI. It is backed by an easy-to-maintain, continually improving semantic layer stored as YAML files.
⭐ If you find DataNeuron useful, please consider giving us a star on GitHub! Your support helps us continue to innovate and deliver exciting features.
Quick Start
- Install the library using pip.
- Choose your specific set of tables and label them with an alias, a description, a business-specific glossary/definitions, and client/tenant tables (if it is for your end users).
- Chat in the CLI using dnn --chat <contextname> to test and validate how the LLM performs.
- Once your semantic layer is ready, integrate it into your existing Python app through our SDK (from dataneuron import DataNeuron) to build an internal Slack app, a customer-facing chatbot, or email reports. Alternatively, deploy your semantic layer and Data Neuron as an API endpoint on AWS Lambda or a VPS.
Currently supports SQLite, PostgreSQL, MySQL, MSSQL, CSV files (through DuckDB), and ClickHouse. Works with major LLMs such as Claude (default), OpenAI, Llama and others (through Groq, NVIDIA, and similar providers), and Ollama.
Quick Usage
- Install: pip install "dataneuron[mssql,pdf]"
- Set the LLM key in an environment variable on your system. By default it uses Claude: CLAUDE_API_KEY
- Initialize database config: dnn --db-init <database_type>
- Generate context: dnn --init
- Start chat mode: dnn --chat <contextname>, and save results as metrics to dashboards, stored locally as YAML files.
- Get HTML/PDF reports for your dashboard: dnn --report
- Run the API server to access chat, reports, dashboards, and metrics: dnn --server (see the API section for more).
- Deploy the server through AWS Lambda or a traditional VPS.
- If you have an existing Django, Flask, or other Python project, you can use the DataNeuron class from the package directly, as shown below:
from dataneuron import DataNeuron
# Initialize DataNeuron
dn = DataNeuron(db_config='database.yaml', context='your_context_name')
dn.initialize()
dn.set_client_context("userid") # optional: if you want to make it scoped specific to your customer/tenant
# Ask a question
question = "How many users do we have?"
result = dn.query(question)
print(f"SQL Query: {result['sql']}")
print(f"Result: {result['result']}")
Base setup
https://github.com/user-attachments/assets/2108cce7-c48c-4a45-b1c6-f7bde71c635c
Reports pdf
https://github.com/user-attachments/assets/de71a220-4bd9-4f53-b245-064fcaca85bb
As API
https://github.com/user-attachments/assets/0fd477cd-ef8b-44ed-993a-b1ad16cfd82a
https://github.com/user-attachments/assets/8d363c0a-e12a-47ff-b4e4-e5f0bf302224
Features
- Support for multiple database types (SQLite, PostgreSQL, MySQL, MSSQL, CSV files through DuckDB, ClickHouse)
- Natural language to SQL query conversion
- Interactive chat mode for continuous database querying
- Multiple context management: create and manage separate contexts for your customer success team, product team, and so on
- Automatic context generation from database schema
- Customizable context for improved query accuracy
- Support for various LLM providers (Claude, OpenAI, Azure, Custom, Ollama)
- Optimized for smaller database subsets (up to 10-15 tables)
- API server that can be deployed to AWS Lambda or to any traditional server that supports Python Flask
- Through the API you can list /dashboards and metrics, query an individual metric, chat with your context, and generate a feature-rich HTML report
Installation
Data Neuron can be installed with different database support options:
- Base package (SQLite support only): pip install dataneuron
- With PostgreSQL support: pip install dataneuron[postgres]
- With MySQL support: pip install dataneuron[mysql]
- With MSSQL support: pip install dataneuron[mssql]
- With all database supports: pip install dataneuron[all]
- With CSV support: pip install dataneuron[csv]
- With Clickhouse support: pip install dataneuron[clickhouse]
Note: if you use zsh, you might have to put quotes around the package name, like this:
pip install "dataneuron[mysql]"
CSV support does not currently handle nested folder structures. Point it at a single folder of CSV files; each CSV file is treated as a table.
If you want PDF report generation in the CLI, include pdf as an extra dependency alongside your database:
pip install "dataneuron[mssql,pdf]"
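The quoting note above can be sketched as a small helper that assembles the install command. This is only an illustration: pip_install_command is a hypothetical name, not part of dataneuron. It quotes the requirement so that zsh does not treat the square brackets as a glob pattern, and joins extras without spaces so the shell passes them as one argument.

```python
def pip_install_command(extras):
    """Build a pip command for dataneuron with the given extras.

    Hypothetical helper, not part of the dataneuron package.
    """
    # Drop empty entries and surrounding whitespace.
    cleaned = [extra.strip() for extra in extras if extra.strip()]
    if not cleaned:
        # No extras: the base package needs no brackets or quotes.
        return "pip install dataneuron"
    # Quote the requirement and join extras with commas only (no spaces).
    return 'pip install "dataneuron[' + ",".join(cleaned) + ']"'


print(pip_install_command(["mssql", "pdf"]))
```

The printed command matches the quoted form shown above for the mssql and pdf extras.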
Quick Start
- Initialize database configuration:
  dnn --db-init <database_type>
  Replace <database_type> with sqlite, mysql, mssql, or postgres. This creates a database.yaml that the framework later uses to connect to your database.
- Generate context from your database:
  dnn --init
  This prompts for a context name; you can use product_analytics, customer_success, or any other name. It then creates YAML files in the context/<contextname> directory, which become the semantic layer for your data. You will be asked to select a few tables so that they can be auto-labelled; you can edit the labels later.
- Start an interactive chat session:
  dnn --chat <context_name>
  For example:
  dnn --chat product_analytics
  You can chat with the semantic layer that you have created. You can also save a metric to a dashboard, which is created under dashboards/<dashname>.yml
- You can generate reports with image as input for your dashboards. You need wkhtmltopdf installed on your system. On macOS:
  brew install wkhtmltopdf
  Then install the dataneuron package with the pdf dependency (this example assumes you want both postgres and pdf):
  pip install "dataneuron[postgres,pdf]"
  Then generate the report:
  dnn --report
Configuration
Data Neuron supports various LLM providers. Set the following environment variables based on your chosen provider:
Claude (Default)
CLAUDE_API_KEY=your_claude_api_key_here
OpenAI
DATA_NEURON_LLM=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4 # Optional, defaults to gpt-4o
Azure OpenAI
DATA_NEURON_LLM=azure
AZURE_OPENAI_API_KEY=your_azure_api_key_here
AZURE_OPENAI_API_VERSION=your_api_version_here
AZURE_OPENAI_ENDPOINT=your_azure_endpoint_here
AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment_name_here
Custom Provider
DATA_NEURON_LLM=custom
DATA_NEURON_LLM_API_KEY=your_custom_api_key_here
DATA_NEURON_LLM_ENDPOINT=your_custom_endpoint_here
DATA_NEURON_LLM_MODEL=your_preferred_model_here
Ollama (for local LLM models)
Note: local models run through Ollama do not currently generate good results.
DATA_NEURON_LLM=ollama
DATA_NEURON_LLM_MODEL=your_preferred_local_model_here
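A pre-flight check for the variables listed above might look like the sketch below. The names missing_llm_env and REQUIRED_ENV are hypothetical and not part of dataneuron. Two assumptions are made: an unset DATA_NEURON_LLM is treated as the default provider under the key "claude", and DATA_NEURON_LLM_MODEL is treated as required for Ollama. OPENAI_MODEL is left out because it is optional.

```python
import os

# Variables per provider, copied from the lists above.
REQUIRED_ENV = {
    "claude": ["CLAUDE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_API_VERSION",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT_NAME",
    ],
    "custom": [
        "DATA_NEURON_LLM_API_KEY",
        "DATA_NEURON_LLM_ENDPOINT",
        "DATA_NEURON_LLM_MODEL",
    ],
    "ollama": ["DATA_NEURON_LLM_MODEL"],
}


def missing_llm_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    # Assumption: no DATA_NEURON_LLM means the default provider, Claude.
    provider = env.get("DATA_NEURON_LLM", "claude").lower()
    if provider not in REQUIRED_ENV:
        raise ValueError(f"Unknown provider: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


# Checked against an explicit mapping here; call missing_llm_env()
# with no argument to check the real process environment.
print(missing_llm_env({"DATA_NEURON_LLM": "openai"}))  # ['OPENAI_API_KEY']
```

Running a check like this before starting the CLI or server surfaces a missing key early instead of at the first LLM call.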
Data Neuron package:
Basic Usage
Here's a simple example of how to use DataNeuron:
from dataneuron import DataNeuron
# Initialize DataNeuron
dn = DataNeuron(db_config='database.yaml', context='your_context_name')
dn.initialize()
# Ask a question
question = "How many users do we have?"
result = dn.query(question)
print(f"SQL Query: {result['sql']}")
print(f"Result: {result['result']}")
Key Features
1. Initialization
The DataNeuron class needs to be initialized with a database configuration and a context:
dn = DataNeuron(db_config='database.yaml', context='your_context_name', log=True)
dn.initialize()
- db_config: Path to your database configuration file, or a dictionary with configuration details.
- context: Name of the context (semantic layer) you want to use, or a dictionary with context details.
- log: Boolean to enable or disable logging (default is False).
2. Querying
You can use the query method to ask questions in natural language:
result = dn.query("What are the top 5 products by sales?")
The result dictionary contains:
- original_question: The question you asked.
- refined_question: The question after refinement by the system.
- sql: The generated SQL query.
- result: The query results.
- explanation: An explanation of the query and results.
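As an illustration of working with these fields, the sketch below formats a result dictionary as readable text. summarize_result is a hypothetical helper, not part of dataneuron, and the sample values are made up. In particular, the actual shape of the result field is not specified here, so a list of tuples is only an assumption.

```python
def summarize_result(result):
    """Format the documented fields of a query result as readable text."""
    fields = [
        ("Question", "original_question"),
        ("Refined", "refined_question"),
        ("SQL", "sql"),
        ("Result", "result"),
        ("Explanation", "explanation"),
    ]
    # Skip any key that is absent so partial dictionaries still format.
    lines = [f"{label}: {result[key]}" for label, key in fields if key in result]
    return "\n".join(lines)


# Made-up sample shaped like the dictionary described above.
sample = {
    "original_question": "How many users do we have?",
    "refined_question": "What is the total number of users?",
    "sql": "SELECT COUNT(*) FROM users",
    "result": [(42,)],
    "explanation": "Counts all rows in the users table.",
}
print(summarize_result(sample))
```

In an application, the dictionary returned by dn.query(question) would take the place of sample.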
3. Chat Functionality
DataNeuron supports a chat-like interaction:
sql, response = dn.chat("Who are our top customers?")
print(f"SQL: {sql}")
print(f"Response: {response}")
The chat method maintains a conversation history, allowing for context-aware follow-up questions.
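A sequence of follow-up questions can be driven with a small loop. The sketch below takes the chat callable as a parameter, so it runs here with a stand-in. run_conversation and fake_chat are hypothetical names, and the only thing assumed about dn.chat is what is shown above: it accepts a question and returns an (sql, response) pair.

```python
def run_conversation(chat, questions):
    """Send questions in order and collect (question, sql, response) tuples."""
    transcript = []
    for question in questions:
        # Each call goes through the same chat callable, so a real
        # dn.chat can use its conversation history for follow-ups.
        sql, response = chat(question)
        transcript.append((question, sql, response))
    return transcript


def fake_chat(question):
    """Stand-in for dn.chat so the sketch runs without a database or LLM key."""
    return "SELECT 1", f"(answer to: {question})"


questions = ["Who are our top customers?", "How much did they spend last month?"]
for question, sql, response in run_conversation(fake_chat, questions):
    print(f"Q: {question}\nSQL: {sql}\nResponse: {response}\n")
```

With an initialized DataNeuron instance, pass dn.chat in place of fake_chat.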
4. Direct SQL Execution
You can execute SQL queries directly:
result = dn.execute_query("SELECT * FROM users LIMIT 5")
5. Database Information
Retrieve information about your database:
tables = dn.get_table_list()
table_info = dn.get_table_info("users")
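The two calls can be combined to get an overview of every table. This is a sketch under one assumption: that get_table_list() returns an iterable of table names. The return type of get_table_info is not specified here, so the helper passes it through unchanged. describe_tables and FakeSchemaDataNeuron are hypothetical names used only for illustration.

```python
def describe_tables(dn):
    """Map each table name to whatever get_table_info returns for it."""
    # Assumption: get_table_list() yields table names as strings.
    return {name: dn.get_table_info(name) for name in dn.get_table_list()}


class FakeSchemaDataNeuron:
    """Stand-in with made-up data; not the real DataNeuron class."""

    def get_table_list(self):
        return ["users", "orders"]

    def get_table_info(self, table_name):
        return {"table": table_name}


print(describe_tables(FakeSchemaDataNeuron()))
```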
6. Client/Tenant scoped queries/chat
First mark the client column in tables (important step). This will create a client_info.yaml that will be used for lookup later for filtering the queries.
dnn --mc
Set the client context before querying or chatting
dn = DataNeuron(db_config='database.yaml', context='your_context')
dn.initialize()
dn.set_client_context(client_id)
result = dn.query("Your query here")
Every generated query is filtered on the client_id column, based on the client column tables that you marked earlier using dnn --mc. You can also edit client_info.yaml manually.
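In a multi-tenant application it can help to make the "set context, then query" order hard to get wrong. The sketch below wraps both calls in one function. query_for_client and FakeScopedDataNeuron are hypothetical names, and the SQL returned by the stand-in is made up to show the kind of filter described above, not actual dataneuron output.

```python
def query_for_client(dn, client_id, question):
    """Set the client context, then run the question for that client."""
    if client_id is None or client_id == "":
        # Refuse to run an unscoped query on behalf of a tenant.
        raise ValueError("client_id is required for tenant-scoped queries")
    dn.set_client_context(client_id)
    return dn.query(question)


class FakeScopedDataNeuron:
    """Stand-in that records the client context; not the real class."""

    def __init__(self):
        self.client_id = None

    def set_client_context(self, client_id):
        self.client_id = client_id

    def query(self, question):
        # Made-up SQL illustrating a client_id filter.
        return {"sql": f"SELECT * FROM orders WHERE client_id = '{self.client_id}'"}


print(query_for_client(FakeScopedDataNeuron(), "acme", "My orders")["sql"])
```

Rejecting an empty client_id up front is a design choice of this sketch: it prevents a tenant request from silently running without a filter.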
Important Note on Limitations (WIP):
- Currently this client-specific filter works only on tables that have a client_id column. For example, if you ask "My order items" and the order_items table doesn't have client_id but the orders table does, the required JOIN is not added automatically yet.
- Similarly, this won't work with recursive CTEs.
NOTE: All yaml files can be edited as long as the base structure is preserved, you can add any new columns to tables yaml or definitions yaml, the st
