BambooAI
A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.
Install / Use
/learn @pgalko/BambooAIQuality Score
Category
Development & EngineeringSupported Platforms
README
BambooAI
<img width="100" alt="BambooAI Logo" src="images/logo.png" />https://bambooai.org
BambooAI is an open-source library that enables natural language-based data analysis using Large Language Models (LLMs). It works with both local datasets and can fetch data from external sources and APIs.
Table of Contents
- Overview
- Features
- Demo Videos
- Installation
- Quick Start
- How It Works
- Configuration
- Auxiliary Datasets
- Dataframe Ontology (Semantic Memory)
- Vector DB (Episodic Memory)
- Usage Examples
- Web Application Setup
- Model Support
- Environment Variables
- Logging
- Performance Comparison
- Contributing
Overview
BambooAI is an experimental tool that makes data analysis more accessible by allowing users to interact with their data through natural language conversations. It's designed to:
- Process natural language queries about datasets
- Generate and execute Python code for analysis and visualization
- Help users derive insights without extensive coding knowledge
- Augment capabilities of data analysts at all levels
- Streamline data analysis workflows
Features
- Natural language interface for data analysis
- Web UI and Jupyter notebook support
- Support for local and external datasets
- Integration with internet searches and external APIs
- User feedback during stream
- Optional planning agent for complex tasks
- Integration of custom ontologies
- Code generation for data analysis and visualization
- Self healing/error correction
- Custom code edits and code execution
- Knowledge base integration via vector database
- Workflows saving and follow ups
- In-context and multimodal queries
Demo Videos
Machine Learning Example (Jupyter Notebook)
A demonstration of creating a machine learning model to predict Titanic passenger survival:
https://github.com/user-attachments/assets/59ef810c-80d8-4ef1-8edf-82ba64178b85
Sports Data Analysis (Web UI)
Example of various sports data analysis queries:
https://github.com/user-attachments/assets/7b9c9cd6-56e3-46ee-a6c6-c32324a0c5ef
Installation
pip install bambooai
Or alternatively clone the repo and install the requirements
git clone https://github.com/pgalko/BambooAI.git
pip install -r requirements.txt
Quick Start
Try it out on a basic example in Google Colab:
Basic Example
-
Install BambooAI:
pip install bambooai -
Configure environment:
cp .env.example .env # Edit .env with your settings -
Configure agents/models
cp LLM_CONFIG_sample.json LLM_CONFIG.json # Edit LLM_CONFIG.json with your desired combination of agents, models and parameters -
Run
import pandas as pd from bambooai import BambooAI import plotly.io as pio pio.renderers.default = 'jupyterlab' df = pd.read_csv('titanic.csv') bamboo = BambooAI(df=df, planning=True, vector_db=False, search_tool=True) bamboo.pd_agent_converse()
How It Works
The BambooAI operates through six key steps:
-
Initiation
- Launches with a user question or prompt for one
- Continues in a conversation loop until exit
-
Task Routing
- Classifies questions using LLM
- Routes to appropriate handler (text response or code generation)
-
User Feedback
- If the instruction is vague or unclear the model will pause and ask user for feedback
- If the model encounters any ambiguity during the solving process, it will pause and ask for direction offering a few options
-
Dynamic Prompt Build
- Evaluates data requirements
- Asks for feedback or uses tools if more context is needed
- Formulates analysis plan
- Performs semantic search for similar questions
- Generates code using selected LLM
-
Debugging and Execution
- Executes generated code
- Handles errors with LLM-based correction
- Retries until successful or limit reached
-
Results and Knowledge Base
- Ranks answers for quality
- Stores high-quality solutions in vector database
- Presents formatted results or visualizations
Flow Chart

Configuration
Parameters
BambooAI accepts the following initialization parameters:
bamboo = BambooAI(
df=None, # DataFrame to analyze
auxiliary_datasets=None, # List of paths to auxiliary datasets
max_conversations=4, # Number of conversation pairs to keep in memory
search_tool=False, # Enable internet search capability
planning=False, # Enable planning agent for complex tasks
webui=False, # Run as web application
vector_db=False, # Enable vector database for knowledge storage
df_ontology=False, # Use custom dataframe ontology
exploratory=True, # Enable expert selection for query handling
custom_prompt_file=None # Enable the use of custom/modified prompt templates
)
Detailed Parameter Descriptions:
-
df(pd.DataFrame, optional)- Input dataframe for analysis
- If not provided, BambooAI will attempt to source data from the internet or auxiliary datasets
-
auxiliary_datasets(list, default=None)- List of paths to auxiliary datasets
- These will be incorporated into the solution as needed, and pulled when the code executes
- These are to complement the main dataframe
-
max_conversations(int, default=4)- Number of user-assistant conversation pairs to maintain in context
- Affects context window and token usage
-
search_tool(bool, default=False)- Enables internet search capabilities
- Requires appropriate API keys when enabled
-
planning(bool, default=False)- Enables the Planning agent for complex tasks
- Breaks down tasks into manageable steps
- Improves solution quality for complex queries
-
webui(bool, default=False)- Runs BambooAI as a web application
- Uses Flask API for web interface
-
vector_db(bool, default=False)- Enables vector database for knowledge storage and semantic search
- Stores high-quality solutions for future reference
- Requires Pinecone API key
- Supports two embeddings models
text-embedding-3-small(OpenAI) andall-MiniLM-L6-v2(HF)
-
df_ontology(str, default=None)- Uses custom dataframe ontology for improved understanding
- Requires OWL ontology as a
.ttlfile. The parameter takes the path to the TTL file. - Significantly improves solution quality
-
exploratory(bool, default=True)- Enables expert selection for query handling
- Chooses between Research Specialist and Data Analyst roles
-
custom_prompt_file(str, default=None)- Enables users to provide custom prompt templates
- Requires path to the YAML file containing the templates
Agent and Model Configuration
BambooAI uses multi-agent system where different specialized agents handle specific aspects of the data analysis process. Each agent can be configured to use different LLM models and parameters based on their specific requirements.
Configuration Structure
The LLM configuration is stored in LLM_CONFIG.json. Here's the complete configuration structure:
{
"agent_configs": [
{"agent": "Expert Selector", "details": {"model": "gpt-4.1", "provider":"openai","max_tokens": 2000, "temperature": 0}},
{"agent": "Analyst Selector", "details": {"model": "claude-3-7-sonnet-20250219", "provider":"anthropic","max_tokens": 2000, "temperature": 0}},
{"agent": "Theorist", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Dataframe Inspector", "details": {"model": "gemini-2.0-flash", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Planner", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Code Generator", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Error Corrector", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Reviewer", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Solution Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Executor", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}}
],
"model_properties": {
"gpt-4o": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.0025, "completion_tokens": 0.010},
"gpt-4.1": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.008},
"gpt-4o-mini": {"capability":"base", "mult
