BambooAI

https://bambooai.org

BambooAI is an open-source library that enables natural language-based data analysis using Large Language Models (LLMs). It works with both local datasets and can fetch data from external sources and APIs.

Overview
Features
Demo Videos
Installation
Quick Start
How It Works
Configuration
- Parameters
- Agent and Model Configuration
Auxiliary Datasets
Dataframe Ontology (Semantic Memory)
Vector DB (Episodic Memory)
Usage Examples
Web Application Setup
Model Support
Environment Variables
Logging
Performance Comparison
Contributing

Overview

BambooAI is an experimental tool that makes data analysis more accessible by allowing users to interact with their data through natural language conversations. It's designed to:

Process natural language queries about datasets
Generate and execute Python code for analysis and visualization
Help users derive insights without extensive coding knowledge
Augment capabilities of data analysts at all levels
Streamline data analysis workflows

Features

Natural language interface for data analysis
Web UI and Jupyter notebook support
Support for local and external datasets
Integration with internet searches and external APIs
User feedback during stream
Optional planning agent for complex tasks
Integration of custom ontologies
Code generation for data analysis and visualization
Self healing/error correction
Custom code edits and code execution
Knowledge base integration via vector database
Workflows saving and follow ups
In-context and multimodal queries

Demo Videos

Machine Learning Example (Jupyter Notebook)

A demonstration of creating a machine learning model to predict Titanic passenger survival:

https://github.com/user-attachments/assets/59ef810c-80d8-4ef1-8edf-82ba64178b85

Sports Data Analysis (Web UI)

Example of various sports data analysis queries:

https://github.com/user-attachments/assets/7b9c9cd6-56e3-46ee-a6c6-c32324a0c5ef

Installation

pip install bambooai

Or alternatively clone the repo and install the requirements

git clone https://github.com/pgalko/BambooAI.git
pip install -r requirements.txt

Quick Start

Try it out on a basic example in Google Colab:

Basic Example

Install BambooAI:
```
pip install bambooai
```

Configure environment:

cp .env.example .env
# Edit .env with your settings

Configure agents/models

cp LLM_CONFIG_sample.json LLM_CONFIG.json
# Edit LLM_CONFIG.json with your desired combination of agents, models and parameters

Run

import pandas as pd
from bambooai import BambooAI

import plotly.io as pio
pio.renderers.default = 'jupyterlab'

df = pd.read_csv('titanic.csv')
bamboo = BambooAI(df=df, planning=True, vector_db=False, search_tool=True)
bamboo.pd_agent_converse()

How It Works

The BambooAI operates through six key steps:

Initiation
- Launches with a user question or prompt for one
- Continues in a conversation loop until exit
Task Routing
- Classifies questions using LLM
- Routes to appropriate handler (text response or code generation)
User Feedback
- If the instruction is vague or unclear the model will pause and ask user for feedback
- If the model encounters any ambiguity during the solving process, it will pause and ask for direction offering a few options
Dynamic Prompt Build
- Evaluates data requirements
- Asks for feedback or uses tools if more context is needed
- Formulates analysis plan
- Performs semantic search for similar questions
- Generates code using selected LLM
Debugging and Execution
- Executes generated code
- Handles errors with LLM-based correction
- Retries until successful or limit reached
Results and Knowledge Base
- Ranks answers for quality
- Stores high-quality solutions in vector database
- Presents formatted results or visualizations

Flow Chart

Configuration

Parameters

BambooAI accepts the following initialization parameters:

bamboo = BambooAI(
    df=None,                    # DataFrame to analyze
    auxiliary_datasets=None,    # List of paths to auxiliary datasets
    max_conversations=4,        # Number of conversation pairs to keep in memory
    search_tool=False,          # Enable internet search capability
    planning=False,             # Enable planning agent for complex tasks
    webui=False,                # Run as web application
    vector_db=False,            # Enable vector database for knowledge storage
    df_ontology=False,          # Use custom dataframe ontology
    exploratory=True,           # Enable expert selection for query handling
    custom_prompt_file=None     # Enable the use of custom/modified prompt templates
)

Detailed Parameter Descriptions:

df (pd.DataFrame, optional)
- Input dataframe for analysis
- If not provided, BambooAI will attempt to source data from the internet or auxiliary datasets
auxiliary_datasets (list, default=None)
- List of paths to auxiliary datasets
- These will be incorporated into the solution as needed, and pulled when the code executes
- These are to complement the main dataframe
max_conversations (int, default=4)
- Number of user-assistant conversation pairs to maintain in context
- Affects context window and token usage
search_tool (bool, default=False)
- Enables internet search capabilities
- Requires appropriate API keys when enabled
planning (bool, default=False)
- Enables the Planning agent for complex tasks
- Breaks down tasks into manageable steps
- Improves solution quality for complex queries
webui (bool, default=False)
- Runs BambooAI as a web application
- Uses Flask API for web interface
vector_db (bool, default=False)
- Enables vector database for knowledge storage and semantic search
- Stores high-quality solutions for future reference
- Requires Pinecone API key
- Supports two embeddings models text-embedding-3-small(OpenAI) and all-MiniLM-L6-v2(HF)
df_ontology (str, default=None)
- Uses custom dataframe ontology for improved understanding
- Requires OWL ontology as a .ttl file. The parameter takes the path to the TTL file.
- Significantly improves solution quality
exploratory (bool, default=True)
- Enables expert selection for query handling
- Chooses between Research Specialist and Data Analyst roles
custom_prompt_file (str, default=None)
- Enables users to provide custom prompt templates
- Requires path to the YAML file containing the templates

Agent and Model Configuration

BambooAI uses multi-agent system where different specialized agents handle specific aspects of the data analysis process. Each agent can be configured to use different LLM models and parameters based on their specific requirements.

Configuration Structure

The LLM configuration is stored in LLM_CONFIG.json. Here's the complete configuration structure:

{
  "agent_configs": [
    {"agent": "Expert Selector", "details": {"model": "gpt-4.1", "provider":"openai","max_tokens": 2000, "temperature": 0}},
    {"agent": "Analyst Selector", "details": {"model": "claude-3-7-sonnet-20250219", "provider":"anthropic","max_tokens": 2000, "temperature": 0}},
    {"agent": "Theorist", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Dataframe Inspector", "details": {"model": "gemini-2.0-flash", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Planner", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Code Generator", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
    {"agent": "Error Corrector", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
    {"agent": "Reviewer", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Solution Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Google Search Executor", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Google Search Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}}
  ],
  "model_properties": {
    "gpt-4o": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.0025, "completion_tokens": 0.010},
    "gpt-4.1": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.008},
    "gpt-4o-mini": {"capability":"base", "mult

BambooAI

Install / Use

README

BambooAI

Table of Contents

Overview

Features

Demo Videos

Machine Learning Example (Jupyter Notebook)

Sports Data Analysis (Web UI)

Installation

Quick Start

Basic Example

How It Works

Flow Chart

Configuration

Parameters

Detailed Parameter Descriptions:

Agent and Model Configuration

Configuration Structure