
BambooAI

A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.


<img width="100" alt="BambooAI Logo" src="images/logo.png" />

https://bambooai.org

BambooAI is an open-source library that enables natural language-based data analysis using Large Language Models (LLMs). It works with both local datasets and can fetch data from external sources and APIs.

Overview

BambooAI is an experimental tool that makes data analysis more accessible by allowing users to interact with their data through natural language conversations. It's designed to:

  • Process natural language queries about datasets
  • Generate and execute Python code for analysis and visualization
  • Help users derive insights without extensive coding knowledge
  • Augment capabilities of data analysts at all levels
  • Streamline data analysis workflows

Features

  • Natural language interface for data analysis
  • Web UI and Jupyter notebook support
  • Support for local and external datasets
  • Integration with internet searches and external APIs
  • User feedback during streaming
  • Optional planning agent for complex tasks
  • Integration of custom ontologies
  • Code generation for data analysis and visualization
  • Self-healing / error correction
  • Custom code edits and code execution
  • Knowledge base integration via vector database
  • Workflow saving and follow-ups
  • In-context and multimodal queries

Demo Videos

Machine Learning Example (Jupyter Notebook)

A demonstration of creating a machine learning model to predict Titanic passenger survival:

https://github.com/user-attachments/assets/59ef810c-80d8-4ef1-8edf-82ba64178b85

Sports Data Analysis (Web UI)

Example of various sports data analysis queries:

https://github.com/user-attachments/assets/7b9c9cd6-56e3-46ee-a6c6-c32324a0c5ef

Installation

pip install bambooai

Alternatively, clone the repo and install the requirements:

git clone https://github.com/pgalko/BambooAI.git
cd BambooAI
pip install -r requirements.txt

Quick Start

Try it out on a basic example in Google Colab: Open In Colab

Basic Example

  1. Install BambooAI:

    pip install bambooai
    
  2. Configure environment:

    cp .env.example .env
    # Edit .env with your settings
    
  3. Configure agents/models:

    cp LLM_CONFIG_sample.json LLM_CONFIG.json
    # Edit LLM_CONFIG.json with your desired combination of agents, models and parameters
    
  4. Run:

    import pandas as pd
    from bambooai import BambooAI
    
    import plotly.io as pio
    pio.renderers.default = 'jupyterlab'
    
    df = pd.read_csv('titanic.csv')
    bamboo = BambooAI(df=df, planning=True, vector_db=False, search_tool=True)
    bamboo.pd_agent_converse()
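Step 2 copies `.env.example` to `.env`; this file holds the API keys for whichever providers your agents use (and Pinecone when `vector_db=True`). A hypothetical fragment — the variable names below are illustrative, so consult `.env.example` in the repo for the authoritative names:

```shell
# Illustrative .env entries; check .env.example for the exact variable names.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
PINECONE_API_KEY=...   # only needed when vector_db=True
```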
    

How It Works

BambooAI operates through six key steps:

  1. Initiation

    • Launches with a user question or prompt for one
    • Continues in a conversation loop until exit
  2. Task Routing

    • Classifies questions using LLM
    • Routes to appropriate handler (text response or code generation)
  3. User Feedback

    • If an instruction is vague or unclear, the model pauses and asks the user for feedback
    • If the model encounters ambiguity while solving, it pauses and asks for direction, offering a few options
  4. Dynamic Prompt Build

    • Evaluates data requirements
    • Asks for feedback or uses tools if more context is needed
    • Formulates analysis plan
    • Performs semantic search for similar questions
    • Generates code using selected LLM
  5. Debugging and Execution

    • Executes generated code
    • Handles errors with LLM-based correction
    • Retries until successful or limit reached
  6. Results and Knowledge Base

    • Ranks answers for quality
    • Stores high-quality solutions in vector database
    • Presents formatted results or visualizations
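The six steps above can be sketched as a minimal control loop. This is an illustrative stub, not BambooAI's actual internals — every function name here is hypothetical, and the "LLM calls" are hard-coded so the retry/correction path is visible:

```python
# Minimal sketch of a route -> generate -> execute-with-correction loop.
# All names are illustrative stubs, not BambooAI's real API.

MAX_RETRIES = 3

def classify(question):
    # Task Routing: decide between a text answer and code generation
    return "code" if "compute" in question or "plot" in question else "text"

def generate_code(question, error=None):
    # Dynamic Prompt Build: an LLM call in the real system; stubbed here.
    if error:
        return "result = sum(range(5))"   # "corrected" second attempt
    return "result = sum(range(5)"        # deliberately broken first attempt

def execute_with_correction(question):
    # Debugging and Execution: retry with error feedback until success or limit
    error = None
    for _ in range(MAX_RETRIES):
        code = generate_code(question, error)
        scope = {}
        try:
            exec(code, scope)
            return scope["result"]
        except SyntaxError as exc:
            error = str(exc)   # feed the error back into the next prompt
    raise RuntimeError("retry limit reached")

def answer(question):
    if classify(question) == "text":
        return "text response"
    return execute_with_correction(question)

print(answer("compute the sum of 0..4"))  # → 10
```

In the real system each stub is an LLM call configured per-agent in `LLM_CONFIG.json`, and successful solutions are additionally ranked and stored in the vector database.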

Flow Chart

Configuration

Parameters

BambooAI accepts the following initialization parameters:

bamboo = BambooAI(
    df=None,                    # DataFrame to analyze
    auxiliary_datasets=None,    # List of paths to auxiliary datasets
    max_conversations=4,        # Number of conversation pairs to keep in memory
    search_tool=False,          # Enable internet search capability
    planning=False,             # Enable planning agent for complex tasks
    webui=False,                # Run as web application
    vector_db=False,            # Enable vector database for knowledge storage
    df_ontology=None,           # Path to a custom dataframe ontology (.ttl file)
    exploratory=True,           # Enable expert selection for query handling
    custom_prompt_file=None     # Enable the use of custom/modified prompt templates
)

Detailed Parameter Descriptions:

  • df (pd.DataFrame, optional)

    • Input dataframe for analysis
    • If not provided, BambooAI will attempt to source data from the internet or auxiliary datasets
  • auxiliary_datasets (list, default=None)

    • List of paths to auxiliary datasets
    • These will be incorporated into the solution as needed, and pulled when the code executes
    • These are to complement the main dataframe
  • max_conversations (int, default=4)

    • Number of user-assistant conversation pairs to maintain in context
    • Affects context window and token usage
  • search_tool (bool, default=False)

    • Enables internet search capabilities
    • Requires appropriate API keys when enabled
  • planning (bool, default=False)

    • Enables the Planning agent for complex tasks
    • Breaks down tasks into manageable steps
    • Improves solution quality for complex queries
  • webui (bool, default=False)

    • Runs BambooAI as a web application
    • Uses Flask API for web interface
  • vector_db (bool, default=False)

    • Enables vector database for knowledge storage and semantic search
    • Stores high-quality solutions for future reference
    • Requires Pinecone API key
    • Supports two embedding models: text-embedding-3-small (OpenAI) and all-MiniLM-L6-v2 (Hugging Face)
  • df_ontology (str, default=None)

    • Uses custom dataframe ontology for improved understanding
    • Requires OWL ontology as a .ttl file. The parameter takes the path to the TTL file.
    • Significantly improves solution quality
  • exploratory (bool, default=True)

    • Enables expert selection for query handling
    • Chooses between Research Specialist and Data Analyst roles
  • custom_prompt_file (str, default=None)

    • Enables users to provide custom prompt templates
    • Requires path to the YAML file containing the templates

Agent and Model Configuration

BambooAI uses a multi-agent system in which different specialized agents handle specific aspects of the data analysis process. Each agent can be configured to use a different LLM model and parameters based on its specific requirements.

Configuration Structure

The LLM configuration is stored in LLM_CONFIG.json. Here's the complete configuration structure:

{
  "agent_configs": [
    {"agent": "Expert Selector", "details": {"model": "gpt-4.1", "provider":"openai","max_tokens": 2000, "temperature": 0}},
    {"agent": "Analyst Selector", "details": {"model": "claude-3-7-sonnet-20250219", "provider":"anthropic","max_tokens": 2000, "temperature": 0}},
    {"agent": "Theorist", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Dataframe Inspector", "details": {"model": "gemini-2.0-flash", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Planner", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Code Generator", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
    {"agent": "Error Corrector", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
    {"agent": "Reviewer", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
    {"agent": "Solution Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Google Search Executor", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
    {"agent": "Google Search Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}}
  ],
  "model_properties": {
    "gpt-4o": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.0025, "completion_tokens": 0.010},
    "gpt-4.1": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.008},
    "gpt-4o-mini": {"capability":"base", ...}
  }
}
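A file shaped like `LLM_CONFIG.json` can be loaded and queried with a few lines of standard-library Python. This is a sketch against an abridged inline copy of the sample config above, not part of BambooAI itself:

```python
import json

# Abridged sample shaped like LLM_CONFIG.json (two agents from the config above)
config_text = '''
{
  "agent_configs": [
    {"agent": "Expert Selector",
     "details": {"model": "gpt-4.1", "provider": "openai",
                 "max_tokens": 2000, "temperature": 0}},
    {"agent": "Code Generator",
     "details": {"model": "claude-3-5-sonnet-20241022", "provider": "anthropic",
                 "max_tokens": 8000, "temperature": 0}}
  ]
}
'''

config = json.loads(config_text)

# Index agent details by agent name for quick lookup
agents = {entry["agent"]: entry["details"] for entry in config["agent_configs"]}

print(agents["Code Generator"]["model"])      # → claude-3-5-sonnet-20241022
print(agents["Expert Selector"]["provider"])  # → openai
```

In practice you would `json.load(open("LLM_CONFIG.json"))` instead of parsing an inline string; the lookup pattern is the same.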