SkillAgentSearch skills...

Balsa

Balsa is a learned SQL query optimizer. It tailor optimizes your SQL queries to find the best execution plans for your hardware and engine.

Install / Use

/learn @balsa-project/Balsa

README

Balsa

<p> <a href="http://arxiv.org/abs/2201.01441"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2201.01441-blue"> </a> <a href="https://github.com/balsa-project/balsa/blob/master/LICENSE"> <img alt="LICENSE" src="https://img.shields.io/github/license/balsa-project/balsa.svg?color=brightgreen"> </a> </p>

Balsa is a learned query optimizer. It learns to optimize SQL queries by trial-and-error using deep reinforcement learning and sim-to-real learning.

Notably, Balsa is the first end-to-end learned optimizer that does not rely on learning from an existing expert optimizer's plans, while being able to surpass the performance of expert plans, sometimes by a sizable margin.

<p align="center"> <img src="assets/balsa-overview.png" width="485"/> <p>

For technical details, see the SIGMOD 2022 paper, Balsa: Learning a Query Optimizer Without Expert Demonstrations [bibtex].

Setup | Quickstart | Experiment configs | Metrics and artifacts | Cluster mode | Q&A | Citation

Setup

To quickly get started, run the following on one machine which will run both the agent neural network and query execution.

  1. Clone and install Balsa.

    <details> <summary>Details</summary> <br>
    git clone https://github.com/balsa-project/balsa.git ~/balsa
    cd ~/balsa
    # Recommended: run inside a Conda environment.
    # All commands that follow are run under this conda env.
    conda create -n balsa python=3.7 -y
    conda activate balsa
    
    pip install -r requirements.txt
    pip install -e .
    pip install -e pg_executor
    
    </details>
  2. Install Postgres v12.5.

    <details> <summary>Details</summary>

    <br>This can be done in several ways. For example, installing from source:

    cd ~/
    wget https://ftp.postgresql.org/pub/source/v12.5/postgresql-12.5.tar.gz
    tar xzvf postgresql-12.5.tar.gz
    cd postgresql-12.5
    ./configure --prefix=/data/postgresql-12.5 --without-readline
    sudo make -j
    sudo make install
    
    echo 'export PATH=/data/postgresql-12.5/bin:$PATH' >> ~/.bashrc
    source ~/.bashrc
    
    </details>
  3. Install the pg_hint_plan extension v1.3.7.

    <details> <summary>Details</summary> <br>
    cd ~/
    git clone https://github.com/ossc-db/pg_hint_plan.git -b REL12_1_3_7
    cd pg_hint_plan
    # Modify Makefile: change line
    #   PG_CONFIG = pg_config
    # to
    #   PG_CONFIG = /data/postgresql-12.5/bin/pg_config
    vim Makefile
    make
    sudo make install
    
    </details>
  4. Load data into Postgres & start it with the correct configuration.

    <details> <summary>Details</summary> <br> For example, load the Join Order Benchmark (JOB) tables:
    cd ~/
    mkdir -p datasets/job && pushd datasets/job
    wget -c http://homepages.cwi.nl/~boncz/job/imdb.tgz && tar -xvzf imdb.tgz && popd
    # Prepend headers to CSV files
    python3 ~/balsa/scripts/prepend_imdb_headers.py
    
    # Create and start the DB
    pg_ctl -D ~/imdb initdb
    
    # Copy custom PostgreSQL configuration.
    cp ~/balsa/conf/balsa-postgresql.conf ~/imdb/postgresql.conf
    
    # Start the server
    pg_ctl -D ~/imdb start -l logfile
    
    # Load data + run analyze (can take several minutes)
    cd ~/balsa
    bash load-postgres/load_job_postgres.sh ~/datasets/job
    

    Perform basic checks:

    psql imdbload
    # Check that both primary and foreign key indexes are built:
    imdbload=# \d title
    [...]
    
    # Check that data count is correct:
    imdbload=# select count(*) from title;
      count
    ---------
    2528312
    (1 row)
    
    </details>

NOTE: Using one machine is only for quickly trying out Balsa. To cleanly reproduce results, use Cluster mode which automates the above setup on a cloud and separates the training machine from the query execution machines.

Quickstart

First, run the baseline PostgreSQL plans (the expert) on the Join Order Benchmark:

python run.py --run Baseline --local

This will prompt you to log into Weights & Biases to track experiments and visualize metrics easily, which we highly recommend. (You can disable it by prepending the env var WANDB_MODE=disabled or offline).

The first run may take a while due to warming up. After finishing, you can see messages like:

latency_expert/workload (seconds): 156.62 (113 queries)
...
wandb: latency_expert/workload 156.61727

Next, to launch a Balsa experiment:

python run.py --run Balsa_JOBRandSplit --local

The first time this is run, simulation data is collected for all training queures, which will finish in ~5 minutes (cached for future runs):

...
I0323 04:15:48.212239 140382943663744 sim.py:738] Collection done, stats:
I0323 04:15:48.212319 140382943663744 sim.py:742]   num_queries=94 num_collected_queries=77 num_points=516379 latency_s=309.3
I0323 04:16:39.296590 140382943663744 sim.py:666] Saved simulation data (len 516379) to: data/sim-data-88bd801a.pkl

Balsa's simulation-to-reality approach requires first training an agent in simulation, then in real execution. To speed up the simulation phase, we have provided pretrained checkpoints for the simulation agent:

...
I0323 04:18:26.856739 140382943663744 sim.py:985] Loaded pretrained checkpoint: checkpoints/sim-MinCardCost-rand52split-680secs.ckpt

Then, the agent will start the first iteration of real-execution learning: planning all training queries, sending them off for execution, and waiting for these plans to finish. Periodically, test queries are planned and executed for logging.

Handy commands:

  • To kill the experiment(s): pkill -f run.py
  • To monitor a machine: dstat -mcdn

Experiment configs

All experiments and their hyperparameters are declared in experiments.py. To run an experiment with the local Postgres execution engine:

# <name> is a config registered in experiments.py.
python run.py --run <name> --local

Main Balsa agent:

| Benchmark | Config | |-----------------------|-----------------------------------| | JOB (Random Split) | Balsa_JOBRandSplit | | JOB Slow (Slow Split) | Balsa_JOBSlowSplit |

Ablation: impact of the simulator (Figure 10):

| Variant | Config | |------------|------------------------------------------------| | Balsa Sim | (main agent) Balsa_JOBRandSplit | | Expert Sim | JOBRandSplit_PostgresSim | | No Sim | JOBRandSplit_NoSim |

NOTE: Running JOBRandSplit_PostgresSim for the first time will be slow (1.1 hours) due to simulation data being collected from EXPLAIN. This data is cached in data/ for future runs.

Ablation: impact of the timeout mechanism (Figure 11):

| Variant | Config | |------------------------|-------------------------------------------------| | Balsa (safe execution) | (main agent) Balsa_JOBRandSplit | | no timeout | JOBRandSplit_NoTimeout |

Ablation: impact of exploration schemes (Figure 12):

| Variant | Config | |--------------------------|---------------------------------------------------| | Balsa (safe exploration) | (main agent) Balsa_JOBRandSplit | | epsilon-greedy | JOBRandSplit_EpsGreedy | | no exploration | JOBRandSplit_NoExplore |

Ablation: impact of training schemes (Figure 13):

| Variant | Config | |-------------------|-------------------------------------------------| | Balsa (on-policy) | (main agent) Balsa_JOBRandSplit | | retrain | JOBRandSplit_RetrainScheme |

Comparision with learning from expert demonstrations (Neo-impl) (Figure 15):

| Variant | Config | |-------------------|-------------------------------------------------| | Balsa | (main agent) Balsa_JOBRandSplit | | Neo-impl | NeoImpl_JOBRandSplit |

Diversified experiences (Figure 16):

| Variant | Config | |-------------------|-------------------------------------------------| | Balsa | (Main agents) JOB Balsa_JOBRandSplit; JOB Slow Balsa_JOBSlowSplit | | Balsa-8x (uses 8 main agents' data) | JOB Balsa_JOBRandSplitReplay; JOB Slow Balsa_JOBSlowSplitReplay |

Generalizing to highly distinct join templates, Ext-JOB (Figure 17):

| Variant | Config | |-------------------|-------------------------------------------------| | Balsa, data collection agent | Balsa_TrainJOB_TestExtJOB | | Balsa-1x (uses 1 base agent's data) | Balsa1x_TrainJOB_TestExtJOB | | Balsa-8x (uses 8 base agents' data) | Balsa8x_TrainJOB_TestExtJOB |

NOTE: When running a Ext-JOB config for the first time, you may see the error Missing nodes in init_experience. This means data/initial_policy_data.pkl contains the expert latencies of all 113 JOB queries for printing (assuming you ran the previous configs first) but lacks the new Ext-JOB queries. To fix, rename the previous .pkl file and rerun the new Ext-JOB config, which will automatically run the expert plans of the new query set to regenerate this file (as well as gathering the simulation data).

To specify a new experiment, subclass an existing config (give the subclass a descriptive name), change the values of some hyperparameters, and r

Related Skills

View on GitHub
GitHub Stars147
CategoryData
Updated6d ago
Forks36

Languages

Python

Security Score

100/100

Audited on Mar 28, 2026

No findings