# Balsa

<p>
  <a href="http://arxiv.org/abs/2201.01441"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2201.01441-blue"></a>
  <a href="https://github.com/balsa-project/balsa/blob/master/LICENSE"><img alt="LICENSE" src="https://img.shields.io/github/license/balsa-project/balsa.svg?color=brightgreen"></a>
</p>

Balsa is a learned query optimizer. It learns to optimize SQL queries by trial and error, using deep reinforcement learning and simulation-to-reality learning.
Notably, Balsa is the first end-to-end learned optimizer that does not rely on learning from an existing expert optimizer's plans, while being able to surpass the performance of expert plans, sometimes by a sizable margin.
<p align="center">
  <img src="assets/balsa-overview.png" width="485"/>
</p>

For technical details, see the SIGMOD 2022 paper, Balsa: Learning a Query Optimizer Without Expert Demonstrations [bibtex].
Setup | Quickstart | Experiment configs | Metrics and artifacts | Cluster mode | Q&A | Citation
## Setup
To get started quickly, run the following on a single machine, which will host both the agent's neural network and query execution.
- Clone and install Balsa.

  ```bash
  git clone https://github.com/balsa-project/balsa.git ~/balsa
  cd ~/balsa
  # Recommended: run inside a Conda environment.
  # All commands that follow are run under this conda env.
  conda create -n balsa python=3.7 -y
  conda activate balsa
  pip install -r requirements.txt
  pip install -e .
  pip install -e pg_executor
  ```
- Install Postgres v12.5. This can be done in several ways; for example, installing from source:

  ```bash
  cd ~/
  wget https://ftp.postgresql.org/pub/source/v12.5/postgresql-12.5.tar.gz
  tar xzvf postgresql-12.5.tar.gz
  cd postgresql-12.5
  ./configure --prefix=/data/postgresql-12.5 --without-readline
  sudo make -j
  sudo make install
  echo 'export PATH=/data/postgresql-12.5/bin:$PATH' >> ~/.bashrc
  source ~/.bashrc
  ```
- Install the `pg_hint_plan` extension, v1.3.7.

  ```bash
  cd ~/
  git clone https://github.com/ossc-db/pg_hint_plan.git -b REL12_1_3_7
  cd pg_hint_plan
  # Modify Makefile: change line
  #   PG_CONFIG = pg_config
  # to
  #   PG_CONFIG = /data/postgresql-12.5/bin/pg_config
  vim Makefile
  make
  sudo make install
  ```
- Load data into Postgres and start it with the correct configuration. For example, load the Join Order Benchmark (JOB) tables:

  ```bash
  cd ~/
  mkdir -p datasets/job && pushd datasets/job
  wget -c http://homepages.cwi.nl/~boncz/job/imdb.tgz && tar -xvzf imdb.tgz && popd
  # Prepend headers to CSV files.
  python3 ~/balsa/scripts/prepend_imdb_headers.py
  # Create and start the DB.
  pg_ctl -D ~/imdb initdb
  # Copy custom PostgreSQL configuration.
  cp ~/balsa/conf/balsa-postgresql.conf ~/imdb/postgresql.conf
  # Start the server.
  pg_ctl -D ~/imdb start -l logfile
  # Load data + run ANALYZE (can take several minutes).
  cd ~/balsa
  bash load-postgres/load_job_postgres.sh ~/datasets/job
  ```

  Perform basic checks:

  ```
  psql imdbload
  # Check that both primary and foreign key indexes are built:
  imdbload=# \d title
  [...]
  # Check that the row count is correct:
  imdbload=# select count(*) from title;
    count
  ---------
   2528312
  (1 row)
  ```
NOTE: Using one machine is only for quickly trying out Balsa. To cleanly reproduce results, use Cluster mode, which automates the setup above in the cloud and separates the training machine from the query execution machines.
## Quickstart
First, run the baseline PostgreSQL plans (the expert) on the Join Order Benchmark:
```bash
python run.py --run Baseline --local
```
This will prompt you to log into Weights & Biases, which we highly recommend for tracking experiments and visualizing metrics. (To disable it, prepend the env var `WANDB_MODE=disabled` or `WANDB_MODE=offline`.)
The first run may take a while as the system warms up. After it finishes, you should see messages like:
```
latency_expert/workload (seconds): 156.62 (113 queries)
...
wandb: latency_expert/workload 156.61727
```
Next, to launch a Balsa experiment:
```bash
python run.py --run Balsa_JOBRandSplit --local
```
The first time this is run, simulation data is collected for all training queries, which takes about 5 minutes (the data is cached for future runs):
```
...
I0323 04:15:48.212239 140382943663744 sim.py:738] Collection done, stats:
I0323 04:15:48.212319 140382943663744 sim.py:742] num_queries=94 num_collected_queries=77 num_points=516379 latency_s=309.3
I0323 04:16:39.296590 140382943663744 sim.py:666] Saved simulation data (len 516379) to: data/sim-data-88bd801a.pkl
```
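Filenames like `sim-data-88bd801a.pkl` suggest the cached data is keyed by a short fingerprint of the collection settings. A minimal sketch of that pattern, assuming nothing about Balsa's actual hashing scheme (`cache_path` and `load_or_collect` are illustrative names, not Balsa's API):

```python
# Illustrative cache-by-fingerprint helpers; not Balsa's actual code.
import hashlib
import os
import pickle


def cache_path(settings: str, cache_dir: str = "data") -> str:
    """Derive a stable cache file name from a string describing the settings."""
    digest = hashlib.sha1(settings.encode()).hexdigest()[:8]
    return os.path.join(cache_dir, f"sim-data-{digest}.pkl")


def load_or_collect(settings: str, collect, cache_dir: str = "data"):
    """Reuse cached simulation data when the settings fingerprint matches."""
    path = cache_path(settings, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = collect()  # expensive: runs the actual collection
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

With this scheme, rerunning with identical settings hits the cache, while any change to the settings string produces a new file name and triggers fresh collection.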
Balsa's simulation-to-reality approach first trains an agent in simulation, then continues training it on real execution. To speed up the simulation phase, we provide pretrained checkpoints for the simulation agent:
```
...
I0323 04:18:26.856739 140382943663744 sim.py:985] Loaded pretrained checkpoint: checkpoints/sim-MinCardCost-rand52split-680secs.ckpt
```
Then, the agent starts the first iteration of real-execution learning: it plans all training queries, sends the plans off for execution, and waits for them to finish. Periodically, test queries are planned and executed for logging.
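The shape of one such iteration can be sketched as follows. This is a toy illustration of the loop described above, not Balsa's actual API: `Agent`, `plan`, and the executor callback are all hypothetical names.

```python
# Illustrative sketch of one real-execution learning iteration.
# All names here (Agent, plan, run_iteration) are hypothetical.

class Agent:
    """Stand-in for the learned planner."""

    def plan(self, query):
        # A real agent would search the plan space guided by its value network.
        return ("plan-for", query)


def run_iteration(agent, train_queries, execute):
    """Plan every training query, execute the plans, and collect feedback."""
    experience = []
    for query in train_queries:
        plan = agent.plan(query)
        latency_s = execute(plan)  # real execution on the engine
        experience.append((query, plan, latency_s))
    return experience


# Toy usage with a fake executor that returns a constant latency.
feedback = run_iteration(Agent(), ["1a.sql", "2a.sql"], lambda plan: 1.5)
```

The collected `(query, plan, latency)` tuples are the experience the agent trains on before the next iteration.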
Handy commands:
- To kill the experiment(s): `pkill -f run.py`
- To monitor a machine: `dstat -mcdn`
## Experiment configs
All experiments and their hyperparameters are declared in `experiments.py`.
To run an experiment with the local Postgres execution engine:
```bash
# <name> is a config registered in experiments.py.
python run.py --run <name> --local
```
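A minimal sketch of the name-to-config lookup that `--run <name>` implies. The real registration mechanism in `experiments.py` may differ; `register`, `_REGISTRY`, and the attribute shown are illustrative:

```python
# Hypothetical config registry; run.py's real mechanism may differ.
_REGISTRY = {}


def register(cls):
    """Class decorator that makes a config addressable by its class name."""
    _REGISTRY[cls.__name__] = cls
    return cls


@register
class Balsa_JOBRandSplit:
    # Illustrative hyperparameter, not Balsa's actual attribute name.
    query_dir = "queries/join-order-benchmark"


def lookup(name):
    """Resolve the string passed to --run into a config class."""
    return _REGISTRY[name]
```

Under this scheme, `--run Balsa_JOBRandSplit` simply resolves the string to the registered class and instantiates the experiment from it.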
Main Balsa agent:
| Benchmark | Config |
|-----------------------|-----------------------------------|
| JOB (Random Split) | Balsa_JOBRandSplit |
| JOB Slow (Slow Split) | Balsa_JOBSlowSplit |
Ablation: impact of the simulator (Figure 10):
| Variant | Config |
|------------|------------------------------------------------|
| Balsa Sim | (main agent) Balsa_JOBRandSplit |
| Expert Sim | JOBRandSplit_PostgresSim |
| No Sim | JOBRandSplit_NoSim |
NOTE: Running `JOBRandSplit_PostgresSim` for the first time will be slow (~1.1 hours) due to simulation data being collected from `EXPLAIN`. This data is cached in `data/` for future runs.
Ablation: impact of the timeout mechanism (Figure 11):
| Variant | Config |
|------------------------|-------------------------------------------------|
| Balsa (safe execution) | (main agent) Balsa_JOBRandSplit |
| no timeout | JOBRandSplit_NoTimeout |
Ablation: impact of exploration schemes (Figure 12):
| Variant | Config |
|--------------------------|---------------------------------------------------|
| Balsa (safe exploration) | (main agent) Balsa_JOBRandSplit |
| epsilon-greedy | JOBRandSplit_EpsGreedy |
| no exploration | JOBRandSplit_NoExplore |
Ablation: impact of training schemes (Figure 13):
| Variant | Config |
|-------------------|-------------------------------------------------|
| Balsa (on-policy) | (main agent) Balsa_JOBRandSplit |
| retrain | JOBRandSplit_RetrainScheme |
Comparison with learning from expert demonstrations (Neo-impl) (Figure 15):
| Variant | Config |
|-------------------|-------------------------------------------------|
| Balsa | (main agent) Balsa_JOBRandSplit |
| Neo-impl | NeoImpl_JOBRandSplit |
Diversified experiences (Figure 16):
| Variant | Config |
|-------------------|-------------------------------------------------|
| Balsa (main agents) | JOB: Balsa_JOBRandSplit; JOB Slow: Balsa_JOBSlowSplit |
| Balsa-8x (uses 8 main agents' data) | JOB: Balsa_JOBRandSplitReplay; JOB Slow: Balsa_JOBSlowSplitReplay |
Generalizing to highly distinct join templates, Ext-JOB (Figure 17):
| Variant | Config |
|-------------------|-------------------------------------------------|
| Balsa, data collection agent | Balsa_TrainJOB_TestExtJOB |
| Balsa-1x (uses 1 base agent's data) | Balsa1x_TrainJOB_TestExtJOB |
| Balsa-8x (uses 8 base agents' data) | Balsa8x_TrainJOB_TestExtJOB |
NOTE: When running an Ext-JOB config for the first time, you may see the error `Missing nodes in init_experience`. This means `data/initial_policy_data.pkl` contains the expert latencies of all 113 JOB queries for printing (assuming you ran the previous configs first) but lacks the new Ext-JOB queries. To fix this, rename the previous `.pkl` file and rerun the new Ext-JOB config, which will automatically run the expert plans of the new query set to regenerate this file (as well as gather the simulation data).
To specify a new experiment, subclass an existing config (give the subclass a descriptive name), change the values of some hyperparameters, and run it as above.
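A sketch of that subclassing pattern, with made-up hyperparameter names (the real base classes and attributes live in `experiments.py` and may differ):

```python
# Illustrative only: these hyperparameter names are hypothetical,
# and the real base config classes live in experiments.py.

class Balsa_JOBRandSplit:
    epochs = 100
    lr = 1e-3
    sim_checkpoint = None


class MyBalsa_SmallerLR(Balsa_JOBRandSplit):
    """Same experiment as the main agent, but with a smaller learning rate."""
    lr = 5e-4
```

The subclass inherits every hyperparameter it does not override, so a new experiment only states what changed.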