Athena
Source code of our SIGMOD 2025 paper "Athena: An Effective Learning-based Framework for Query Optimizer Performance Improvement"
Install / Use
/learn @DBGroup-SUSTech/AthenaREADME
Athena
Athena is a learned optimizer enhancer, proposed to enhance the query optimizer of DBMSs. It is consisted of three key designs: (i) Order-centric plan explorer, (ii) Tree-Mamba plan comparator, and (iii) Time-weighted loss function.
-
Order-centric plan explorer is implemented in Athena_PG.
-
Tree-Mamba plan comparator is implemented in athena_model.py.
-
Time-weighted loss function is implemented in train.py as function
tailr_loss_with_logits.
To ensure usability and reusability, we provide the following two implementation approaches: (1) Docker and (2) manual step-by-step deployment.
Method 1: Run in Docker
-
Install Docker and Docker-compose.
For example, for ubuntu, please refer to the following link:
Install Docker using the apt repository
Notice that if you encountered a network problem, supposing that you have a http proxy at
http://<ip>:<port>, such ashttp://127.0.0.1:7890, you can try the following command. Remember to replace the<ip>and<port>in the following texts to your actual value.export HTTP_PROXY=http://<ip>:<port> export HTTPS_PROXY=$HTTP_PROXY cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf > /dev/null [Service] Environment="HTTP_PROXY=$HTTP_PROXY" Environment="HTTPS_PROXY=$HTTP_PROXY" Environment="NO_PROXY=localhost,127.0.0.1" EOFand add two lines after the line
- POSTGRES_PASSWORD=postgresin the./docker-compose.yml- HTTP_PROXY=http://<ip>:<port> - HTTPS_PROXY=http://<ip>:<port> -
Download datasets
- IMDB
Download the imdb.tgz to
./postgres_service -
Generate configuration for PostgreSQL.
Go to PGTune, generate the configuration of your computer, and save it to
./postgres_service/postgresql.conf. -
Train our model in Docker
docker compose up --build -
Run other commands
Refer to other commands in section Run. Replace the command property of the athena service with the command you want in
docker-compose.yml. And then rundocker compose up --buildagain.
Method 2: Manual deployment
I. Dependencies
- Python dependencies
uv pip install -r requirements.txt
// install mamba in https://github.com/a858438680/TreeMamba.git
git clone https://github.com/a858438680/tree_mamba.git
uv pip install --no-build-isolation ./python
- Postgres extensions
No need to install these extensions here, there is guidelines when installing Athena_PG.
II. Data
Athena support 4 benchmarks: JOB, STATS-CEB, TPC-DS, and DSB.
We have included their workloads into the dir workloads/, where the training and test queries of each benchmark are placed into dir xxx and xxx-sample respectively.
III. Run
Install Athena_PG before the following steps.
1. Prepare plans for model training
# generate plans for JOB queries as example
python -u src/Athena/generate_dataset.py --database imdb --workload JOB
This code employs Athena's plan explorer to enumerate candidate plans for input queries, executes them, and stores the results into dir datasets as the training samples. The dir datasets is orgranizeda as datasets/{database}/{workload}/{plan-generation-method}/.
We also support to generate training samples of Bao, Lero, Kepler. For example, for Bao, check the file.
2. Train the Tree-Mamba plan comparator
# set parameter for torch determistic algorithm
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# train a Tree-Mamba plan comparator for JOB queries as an example
python scripts/run_experiments.py --models athena --workloads JOB --methods athena
This code will train a Tree-Mamba plan comparator based on the collected training samples in the last step and store the model in the dir models/{model}/{workload}/{plan-generation-methods}/{random-seed}/models/. For this example, the dir to store the model is models/athena/JOB/athena/3407/models/.
3. Test the model
This step tests the model based JOB as an example.
- 3.1 Geneate plans for test queries
python -u src/Athena/generate_dataset.py --database imdb --workload JOB-sample --generate_candidate_plans
The code with the option generate_candidate_plans only generates the plans for queries and does not execute them. The generated plans will be also stored in dir datasets/.
- 3.2 Test model
python src/Athena/test.py --dataset imdb/JOB-sample/Athena --model models/athena/JOB/athena/3407/models/199.pt
This code will employ the model to select a plan from the candidate plans of each query and store the indexes of selected plans in the dir results/{database}/{workload}/{plan-generation-method}/{model}.json.
file results/imdb/JOB-sample/Athena/Athena.json.
- 3.3 Execute the selected plans in the last step
python -u scripts/test_model.py --database=imdb --workload=JOB-sample --plan_generation_method=Athena --model=Athena --result_path=Athena
This code loads the indexes of the selected plan in the last step, executes them, and stores the execution results into dir records/. This code will directly load the records instead of executing the queries next time.
We have stored model selection and execution records for Athena on JOB-sample, so you can just run 3.3 without 3.1 and 3.2.
Citation
If you find Athena useful, please cite the following paper:
@article{10.1145/3725395,
author = {Li, Runzhong and Li, Qilong and Liu, Haotian and Mao, Rui and Li, Qing and Tang, Bo},
title = {Athena: An Effective Learning-based Framework for Query Optimizer Performance Improvement},
year = {2025},
issue_date = {June 2025},
volume = {3},
number = {3},
doi = {10.1145/3725395},
journal = {Proc. ACM Manag. Data},
month = jun,
articleno = {132},
numpages = {24}
}
