SQLFixAgent
The official implementation of our paper "SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration" (AAAI 2025).
If this repository helps you, please cite the following paper:

```bibtex
@inproceedings{cen2025sqlfixagent,
  author    = {Jipeng Cen and Jiaxin Liu and Zhixu Li and Jingjing Wang},
  title     = {SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration},
  booktitle = {AAAI},
  year      = {2025}
}
```
Prepare Environments
Our experiments are conducted in the following environments:
- GPU: 8 * NVIDIA RTX3090 with 24GB VRAM, CUDA version 11.8
- Python Environment: Anaconda3, Python version 3.8.5
Step 1: Create the Python Environment
Create a new Anaconda environment and install the required packages:

```shell
conda create -n sqlfixagent python=3.8.5
conda activate sqlfixagent
conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

git clone https://github.com/lihaoyang-ruc/SimCSE.git
cd SimCSE  # git clone creates a directory named "SimCSE", not "SimCSE-main"
python setup.py install
cd ..
```
Step 2: Set the OpenAI API
Edit the API configuration file:

```shell
vim ./core/api_config.py
```
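The exact contents of `./core/api_config.py` depend on the repository; a typical OpenAI-style configuration (the field names below are assumptions for illustration, not the file's actual contents) might look like:

```python
# Hypothetical sketch of ./core/api_config.py -- field names are assumptions.
import os

# Read the key from the environment rather than hard-coding it in the file.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-...")
OPENAI_API_BASE = "https://api.openai.com/v1"  # change this for a proxy endpoint
MODEL_NAME = "gpt-3.5-turbo"                   # LLM used by the fixer agent
```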
Step3: Download Datasets and Checkpoints
a. Download the required datasets (data.zip) and the schema item classifier checkpoints (sic_ckpts.zip), then unzip them:

```shell
unzip data.zip
unzip sic_ckpts.zip
```

b. Download the SFT-LLM checkpoints to be evaluated: codes-3b-bird-with-evidence, codes-7b-bird-with-evidence, codes-3b-spider, and codes-7b-spider, and place them under the ./model folder.
c. Download the SimCSE model and place it under the ./model folder.
Step4: Pre-process data
You can skip this step, as the pre-processed datasets are already included in the aforementioned data.zip file. If you wish to reproduce our data pre-processing procedure, first install Java:

```shell
apt-get update
apt-get install -y openjdk-11-jdk
```

Then execute the following two Python scripts:

```shell
# build a BM25 index for each database
python -u build_contents_index.py
# pre-process the datasets
python -u prepare_sft_datasets.py
```

Please note that this process may take a considerable amount of time (approximately 1-2 hours).
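The BM25 index lets the parser retrieve database cell values relevant to a question. As a rough illustration of the scoring behind such an index (this is a minimal self-contained sketch, not the repository's Lucene-backed implementation), Okapi BM25 can be computed as:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against the query (Okapi BM25)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter(t for d in corpus for t in set(d))  # document frequency per term
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

# Example: rank database cell values against a question fragment.
corpus = [["los", "angeles"], ["new", "york", "city"], ["san", "francisco"]]
print(bm25_scores(["new", "york"], corpus))
```

Documents that share no terms with the query score zero, so the second document ranks first here.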
Run Inference
```shell
bash run.sh
```

run.sh includes two scripts:
- run_sqltool.py uses CodeS to generate SQL
- run_fix.py uses an LLM API and CodeS to detect and fix errors from the previous stage
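The two stages are presumably chained sequentially; a minimal sketch of what run.sh might contain (the invocations below are assumptions, check the actual script for its arguments) is:

```shell
#!/bin/bash
# Stage 1: SQLTool -- CodeS generates candidate SQL queries.
python -u run_sqltool.py

# Stage 2: SQLRefiner -- an LLM API plus CodeS detects and repairs faulty SQL.
python -u run_fix.py
```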
Note
The program saves the filtered schema to ./data/temp if it does not already exist; delete this directory to rerun the full pipeline from scratch.
Besides, the program saves the SQLTool error log to ./core/memory if it does not already exist, by using codes-3b to simulate runtime error collection on the training set, which can be time-consuming.
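The error-log memory is useful for retrieving past failures similar to the current query. As a toy illustration of such retrieval (the sketch below substitutes a simple token-overlap similarity for the embedding-based similarity a real system would use; names and data are hypothetical):

```python
def jaccard(a, b):
    """Token-overlap similarity between two strings (stand-in for embeddings)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_similar_error(query, memory, top_k=1):
    """Return the top_k memory entries (sql, error_msg) most similar to `query`."""
    ranked = sorted(memory, key=lambda e: jaccard(query, e[0]), reverse=True)
    return ranked[:top_k]

# Hypothetical memory entries collected from earlier runtime errors.
memory = [
    ("SELECT name FROM singer WHERE age > 20", "no such column: age"),
    ("SELECT count(*) FROM concert", "ok"),
]
hits = retrieve_similar_error("SELECT name FROM singer", memory)
print(hits[0][1])  # -> no such column: age
```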
Acknowledgements
We thank the official Bird team for their help in evaluating our method on Bird's test set. We also thank MAC-SQL (paper, code), CodeS (paper, code), Bird (paper, dataset), Spider (paper, dataset), Spider-DK (paper, dataset), Spider-Syn (paper, dataset), and Spider-Realistic (paper, dataset) for their interesting work and open-sourced code and datasets.