# WikiSQL
A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning.
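Each example in the dataset pairs a natural-language question with a structured query over a single table. As a rough sketch of how that structure reads (the `sel`/`agg`/`conds` layout follows the repo's `lib/query.py`; the column names and the record below are made up for illustration), the structured form can be rendered back into SQL:

```python
# Sketch: decoding a WikiSQL structured query into a SQL string.
# The (sel, agg, conds) layout follows lib/query.py in this repo;
# the sample table and record are hypothetical, not from the dataset.
AGG_OPS = ['', 'MAX', 'MIN', 'COUNT', 'SUM', 'AVG']
COND_OPS = ['=', '>', '<', 'OP']

def to_sql(sql, columns):
    """Render a structured query over a table's column names."""
    sel = columns[sql['sel']]                    # selected column
    agg = AGG_OPS[sql['agg']]                    # aggregation, '' if none
    select = f"{agg}({sel})" if agg else sel
    where = ' AND '.join(
        f"{columns[col]} {COND_OPS[op]} {val!r}"
        for col, op, val in sql['conds'])
    query = f"SELECT {select} FROM table"
    return f"{query} WHERE {where}" if where else query

# Illustrative example (hypothetical table and record):
columns = ['Player', 'Country', 'Points']
record = {'sel': 0, 'agg': 0, 'conds': [[1, 0, 'Sweden']]}
print(to_sql(record, columns))
# SELECT Player FROM table WHERE Country = 'Sweden'
```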
## Citation
If you use WikiSQL, please cite the following work:
Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning.
```
@article{zhongSeq2SQL2017,
  author  = {Victor Zhong and Caiming Xiong and Richard Socher},
  title   = {Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
  journal = {CoRR},
  volume  = {abs/1709.00103},
  year    = {2017}
}
```
## Notes
Regarding tokenization and Stanza: when WikiSQL was written three years ago, it relied on Stanza, a Python wrapper for CoreNLP that has since been deprecated. If you would still like to use the tokenizer, please use the Docker image. We do not anticipate switching to the current Stanza, as changes to the tokenizer would make the previous results irreproducible.
## Leaderboard
If you submit papers on WikiSQL, please consider sending a pull request to merge your results onto the leaderboard. By submitting, you acknowledge that your results were obtained purely by training on the training split and tuning on the dev split (i.e. you evaluated on the test set only once). Moreover, you acknowledge that your models use only the table schema and the question during inference; that is, they do not use the table content.

Update (May 12, 2019): We now have a separate leaderboard for weakly supervised models that do not use logical forms during training.
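The leaderboard reports two metrics: logical form accuracy (the predicted query matches the reference query) and execution accuracy (executing the predicted query returns the reference answer). A minimal sketch of the logical-form comparison, assuming queries in the structured `sel`/`agg`/`conds` form; this treats reordered conditions as equivalent, and whether that exactly matches the official evaluator should be checked against `lib/query.py`:

```python
def logical_form_equal(pred, gold):
    """Compare two structured WikiSQL queries for logical-form match.

    Condition order is ignored here, since reordered WHERE clauses
    denote the same query. Execution accuracy is looser still: it
    credits any predicted query whose result set matches the answer,
    even if the logical forms differ.
    """
    return (pred['sel'] == gold['sel']
            and pred['agg'] == gold['agg']
            and sorted(pred['conds'], key=repr) == sorted(gold['conds'], key=repr))

# Reordered conditions still count as a logical-form match:
a = {'sel': 0, 'agg': 0, 'conds': [[1, 0, 'Sweden'], [2, 1, 10]]}
b = {'sel': 0, 'agg': 0, 'conds': [[2, 1, 10], [1, 0, 'Sweden']]}
print(logical_form_equal(a, b))  # True
```

Because any logical-form match also executes to the right answer, execution accuracy is typically at least as high as logical form accuracy in the tables below.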
### Weakly supervised, without logical forms
| Model | Dev execution accuracy | Test execution accuracy |
| :---: | :---: | :---: |
| TAPEX (Liu 2022) | 89.2 | 89.5 |
| HardEM (Min 2019) | 84.4 | 83.9 |
| LatentAlignment (Wang 2019) | 79.4 | 79.3 |
| MeRL (Agarwal 2019) | 74.9 +/- 0.1 | 74.8 +/- 0.2 |
| MAPO (Liang 2018) | 72.2 +/- 0.2 | 72.1 +/- 0.3 |
| Rule-SQL (Guo 2019) | 61.1 +/- 0.2 | 61.0 +/- 0.3 |
### Supervised via logical forms
| Model | Dev <br /> logical form <br /> accuracy | Dev <br /> execution <br /> accuracy | Test <br /> logical form <br /> accuracy | Test <br /> execution <br /> accuracy | Uses execution |
| :---: | :---: | :---: | :---: | :---: | :---: |
| SeaD<br />+Execution-Guided Decoding<br />(Xu 2021) <br /> (Ant Group, Ada & ZhiXiaoBao) | 87.6 | 92.9 | 87.5 | 93.0 | Inference |
| SDSQL<br />+Execution-Guided Decoding<br />(Hui 2020) <br /> (Alibaba Group) | 87.1 | 92.6 | 87.0 | 92.7 | Inference |
| IE-SQL<br />+Execution-Guided Decoding<br />(Ma 2020) <br /> (Ping An Life, AI Team) | 87.9 | 92.6 | 87.8 | 92.5 | Inference |
| HydraNet<br />+Execution-Guided Decoding<br />(Lyu 2020) <br /> (Microsoft Dynamics 365 AI) <br /> (code) | 86.6 | 92.4 | 86.5 | 92.2 | Inference |
| BRIDGE^<br />+Execution-Guided Decoding<br />(Lin 2020) <br /> (Salesforce Research) | 86.8 | 92.6 | 86.3 | 91.9 | Inference |
| X-SQL<br />+Execution-Guided Decoding<br />(He 2019) | 86.2 | 92.3 | 86.0 | 91.8 | Inference |
| SDSQL<br />(Hui 2020) <br /> (Alibaba Group) | 86.0 | 91.8 | 85.6 | 91.4 | |
| BRIDGE^<br />(Lin 2020) <br /> (Salesforce Research) | 86.2 | 91.7 | 85.7 | 91.1 | |
| Text2SQLGen + EG (Mellah 2021) <br /> (Novelis.io Research) | | 91.2 | | 91.0 | Inference |
| SeqGenSQL+EG (Li 2020) | | 90.8 | | 90.5 | Inference |
| SeqGenSQL (Li 2020) | | 90.6 | | 90.3 | Inference |
| SeaD<br />(Xu 2021) <br /> (Ant Group, Ada & ZhiXiaoBao) | 84.9 | 90.2 | 84.7 | 90.1 | Inference |
| (Guo 2019) <br />+Execution-Guided Decoding <br /> with BERT-Base-Uncased^ | 85.4 | 91.1 | 84.5 | 90.1 | Inference |
| SQLova<br />+Execution-Guided Decoding<br />(Hwang 2019) | 84.2 | 90.2 | 83.6 | 89.6 | Inference |
| IncSQL<br />+Execution-Guided Decoding<br />(Shi 2018) | 51.3 | 87.2 | 51.1 | 87.1 | Inference |
| HydraNet (Lyu 2020) <br /> (Microsoft Dynamics 365 AI) <br /> (code) | 83.6 | 89.1 | 83.8 | 89.2 | |
| (Guo 2019) <br /> with BERT-Base-Uncased^ | 84.3 | 90.3 | 83.7 | 89.2 | |
| IE-SQL (Ma 2020) <br /> (Ping An Life, AI Team) | 84.6 | 88.7 | 84.6 | 88.8 | |
| X-SQL<br />(He 2019) | 83.8 | 89.5 | 83.3 | 88.7 | |
| SQLova<br />(Hwang 2019) | 81.6 | 87.2 | 80.7 | 86.2 | |
| Execution-Guided Decoding<br />(Wang 2018) | 76.0 | 84.0 | 75.4 | 83.8 | Inference |
| IncSQL<br />(Shi 2018) | 49.9 | 84.0 | 49.9 | 83.7 | |
| Auxiliary Mapping Task <br />(Chang 2019) | 76.0 | 82.3 | 75.0 | 81.7 | |
| MQAN (unordered)<br />(McCann 2018) | 76.1 | 82.0 | 75.4 | 81.4 | |
| MQAN (ordered)<br />(McCann 2018) | 73.5 | 82.0 | 73.2 | 81.4 | |
| Coarse2Fine<br />(Dong 2018) | 72.5 | 79.0 | 71.7 | 78.5 | |
| TypeSQL<br />(Yu 2018) | - | 74.5 | - | 73.5 | |
| PT-MAML<br />(Huang 2018) | 63.1 | 68.3 | 62.8 | 68.0 | |
| (Guo 2018)
