
SPARTA

Semantic Parsing And Relational Table Aware model that generates SQL from questions written in Korean


SPARTA (Semantic Parsing And Relational Table Aware)

This is a term project for the Unstructured Text Analysis class. We implement a deep learning model that converts questions written in Korean into SQL queries.

<div align='center'> <img src='https://user-images.githubusercontent.com/37654013/119700897-bec2c100-be8e-11eb-9d61-36de1ca66d5a.png'> </div>

Team Members

  • Hoonsang Yoon
  • Jaehyuk Heo
  • Jungwoo Choi
  • Jeongseob Kim

Information

Demo

See the demo here.

Video

Text2SQL Result Video

Dataset

tar xvjf data/data.tar.bz2

Korean WikiSQL dataset

unzip data/ko_token.zip
unzip data/ko_token_not_h.zip
unzip data/ko_from_table.zip
unzip data/ko_from_table_not_h.zip
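Since the data is described as a Korean WikiSQL dataset, the extracted files presumably keep the WikiSQL layout: one JSON object per line, with questions and tables stored in separate files. That layout is an assumption, so check the extracted files first. A minimal loader sketch with hypothetical helper names:

```python
import json


def load_wikisql_jsonl(path):
    """Read a WikiSQL-style .jsonl file: one JSON object per line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at the end of the file
                examples.append(json.loads(line))
    return examples


def load_tables(path):
    """Index a WikiSQL-style tables .jsonl file by its 'id' field."""
    return {table["id"]: table for table in load_wikisql_jsonl(path)}
```

Each question example can then be joined to its table through its `table_id` field, again assuming the Korean files keep the original WikiSQL field names.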

Translation

We translated the English WikiSQL questions into Korean in four different ways, as follows.

Download dataset

No | Method | Data Name | Description
---|---|---|---
1 | Where+Select | ko_token | Keep the words of the English question that are WHERE values in the label or the column used in the SELECT clause
2 | Where | ko_token_not_h | Keep the words of the English question that are table headers
3 | Table+Header | ko_from_table | Keep the words of the English question that are table values or table headers
4 | Table | ko_from_table_not_h | Keep the words of the English question that are table values

<div align='center'> <strong>Method 1 (Where+Select)</strong><br> <img width="1000" src='https://user-images.githubusercontent.com/37654013/119702737-c1beb100-be90-11eb-9c71-00498dafec0d.png'> </div> <div align='center'> <strong>Method 2 (Where)</strong><br> <img width="1000" src='https://user-images.githubusercontent.com/37654013/119702997-0d715a80-be91-11eb-955d-bafd7e0912b4.png'> </div> <div align='center'> <strong>Method 3 (Table+Header)</strong><br> <img width="1000" src='https://user-images.githubusercontent.com/37654013/119703614-aef8ac00-be91-11eb-947d-6da2086ffeb7.png'> </div> <div align='center'> <strong>Method 4 (Table)</strong><br> <img width="1000" src='https://user-images.githubusercontent.com/37654013/119703354-6d680100-be91-11eb-87ab-a0d07bdf9df6.png'> </div>
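One common way to keep selected English words intact through machine translation is to mask them with placeholder tokens before translating and to restore them afterwards. The sketch below shows that technique. It is not necessarily how these datasets were built. The helper names and the `__KEEP<i>__` placeholder format are hypothetical, and whether a translator leaves such placeholders untouched is also an assumption.

```python
import re


def protect_terms(question, terms):
    """Replace each term to keep with a placeholder; return masked text and mapping."""
    mapping = {}
    masked = question
    # Longest terms first so that a term nested inside a longer one is not masked early;
    # ties are broken alphabetically to keep the placeholder numbering deterministic.
    for i, term in enumerate(sorted(set(terms), key=lambda t: (-len(t), t))):
        placeholder = f"__KEEP{i}__"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(masked):
            masked = pattern.sub(placeholder, masked)
            mapping[placeholder] = term
    return masked, mapping


def restore_terms(translated, mapping):
    """Put the kept terms back, spelled as given in `terms` (original casing is not preserved)."""
    for placeholder, term in mapping.items():
        translated = translated.replace(placeholder, term)
    return translated
```

For Method 1, for instance, `terms` would hold the WHERE values from the label and the SELECT column name. The other methods would pass table headers and/or table values instead.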

Run translation

  1. Create a dataframe of the English questions to be translated into Korean.
     bash run_translate.sh value
  2. Translate the questions from English to Korean with Google Translate (click here!) and copy the resulting text file into the ko_data directory, e.g. 'ko_train_question.txt'.
  3. Insert the Korean questions back into the dataset.
     bash run_translate.sh token
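The export and insert steps above depend on one thing: the translated text file must stay line-aligned with the original questions. The sketch below shows that round trip with the standard library only. The function names are hypothetical, and it does not reproduce what `run_translate.sh` actually does.

```python
def export_questions(examples, path):
    """Write one English question per line, in dataset order, for external translation."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            # A newline inside a question would break the one-question-per-line alignment
            f.write(ex["question"].replace("\n", " ") + "\n")


def insert_translations(examples, path):
    """Return copies of the examples whose 'question' is replaced by the translated line."""
    # utf-8-sig tolerates a byte-order mark that some editors add when saving
    with open(path, encoding="utf-8-sig") as f:
        translated = [line.rstrip("\r\n") for line in f]
    while translated and translated[-1] == "":  # drop trailing blank lines
        translated.pop()
    if len(translated) != len(examples):
        raise ValueError(f"expected {len(examples)} lines, got {len(translated)}")
    return [{**ex, "question": ko} for ex, ko in zip(examples, translated)]
```

The length check catches the most common failure, which is a translator merging or splitting lines. Without it, every later question would silently shift onto the wrong SQL label.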

SPARTA Model

We use pretrained multilingual BERT as the encoder.
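With a BERT encoder, the question and the table headers must be packed into a single input sequence. The sketch below lays them out in the SQLova style described in [2]: `[CLS] question [SEP] header_1 [SEP] ... header_n [SEP]`, with segment id 0 for the question and 1 for the headers. This repository may use a different layout, and `build_sqlova_input` is a hypothetical name. In practice the tokens would come from a multilingual BERT WordPiece tokenizer. Here they are passed in already tokenized.

```python
def build_sqlova_input(question_tokens, headers_tokens):
    """Lay out [CLS] question [SEP] header_1 [SEP] ... header_n [SEP].

    Returns the token list, the segment ids (0 = question, 1 = headers), and the
    (start, end) span of each header, so per-column vectors can be pooled later.
    """
    tokens = ["[CLS]"] + list(question_tokens) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    header_spans = []
    for header in headers_tokens:
        start = len(tokens)
        tokens.extend(header)
        header_spans.append((start, len(tokens)))
        tokens.append("[SEP]")
        # One segment id per header token plus one for the following [SEP]
        segment_ids.extend([1] * (len(header) + 1))
    return tokens, segment_ids, header_spans
```

The header spans let a column-wise classifier pick each column's encoder outputs, for example for SELECT-column or WHERE-column prediction.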

Sub Task

  1. SQLova [ paper | github ]
  2. HydraNet [ paper | github ]
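SQLova and HydraNet split a WikiSQL query into sub-tasks rather than generating it token by token. The sub-tasks are the SELECT column, the aggregation, and the WHERE conditions (column, operator, value). The sketch below turns such a logical form back into SQL text. The operator lists follow the original WikiSQL release. `logical_form_to_sql`, `quote_ident` and the default table name `t` are hypothetical.

```python
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def quote_ident(name):
    # SQL identifiers are double-quoted; embedded double quotes are doubled
    return '"' + name.replace('"', '""') + '"'


def logical_form_to_sql(sql, header, table_name="t"):
    """Render a WikiSQL logical form {'sel', 'agg', 'conds'} as a SQL string."""
    col = quote_ident(header[sql["sel"]])
    agg = AGG_OPS[sql["agg"]]
    query = f"SELECT {agg}({col})" if agg else f"SELECT {col}"
    query += f" FROM {table_name}"
    clauses = []
    for col_idx, op_idx, value in sql["conds"]:
        if isinstance(value, (int, float)):
            literal = str(value)  # numbers stay unquoted
        else:
            # Strings are single-quoted; embedded single quotes are doubled
            literal = "'" + str(value).replace("'", "''") + "'"
        clauses.append(f"{quote_ident(header[col_idx])} {COND_OPS[op_idx]} {literal}")
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query
```

Each model head fills one slot (`sel`, `agg`, or one field of a `conds` triple). That is why a single wrong head is enough to make the whole logical form incorrect.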

Seq2Seq

  1. BRIDGE (TabularSemanticParsing) [ paper | github ]
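BRIDGE instead generates the query as a sequence. Its encoder reads a flattened question-plus-schema string in which the table name and column names are marked with special tokens. Cell values that also appear in the question are attached to their column as anchor texts. The sketch below roughly follows the serialization described in [4]. The helper names are hypothetical, and the exact-substring matcher is a simplified stand-in for the fuzzy matching used in the paper.

```python
def match_values(question, rows):
    """Naive anchor matching: keep cell strings that occur verbatim in the question."""
    matched = {}
    for row in rows:
        for i, cell in enumerate(row):
            text = str(cell)
            if text and text in question and text not in matched.get(i, []):
                matched.setdefault(i, []).append(text)
    return matched


def serialize_bridge_style(question, table_name, headers, matched_values):
    """Flatten the question and one table into a BRIDGE-style string.

    [T] marks the table name, [C] each column, and [V] a matched cell value,
    which is placed right after the column it belongs to.
    """
    parts = ["[CLS]", question, "[SEP]", "[T]", table_name]
    for i, header in enumerate(headers):
        parts += ["[C]", header]
        for value in matched_values.get(i, []):
            parts += ["[V]", value]
    parts.append("[SEP]")
    return " ".join(parts)
```

Exact matching is brittle for Korean questions, where particles attach directly to values. This is one reason the translation methods that keep table values in English may help value anchoring.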

Evaluation

  • Logical Form Accuracy
  • Execution Accuracy
<div align='center'> <img width="1000" src='https://user-images.githubusercontent.com/37654013/119704032-229ab900-be92-11eb-9687-acdc64ab117a.png'> </div>
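The two metrics measure different things. Logical form accuracy requires the predicted query structure to match the gold one. Execution accuracy only requires both queries to return the same result. A prediction can therefore be execution-correct but logically wrong, for example when a different WHERE column happens to select the same rows. The sketch below computes both on WikiSQL-style dicts and SQLite. The function names, the case-insensitive value comparison and the order-insensitive row comparison are assumptions, not necessarily what this project's evaluation script does.

```python
import sqlite3
from collections import Counter


def _canonical(sql):
    # Condition order is irrelevant; values are compared as lower-cased strings
    conds = sorted((col, op, str(val).lower()) for col, op, val in sql["conds"])
    return sql["sel"], sql["agg"], conds


def logical_form_accuracy(preds, golds):
    """Share of examples whose predicted logical form matches the gold one."""
    hits = sum(_canonical(p) == _canonical(g) for p, g in zip(preds, golds))
    return hits / len(golds)


def execution_accuracy(pred_queries, gold_queries, conn):
    """Share of examples whose predicted SQL returns the same rows as the gold SQL."""
    hits = 0
    for pred, gold in zip(pred_queries, gold_queries):
        gold_rows = conn.execute(gold).fetchall()
        try:
            pred_rows = conn.execute(pred).fetchall()
        except sqlite3.Error:
            continue  # a query that fails to execute counts as incorrect
        # Compare as multisets so that row order does not matter
        if Counter(pred_rows) == Counter(gold_rows):
            hits += 1
    return hits / len(gold_queries)
```

This gap explains why execution accuracy is at least as high as logical form accuracy for every model in the results below.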

Experiments

Model | Task | Test Logical Form Accuracy (%) | Test Execution Accuracy (%)
---|---|:---:|:---:
SQLova | Subtask | 65.8 | 74.3
HydraNet | Subtask | 40.4 | 40.7
Bridge | Generation | 54.6 | 62.1

Download Trained Models

Method | SQLova | Bridge
---|---|---
Where+Select | Download | -
Where | Download | -
Table+Header | Download | -
Table | Download | -

Presentation

Proposal

Interim Findings

Final

Reference

  • [1] Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning.
  • [2] Hwang, W., Yim, J., Park, S., & Seo, M. (2019). A comprehensive exploration on WikiSQL with table-aware word contextualization. KR2ML Workshop at NeurIPS 2019.
  • [3] Lyu, Q., Chakrabarti, K., Hathi, S., Kundu, S., Zhang, J., & Chen, Z. (2020). Hybrid ranking network for text-to-SQL. arXiv preprint arXiv:2008.04759.
  • [4] Lin, X. V., Socher, R., & Xiong, C. (2020). Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. Findings of EMNLP 2020.
