# TAble PArSing (TAPAS)
Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training.
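At its core, the model linearizes a table into one token sequence alongside the question, with extra row and column index embeddings for each table token. The following is a minimal, self-contained sketch of that encoding idea; the function name and id conventions are illustrative and do not match the repo's actual tokenizer API.

```python
# Hypothetical sketch of TAPAS-style table linearization: question tokens
# and flattened table tokens share one sequence, and every table token
# carries its row and column index for the extra positional embeddings.

def linearize(question_tokens, table):
    """Return (tokens, row_ids, col_ids) for a question plus a table.

    `table` is a list of rows whose first row is the header. Following
    the paper's convention, question tokens get row id 0 and column id 0,
    header tokens get row id 0, and column ids for cells start at 1.
    """
    tokens, row_ids, col_ids = [], [], []
    for tok in question_tokens:
        tokens.append(tok)
        row_ids.append(0)
        col_ids.append(0)
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for tok in cell.split():
                tokens.append(tok)
                row_ids.append(r)       # header row keeps row id 0
                col_ids.append(c + 1)   # cell columns are 1-based
    return tokens, row_ids, col_ids
```

In the real model these row/column ids select learned embeddings that are added to the token embeddings, which is how the transformer knows the table structure despite seeing a flat sequence.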
## News
2021/09/15
- Released code for sparse table attention from MATE: Multi-view Attention for Table Transformer Efficiency. For more info check here.
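The MATE idea can be illustrated with a toy attention mask: each head is restricted to either rows or columns of the table, while question tokens stay global. This is only a sketch of the sparsity pattern under assumed id conventions (question tokens marked by column id 0), not the repo's actual implementation.

```python
# Toy sketch of MATE-style sparse table attention: "row" heads let a
# token attend only within its row, "col" heads only within its column,
# and question tokens (column id 0 here) attend and are attended globally.

def sparse_attention_mask(row_ids, col_ids, head="row"):
    """Return an n x n boolean mask; mask[i][j] is True iff token i may
    attend to token j under this head's sparsity pattern."""
    n = len(row_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if col_ids[i] == 0 or col_ids[j] == 0:
                mask[i][j] = True  # question tokens are global
            elif head == "row":
                mask[i][j] = row_ids[i] == row_ids[j]
            else:
                mask[i][j] = col_ids[i] == col_ids[j]
    return mask
```

Restricting each head this way reduces the quadratic attention cost over long table inputs, which is the efficiency gain the MATE paper targets.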
2021/08/24
- Added a colab to try predictions on open domain question answering.
2021/08/20
- New models and code for DoT: An efficient Double Transformer for NLP tasks with tables released here.
2021/07/23
- New release of NQ with tables data used in Open Domain Question Answering over Tables via Dense Retrieval. The use of the data is detailed here.
2021/05/13
- New models and code for Open Domain Question Answering over Tables via Dense Retrieval released here.
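The dense retrieval setting embeds questions and tables separately and ranks tables by inner product with the question embedding. A minimal sketch of that scoring step, with invented function names and toy vectors in place of the real encoders:

```python
# Hypothetical sketch of dense table retrieval: given a question
# embedding and precomputed table embeddings, rank tables by inner
# product and return the indices of the top-k candidates.

def retrieve(question_vec, table_vecs, k=2):
    """Return indices of the k tables with the highest dot-product
    score against the question vector."""
    scores = [(sum(q * t for q, t in zip(question_vec, tv)), i)
              for i, tv in enumerate(table_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

In practice the embeddings come from trained dual encoders and the search uses an approximate nearest-neighbor index rather than an exhaustive scan.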
2021/03/23
- The upcoming NAACL 2021 short paper Open Domain Question Answering over Tables via Dense Retrieval extends the TAPAS capabilities to table retrieval and open-domain QA. We are planning to release the new models and code soon.
2020/12/17
- TAPAS is added to huggingface/transformers in version 4.1.1. 28 checkpoints are added to the huggingface model hub and can be played with using a custom table question answering widget.
2020/10/19
- Small change to WTQ training example creation
  - Questions with ambiguous cell matches will now be discarded
  - This improves denotation accuracy by ~1 point
  - For more details see this issue.
- Added option to filter table columns by textual overlap with question
  - Based on the HEM method described in section 3.3 of Understanding tables with intermediate pre-training.
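The column-filtering option can be pictured as ranking columns by word overlap with the question and keeping only the best-scoring ones. The following is a rough stand-in for the HEM heuristic; the real implementation works on tokenized text and handles scoring and ties differently.

```python
# Illustrative sketch of filtering table columns by textual overlap
# with the question: score each column by how many question words
# appear in its cells, then keep the top-scoring columns.

def filter_columns(question, columns, max_cols=2):
    """Return the (sorted) indices of the max_cols columns whose cell
    text overlaps most with the question words."""
    q_words = set(question.lower().split())

    def overlap(col):
        col_words = set(w for cell in col for w in cell.lower().split())
        return len(q_words & col_words)

    ranked = sorted(range(len(columns)), key=lambda i: -overlap(columns[i]))
    return sorted(ranked[:max_cols])
```

Dropping low-overlap columns shortens the flattened input, so large tables are more likely to fit within the model's sequence length.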
2020/10/09
- Released code & models to run TAPAS on TabFact for table entailment, companion for the EMNLP 2020 Findings paper Understanding tables with intermediate pre-training.
- Added a colab to try predictions on TabFact
- Added new page describing the intermediate pre-training process.
2020/08/26
- Added a colab to try predictions on WTQ
2020/08/05
- New pre-trained models (see Data section below)
- `reset_position_index_per_cell`: New option to train models that reset the position index whenever a new cell starts, instead of using absolute position indices.
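The effect of the reset option can be shown with a small sketch that assigns position indices to a token sequence, restarting the count at each cell boundary. The function and its cell-id convention are illustrative, not the repo's actual code.

```python
# Sketch of per-cell position index reset: with reset=True the position
# index restarts at 0 whenever a new cell begins (detected here by a
# change in cell id); with reset=False positions are absolute.

def position_indices(cell_ids, reset=True):
    """Return one position index per token, optionally resetting the
    count at every cell boundary."""
    positions, prev, pos = [], None, 0
    for i, cid in enumerate(cell_ids):
        if reset and cid != prev:
            pos = 0  # new cell: restart the local position counter
        positions.append(pos if reset else i)
        pos += 1
        prev = cid
    return positions
```

Resetting per cell keeps position indices small regardless of where a cell sits in the flattened table, which helps the model generalize to larger tables.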
2020/06/10
- Bump TensorFlow to v2.2
2020/06/08
- Released the pre-training data.
2020/05/07
- Added a colab to try predictions on SQA
## Installation
The easiest way to try out TAPAS with free GPU/TPU is in our Colab, which shows how to do predictions on SQA.
The repository uses protocol buffers, and requires the protoc compiler to run.
You can download the latest binary for your OS here.
On Ubuntu/Debian, it can be installed with:

```bash
sudo apt-get install protobuf-compiler
```
Afterwards, clone and install the git repository:

```bash
git clone https://github.com/google-research/tapas
cd tapas
pip install -e .
```
To run the test suite, we use the tox library, which can be invoked with:

```bash
pip install tox
tox
```
## Models
We provide pre-trained models for different model sizes.
The metrics are computed by our tool and not the official metrics of the respective tasks. We provide them so one can verify whether one's own runs are in the right ballpark. They are medians over three individual runs.
### Models with intermediate pre-training (2020/10/07)
New models based on the ideas discussed in Understanding tables with intermediate pre-training. Learn more about the methods used here.
#### WTQ
Trained from Mask LM, intermediate data, SQA, WikiSQL.
Size   | Reset   | Dev Accuracy | Link
------ | ------- | ------------ | ----
LARGE  | noreset | 0.5062 | tapas_wtq_wikisql_sqa_inter_masklm_large.zip
LARGE  | reset   | 0.5097 | tapas_wtq_wikisql_sqa_inter_masklm_large_reset.zip
BASE   | noreset | 0.4525 | tapas_wtq_wikisql_sqa_inter_masklm_base.zip
BASE   | reset   | 0.4638 | tapas_wtq_wikisql_sqa_inter_masklm_base_reset.zip
MEDIUM | noreset | 0.4324 | tapas_wtq_wikisql_sqa_inter_masklm_medium.zip
MEDIUM | reset   | 0.4324 | tapas_wtq_wikisql_sqa_inter_masklm_medium_reset.zip
SMALL  | noreset | 0.3681 | tapas_wtq_wikisql_sqa_inter_masklm_small.zip
SMALL  | reset   | 0.3762 | tapas_wtq_wikisql_sqa_inter_masklm_small_reset.zip
MINI   | noreset | 0.2783 | tapas_wtq_wikisql_sqa_inter_masklm_mini.zip
MINI   | reset   | 0.2854 | tapas_wtq_wikisql_sqa_inter_masklm_mini_reset.zip
TINY   | noreset | 0.0823 | tapas_wtq_wikisql_sqa_inter_masklm_tiny.zip
TINY   | reset   | 0.1039 | tapas_wtq_wikisql_sqa_inter_masklm_tiny_reset.zip
#### WIKISQL
Trained from Mask LM, intermediate data, SQA.
Size   | Reset   | Dev Accuracy | Link
------ | ------- | ------------ | ----
LARGE  | noreset | 0.8948 | tapas_wikisql_sqa_inter_masklm_large.zip
LARGE  | reset   | 0.8979 | tapas_wikisql_sqa_inter_masklm_large_reset.zip
BASE   | noreset | 0.8859 | tapas_wikisql_sqa_inter_masklm_base.zip
BASE   | reset   | 0.8855 | tapas_wikisql_sqa_inter_masklm_base_reset.zip
MEDIUM | noreset | 0.8766 | tapas_wikisql_sqa_inter_masklm_medium.zip
MEDIUM | reset   | 0.8773 | tapas_wikisql_sqa_inter_masklm_medium_reset.zip
SMALL  | noreset | 0.8552 | tapas_wikisql_sqa_inter_masklm_small.zip
SMALL  | reset   | 0.8615 | tapas_wikisql_sqa_inter_masklm_small_reset.zip
MINI   | noreset | 0.8063 | tapas_wikisql_sqa_inter_masklm_mini.zip
MINI   | reset   | 0.82   | tapas_wikisql_sqa_inter_masklm_mini_reset.zip
TINY   | noreset | 0.3198 | tapas_wikisql_sqa_inter_masklm_tiny.zip
TINY   | reset   | 0.6046 | tapas_wikisql_sqa_inter_masklm_tiny_reset.zip
#### TABFACT
Trained from Mask LM, intermediate data.
Size   | Reset   | Dev Accuracy | Link
------ | ------- | ------------ | ----
LARGE  | noreset | 0.8101 | tapas_tabfact_inter_masklm_large.zip
LARGE  | reset   | 0.8159 | tapas_tabfact_inter_masklm_large_reset.zip
BASE   | noreset | 0.7856 | [tapas_tabfact_inter_masklm_base.zip](https:
