CorefQA: Coreference Resolution as Query-based Span Prediction
This repository contains code for recent research advances from Shannon.AI. Please open a GitHub issue or email xiaoya_li@shannonai.com with any questions.
CorefQA: Coreference Resolution as Query-based Span Prediction <br> Wei Wu, Fei Wang, Arianna Yuan, Fei Wu and Jiwei Li<br> In ACL 2020. paper<br> If you find this repo helpful, please cite the following:
```
@article{wu2019coreference,
  title={Coreference Resolution as Query-based Span Prediction},
  author={Wu, Wei and Wang, Fei and Yuan, Arianna and Wu, Fei and Li, Jiwei},
  journal={arXiv preprint arXiv:1911.01746},
  year={2019}
}
```
Contents
- Overview
- Hardware Requirements
- Install Package Dependencies
- Data Preprocess
- Download Pretrained MLM
- Training
- Evaluation and Prediction
- Download the Final CorefQA Model
- Descriptions of Directories
- Acknowledgement
- Useful Materials
- Contact
Overview
The model delivers a +3.5 F1 improvement (83.1 F1) over the previous SOTA coreference model on the CoNLL benchmark. The current codebase is written in TensorFlow; we plan to release a PyTorch version soon. The current code only supports training on TPUs and testing on GPUs (due to the annoying features of TF and TPUs), so you have to bear the trouble of transferring all saved checkpoints from TPUs to GPUs for evaluation (we will fix this soon). Please follow the parameter settings in the log directory to reproduce the reported performance.
| Model | F1 (%) |
| -------------- |:------:|
| Previous SOTA (Joshi et al., 2019a) | 79.6 |
| CorefQA + SpanBERT-large | 83.1 |
Hardware Requirements
- TPU for training: Cloud TPU v3-8 device (128 GB memory), TensorFlow 1.15, Python 3.5
- GPU for evaluation: CUDA 10.0, TensorFlow 1.15, Python 3.5
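As a quick sanity check before launching jobs, you can verify that the environment matches these requirements. The commands below are illustrative; the CUDA check applies to the GPU evaluation machine:

```bash
# Illustrative environment check; expect Python 3.5.x and TensorFlow 1.15.
python3 -c "import sys; print('python', sys.version.split()[0])"
python3 -c "import tensorflow as tf; print('tensorflow', tf.__version__)"
# On the GPU evaluation machine, the nvidia-smi banner should report CUDA 10.0.
nvidia-smi | head -n 3
```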
Install Package Dependencies
```bash
$ python3 -m pip install --user virtualenv
$ virtualenv --python=python3.5 ~/corefqa_venv
$ source ~/corefqa_venv/bin/activate
$ cd CorefQA
$ pip install -r requirements.txt

# If you are using TPU, please run the following commands:
$ pip install --upgrade google-api-python-client
$ pip install --upgrade oauth2client
```
Data Preprocess
- Download the officially released OntoNotes 5.0 (LDC2013T19). <br>
- Preprocess the OntoNotes 5.0 annotation files for the CoNLL-2012 coreference resolution task. <br>
Run the following command with Python 2:

```bash
bash ./scripts/data/preprocess_ontonotes_annfiles.sh <path_to_LDC2013T19-ontonotes5_directory> <path_to_save_CoNLL12_coreference_resolution_directory> <language>
```

It will create `{train/dev/test}.{language}.v4_gold_conll` files in the directory `<path_to_save_CoNLL12_coreference_resolution_directory>`. <br> `<language>` can be `english`, `arabic` or `chinese`; in this paper, we set `<language>` to `english`. <br> If you want to use Python 3, please refer to the guideline. <br>
- Generate TFRecord files for experiments. <br>
Run the following command with Python 3:

```bash
bash ./scripts/data/generate_tfrecord_dataset.sh <path_to_save_CoNLL12_coreference_resolution_directory> <path_to_save_tfrecord_directory> <path_to_pretrain_mlm_vocab_file>
```

It will create `{train/dev/test}.overlap.corefqa.{language}.tfrecord` files in the directory `<path_to_save_tfrecord_directory>`. <br>
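For example, assuming OntoNotes 5.0 was unpacked to `/data/ontonotes-release-5.0` and a cased SpanBERT vocabulary file is available (all paths below are placeholders, not shipped defaults):

```bash
# Illustrative end-to-end data preprocessing; adjust paths to your setup.
bash ./scripts/data/preprocess_ontonotes_annfiles.sh /data/ontonotes-release-5.0 /data/conll12_coref english          # run with Python 2
bash ./scripts/data/generate_tfrecord_dataset.sh /data/conll12_coref /data/corefqa_tfrecord /data/pretrained_mlm/spanbert_large/vocab.txt   # run with Python 3
```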
Download Pretrained MLM
In our experiments, we use pretrained masked language models (MLMs) to initialize the mention_proposal and corefqa models.
- Download the pretrained models. <br>
Run

```bash
bash ./scripts/data/download_pretrained_mlm.sh <path_to_save_pretrained_mlm> <model_sign>
```

to download and unzip the pretrained MLM models. <br> `<model_sign>` should take a value in `[bert_base, bert_large, spanbert_base, spanbert_large, bert_tiny]`. <br>
`bert_base`, `bert_large`, `spanbert_base` and `spanbert_large` are trained with a cased (uppercase and lowercase tokens) vocabulary and should be used with the cased train/dev/test coreference datasets. `bert_tiny` is trained with an uncased (lowercase tokens) vocabulary and should be used with the uncased datasets; we use the tiny BERT model for fast debugging. <br>
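For instance, to fetch SpanBERT-large (the destination path is a placeholder):

```bash
# Illustrative download of the cased SpanBERT-large checkpoint.
bash ./scripts/data/download_pretrained_mlm.sh /data/pretrained_mlm spanbert_large
```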
- Transform SpanBERT from PyTorch to TensorFlow. <br>
We need to transform the SpanBERT checkpoints from PyTorch to TF because the officially released models were trained with PyTorch. <br> After downloading `bert_<scale>` to `<path_to_bert_<scale>_tf_dir>` and `spanbert_<scale>` to `<path_to_spanbert_<scale>_pytorch_dir>`, you can transform the SpanBERT model to TensorFlow; the resulting model is saved to the directory `<path_to_save_spanbert_tf_checkpoint_dir>`. `<scale>` should take a value in `[base, large]`. <br>
Run

```bash
bash ./scripts/data/transform_ckpt_pytorch_to_tf.sh <model_name> <path_to_spanbert_<scale>_pytorch_dir> <path_to_bert_<scale>_tf_dir> <path_to_save_spanbert_tf_checkpoint_dir>
```

and the `<model_name>` checkpoint in TF format will be saved in `<path_to_save_spanbert_tf_checkpoint_dir>`. <br>
`<model_name>` should take a value in `[spanbert_base, spanbert_large]`. `<scale>` indicates that the `bert_model.ckpt` in `<path_to_bert_<scale>_tf_dir>` should have the same scale (base, large) as the `bert_model.bin` in `<path_to_spanbert_<scale>_pytorch_dir>`.
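For example, converting the large model (all paths are placeholders):

```bash
# Illustrative conversion of SpanBERT-large from PyTorch to TensorFlow.
bash ./scripts/data/transform_ckpt_pytorch_to_tf.sh spanbert_large \
  /data/pretrained_mlm/spanbert_large_pytorch \
  /data/pretrained_mlm/bert_large_tf \
  /data/pretrained_mlm/spanbert_large_tf
```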
Training
Following the pipeline described in the paper, you need to: <br>
1. load a pretrained SpanBERT model; <br>
2. finetune the SpanBERT model on the combination of the SQuAD and Quoref datasets; <br>
3. pretrain the mention proposal model on the coref dataset; <br>
4. jointly train the mention proposal model and the mention linking model. <br>
Notice: for steps 2) and 3), we provide the option of either pretraining these models yourself or loading our pretrained models. <br>
Finetune the SpanBERT Model on the Combination of SQuAD and Quoref Datasets
We finetune the SpanBERT model on the SQuAD 2.0 and Quoref QA tasks for data augmentation before the coreference resolution task.
- You can directly download the model finetuned on these datasets: Download Data Augmentation Models on Squad and Quoref link. <br> Run

```bash
bash ./scripts/data/download_squad2_finetune_model.sh <model-scale> <path-to-save-model>
```

to download the SpanBERT model finetuned on SQuAD 2.0 (see the sketch below). <br> `<model-scale>` should take a value in `[base, large]`. <br> `<path-to-save-model>` is the path for saving the SpanBERT model finetuned on the SQuAD 2.0 dataset. <br>
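An illustrative invocation for the large model (the destination path is a placeholder):

```bash
# Illustrative: download the large SpanBERT finetuned on SQuAD 2.0.
bash ./scripts/data/download_squad2_finetune_model.sh large /data/finetuned_mlm/spanbert_large_squad2
```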
- Or finetune the SpanBERT model on the QA tasks yourself: <br>
- Download SQuAD 2.0 train and dev sets.
- Download Quoref train and dev sets.
- Finetune the SpanBERT model on a Google Cloud v3-8 TPU.
For SQuAD 2.0, run the script in `./script/model/squad_tpu.sh`:
```bash
REPO_PATH=/home/shannon/coref-tf
export TPU_NAME=tf-tpu
export PYTHONPATH="$PYTHONPATH:$REPO_PATH"
SQUAD_DIR=gs://qa_tasks/squad2
BERT_DIR=gs://pretrained_mlm_checkpoint/spanbert_large_tf
OUTPUT_DIR=gs://corefqa_output_squad/spanbert_large_squad2_2e-5

python3 ${REPO_PATH}/run/run_squad.py \
  --vocab_file=$BERT_DIR/vocab.txt \
  --bert_config_file=$BERT_DIR/bert_config.json \
  --init_checkpoint=$BERT_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=8 \
  --learning_rate=2e-5 \
  --num_train_epochs=4.0 \
  --max_seq_length=384 \
  --do_lower_case=False \
  --doc_stride=128 \
  --output_dir=${OUTPUT_DIR} \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True
```
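The script assumes a Cloud TPU named `tf-tpu` (matching `$TPU_NAME`) is already running and can read the GCS buckets. A minimal sketch of provisioning such a TPU with `gcloud`; the zone is a placeholder, so pick one that matches your project:

```bash
# Illustrative TPU provisioning; adjust zone/project to your setup.
gcloud compute tpus create tf-tpu \
  --zone=us-central1-a \
  --accelerator-type=v3-8 \
  --version=1.15
```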
After getting the best model on SQuAD 2.0 (chosen based on performance on the dev set), finetune the saved model on Quoref. <br>
Run the script in `./script/model/quoref_tpu.sh`:
```bash
REPO_PATH=/home/shannon/coref-tf
export TPU_NAME=tf-tpu
export PYTHONPATH="$PYTHONPATH:$REPO_PATH"
QUOREF_DIR=gs://qa_tasks/quoref
BERT_DIR=gs://corefqa_output_squad/spanbert_large_squad2_2e-5
OUTPUT_DIR=gs://corefqa_output_quoref/spanbert_large_squad2_best_quoref_3e-5

python3 ${REPO_PATH}/run_quoref.py \
  --vocab_file=$BERT_DIR/vocab.txt \
  --bert_config_file=$BERT_DIR/bert_config.json \
  --init_checkpoint=$BERT_DIR/best_bert_model.ckpt \
  --do_train=True \
  --train_file=$QUOREF_DIR/quoref-train-v0.1.json \
  --do_predict=True \
  --predict_file=$QUOREF_DIR/quoref-dev-v0.1.json \
  --train_batch_size=8 \
  --learning_rate=3e-5 \
  --num_train_epochs=5 \
  --max_seq_length=384 \
  --do_lower_case=False \
  --doc_stride=128 \
  --output_dir=${OUTPUT_DIR} \
  --use_tpu=True \
  --tpu_name=$TPU_NAME
```
We use the best model on Quoref (chosen based on the performance on the dev set) for the subsequent coreference training. <br>
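Since evaluation runs on GPUs (see Overview), the selected checkpoint then has to be copied from the GCS output bucket to the evaluation machine. A minimal sketch with `gsutil`; the checkpoint name and local path are placeholders:

```bash
# Illustrative: pull the chosen checkpoint from GCS to the GPU machine.
mkdir -p ~/checkpoints/corefqa
gsutil cp "gs://corefqa_output_quoref/spanbert_large_squad2_best_quoref_3e-5/model.ckpt-*" ~/checkpoints/corefqa/
```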
