BRIO
ACL 2022: BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
This repo contains the code, data and trained models for our paper BRIO: Bringing Order to Abstractive Summarization.
Quick Links
- Overview
- How to Install
- Description of Codes
- Preprocessing
- How to Run
- Results, Outputs, Checkpoints
- Use BRIO with Huggingface
Overview
We present a novel training paradigm for neural abstractive summarization. Instead of using MLE training alone, we introduce a contrastive learning component, which encourages abstractive models to estimate the probabilities of system-generated summaries more accurately.
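To make the contrastive component concrete, here is a minimal pure-Python sketch of a pairwise margin ranking loss combined with the MLE loss. It assumes a formulation along the lines of the paper, which this README does not spell out: candidate summaries are sorted from best to worst by ROUGE, the model's score for a candidate is its length-normalized log-probability, and the required margin grows with the rank gap. The function names and the `alpha`, `margin` and `gamma` values are illustrative assumptions, not the code or hyper-parameters used in main.py or model.py.

```python
from typing import List, Sequence


def candidate_score(token_logprobs: Sequence[float], alpha: float = 1.0) -> float:
    """Length-normalized log-probability of one candidate summary.

    `alpha` is an assumed length-penalty exponent (1.0 = average over tokens).
    """
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


def ranking_loss(scores: List[float], margin: float = 0.001) -> float:
    """Pairwise margin ranking loss over candidates.

    `scores` must be ordered from the best candidate (index 0) to the worst,
    e.g. sorted by ROUGE against the reference. A better candidate i is
    penalized unless it outscores a worse candidate j by (j - i) * margin.
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss


def total_loss(mle_loss: float, scores: List[float], gamma: float = 1.0,
               margin: float = 0.001) -> float:
    """MLE loss plus the contrastive ranking term weighted by the assumed `gamma`."""
    return mle_loss + gamma * ranking_loss(scores, margin)
```

With `gamma` set to 0 the objective reduces to MLE training alone; larger values give the ranking term more weight.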
(Figure: model.png)
How to Install
- python3.8
- conda create --name env --file spec-file.txt
- Further steps
    - install additional libraries (after activating the conda env): pip install -r requirements.txt
    - compare_mt -> https://github.com/neulab/compare-mt
        git clone https://github.com/neulab/compare-mt.git
        cd ./compare-mt
        pip install -r requirements.txt
        python setup.py install
Our code is based on Huggingface's Transformers library.
Description of Codes
- cal_rouge.py -> ROUGE calculation
- config.py -> model configuration
- data_utils.py -> dataloader
- label_smoothing_loss.py -> label smoothing loss
- main.py -> training and evaluation procedure
- model.py -> models
- modeling_bart.py, modeling_pegasus.py -> modified from the Transformers library to support more efficient training
- preprocess.py -> data preprocessing
- utils.py -> utility functions
- gen_candidate.py -> generate candidate summaries
Workspace
The following directories should be created for our experiments.
- ./cache -> storing model checkpoints
- ./result -> storing evaluation results
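If you prefer to set up the workspace from Python rather than with mkdir, a small sketch follows; the helper name `create_workspace` is hypothetical and not part of this repo.

```python
from pathlib import Path
from typing import List

# Folders used by the scripts: checkpoints/logs go to cache, evaluation outputs to result.
WORKSPACE_DIRS = ["cache", "result"]


def create_workspace(root: str = ".") -> List[Path]:
    """Create root/cache and root/result if they do not exist yet (safe to re-run)."""
    created = []
    for name in WORKSPACE_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_workspace() from the repository root is equivalent to running mkdir -p ./cache ./result.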
Preprocessing
We use the following datasets for our experiments.
- CNN/DailyMail -> https://github.com/abisee/cnn-dailymail
- XSum -> https://github.com/EdinburghNLP/XSum
- NYT -> https://catalog.ldc.upenn.edu/LDC2008T19
Preprocessed Data
You can download the preprocessed data for our experiments on CNNDM, CNNDM (cased) and XSum.
After downloading, unzip the zip files in the root directory of this repo.
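Unzipping can also be scripted. The archive names and their internal layout are not listed here, so the hypothetical helper `unzip_all` below simply assumes the downloaded archives sit directly in the given directory and extracts each of them in place.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_all(root: str = ".") -> List[str]:
    """Extract every *.zip archive located directly in `root` into `root`.

    Returns the names of the extracted archives, in sorted order.
    """
    extracted = []
    for archive in sorted(Path(root).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(archive.name)
    return extracted
```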
For NYT, you will need to obtain the license and then follow https://github.com/kedz/summarization-datasets for preprocessing.
Generate Candidate Summaries
To generate the candidate summaries from a pre-trained model, please run
python gen_candidate.py --gpuid [gpuid] --src_dir [path of the input file (e.g. test.source)] --tgt_dir [path of the output file] --dataset [cnndm/xsum]
Preprocess Your Own Data
For data preprocessing, please run
python preprocess.py --src_dir [path of the raw data] --tgt_dir [output path] --split [train/val/test] --cand_num [number of candidate summaries] --dataset [cnndm/xsum/nyt] -l [lowercase if the flag is set]
src_dir should contain the following files (using test split as an example):
- test.source
- test.source.tokenized
- test.target
- test.target.tokenized
- test.out
- test.out.tokenized
Each line of these files should contain one sample, except for test.out and test.out.tokenized, where the candidate summaries for one data sample should be placed on neighboring lines.
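The expected layout can be checked before running preprocess.py. The sketch below is a hypothetical helper (`load_split` is not part of this repo) that verifies the six files exist, checks that the candidate file holds exactly cand_num lines per sample, and groups the candidates of each sample. It assumes every sample has the same number of candidates and that the .tokenized files mirror their raw counterparts, so those are only checked for existence.

```python
from pathlib import Path
from typing import List

# File suffixes expected for one split (e.g. test.source, test.out.tokenized).
REQUIRED_SUFFIXES = [".source", ".source.tokenized", ".target",
                     ".target.tokenized", ".out", ".out.tokenized"]


def read_lines(path: Path) -> List[str]:
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split(src_dir: str, split: str, cand_num: int):
    """Validate the raw files of one split and group candidates per sample."""
    root = Path(src_dir)
    missing = [split + s for s in REQUIRED_SUFFIXES if not (root / (split + s)).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files in {src_dir}: {missing}")

    sources = read_lines(root / f"{split}.source")
    targets = read_lines(root / f"{split}.target")
    outs = read_lines(root / f"{split}.out")
    if len(sources) != len(targets):
        raise ValueError("source and target files must have the same number of lines")
    if len(outs) != len(sources) * cand_num:
        raise ValueError(f"expected {len(sources) * cand_num} candidate lines, got {len(outs)}")

    # Candidates for sample i occupy lines [i * cand_num, (i + 1) * cand_num).
    candidates = [outs[i * cand_num:(i + 1) * cand_num] for i in range(len(sources))]
    return list(zip(sources, targets, candidates))
```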
Note: after preprocessing, you should also put the raw files test.source and test.target into the created data folder (e.g. ./cnndm/diverse/test.source).
We use the PTB tokenizer provided by Stanford CoreNLP (download here). Please note that the tokenized texts are only used for evaluation. To tokenize a file, you may run (using test.source as an example):
export CLASSPATH=/your_path/stanford-corenlp-3.8.0.jar
cat test.source | java edu.stanford.nlp.process.PTBTokenizer -ioFileList -preserveLines > test.source.tokenized
We have provided example files in ./examples/raw_data.
The preprocessing procedure will store the processed data as separate JSON files in tgt_dir.
Example: preprocessing test set on CNNDM
# starting from the root directory
# create folders
mkdir ./cnndm
mkdir ./cnndm/diverse
mkdir ./cnndm/diverse/test
# suppose that the raw files are at ./raw_data, the results will be saved at ./cnndm/diverse/test
# please remember to put the source and target files of the test set into the folder, e.g. ./cnndm/diverse/test.source
python preprocess.py --src_dir ./raw_data --tgt_dir ./cnndm/diverse --split test --cand_num 16 --dataset cnndm -l
How to Run
Hyper-parameter Setting
You may specify the hyper-parameters in main.py.
We also provide the specific settings for CNNDM (NYT shares the same settings) and XSum in config.py.
Train
python main.py --cuda --gpuid [list of gpuid] --config [name of the config (cnndm/xsum)] -l
The checkpoints and log will be saved in a subfolder of ./cache.
Example: training on CNNDM
python main.py --cuda --gpuid 0 1 2 3 --config cnndm -l
Finetuning from an existing checkpoint
python main.py --cuda --gpuid [list of gpuid] -l --config [name of the config (cnndm/xsum)] --model_pt [model path]
[model path] should be relative to the ./cache directory, e.g. cnndm/model.pt (it should not include the ./cache/ prefix).
Evaluate
For ROUGE calculation in our paper, we use the standard ROUGE Perl package (from here). We lowercased and tokenized the texts (using the PTB tokenizer) before calculating the ROUGE scores. Please note that the scores calculated by this package will be slightly different from the ROUGE scores calculated/reported during training and intermediate evaluation, because we use a pure Python-based ROUGE implementation to calculate those scores for better efficiency.
If you encounter problems when setting up the ROUGE Perl package (unfortunately, this happens a lot :( ), you may consider using a pure Python-based ROUGE package, such as the one we used from the compare-mt package.
We provide the evaluation script in cal_rouge.py. If you are going to use the Perl ROUGE package, please change line 13 of cal_rouge.py to the path of your Perl ROUGE package:
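To illustrate what a pure-Python ROUGE implementation computes, and why its numbers can drift from the Perl package, here is a deliberately simplified ROUGE-N F1. It is not the compare-mt implementation: it assumes whitespace tokenization and does no stemming, so it should not be used to report results.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, hypothesis: str, n: int = 1, lowercase: bool = True) -> float:
    """Simplified ROUGE-N F1 on whitespace-tokenized text (no stemming)."""
    if lowercase:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    ref, hyp = ngrams(reference.split(), n), ngrams(hypothesis.split(), n)
    overlap = sum((ref & hyp).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, comparing "the cat sat on the mat" with "the cat is on the mat" at n=2 yields an F1 of 0.6, since 3 of the 5 bigrams on each side match. Tokenization and stemming choices change these counts, which is one reason Python and Perl scores differ slightly.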
_ROUGE_PATH = '/YOUR-ABSOLUTE-PATH/ROUGE-RELEASE-1.5.5/'
To evaluate the model performance, please first use the following command to generate the summaries.
python main.py --cuda --gpuid [single gpu] --config [name of the config (cnndm/xsum)] -e --model_pt [model path] -g [evaluate the model as a generator] -r [evaluate the model as a scorer/reranker]
[model path] should be relative to the ./cache directory, e.g. cnndm/model.pt (it should not include the ./cache/ prefix).
The output will be saved in a subfolder of ./result with the same name as the checkpoint folder.
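The path conventions above can be summarized in a small sketch. It reflects our reading of the examples in this section (e.g. --model_pt cnndm/model_generation.bin writes to ./result/cnndm/); `resolve_paths` is a hypothetical helper, not the code in main.py.

```python
import os
from typing import Tuple


def resolve_paths(model_pt: str, cache_dir: str = "./cache",
                  result_dir: str = "./result") -> Tuple[str, str]:
    """Map a --model_pt value to (checkpoint file, result folder)."""
    rel = os.path.normpath(model_pt)
    cache_name = os.path.basename(os.path.normpath(cache_dir))
    # --model_pt is relative to the cache directory, so reject absolute paths
    # and paths that already start with the cache prefix (e.g. ./cache/...).
    if os.path.isabs(rel) or rel.split(os.sep)[0] == cache_name:
        raise ValueError("--model_pt must be relative to ./cache, without the ./cache/ prefix")
    checkpoint = os.path.join(cache_dir, rel)
    # The result subfolder shares its name with the checkpoint folder.
    return checkpoint, os.path.join(result_dir, os.path.dirname(rel))
```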
Example: evaluating the model as a generator on CNNDM
# write the system-generated summaries to a file: ./result/cnndm/test.out
python main.py --cuda --gpuid 0 --config cnndm -e --model_pt cnndm/model_generation.bin -g
# tokenize the output file -> ./result/cnndm/test.out.tokenized (you may use other tokenizers)
export CLASSPATH=/your_path/stanford-corenlp-3.8.0.jar
cat ./result/cnndm/test.out | java edu.stanford.nlp.process.PTBTokenizer -ioFileList -preserveLines > ./result/cnndm/test.out.tokenized
# calculate the ROUGE scores using ROUGE Perl Package
python cal_rouge.py --ref ./cnndm/test.target.tokenized --hyp ./result/cnndm/test.out.tokenized -l
# calculate the ROUGE scores using ROUGE Python Implementation
python cal_rouge.py --ref ./cnndm/test.target.tokenized --hyp ./result/cnndm/test.out.tokenized -l -p
Example: evaluating the model as a scorer on CNNDM
# rerank the candidate summaries
python main.py --cuda --gpuid 0 --config cnndm -e --model_pt cnndm/model_ranking.bin -r
# calculate the ROUGE scores using ROUGE Perl Package
# ./result/cnndm/reference and ./result/cnndm/candidate are two folders of files; each of those files contains one summary
python cal_rouge.py --ref ./result/cnndm/reference --hyp ./result/cnndm/candidate -l
# calculate the ROUGE scores using ROUGE Python Implementation
# ./result/cnndm/reference and ./result/cnndm/candidate are two folders of files; each of those files contains one summary
python cal_rouge.py --ref ./result/cnndm/reference --hyp ./result/cnndm/candidate -l -p
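When evaluated as a scorer (-r), the model ranks the candidate summaries by the score it assigns them. A minimal sketch of that selection step follows, assuming the candidate score is a length-normalized log-probability; the exponent `alpha` and the helper name `rerank` are illustrative assumptions, not main.py's interface.

```python
from typing import List, Sequence


def rerank(candidates: List[str], token_logprobs: List[Sequence[float]],
           alpha: float = 1.0) -> str:
    """Return the candidate with the highest length-normalized log-probability.

    `token_logprobs[k]` holds the per-token log-probabilities the model assigns
    to `candidates[k]`.
    """
    if not candidates or len(candidates) != len(token_logprobs):
        raise ValueError("need one list of token log-probabilities per candidate")
    if any(len(lp) == 0 for lp in token_logprobs):
        raise ValueError("every candidate needs at least one token")
    # Score = sum of token log-probs divided by length ** alpha (higher is better).
    scores = [sum(lp) / (len(lp) ** alpha) for lp in token_logprobs]
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best]
```

With alpha=0 the score is the raw sum of log-probabilities, which tends to favor shorter candidates; alpha=1 averages over tokens.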
Results, Outputs, Checkpoints
The following are ROUGE scores calculated by the standard ROUGE Perl package.
CNNDM
| | ROUGE-1 | ROUGE-2 | ROUGE-L |
|----------|---------|---------|---------|
| BART | 44.29 | 21.17 | 41.09 |
| BRIO-Ctr | 47.28 | 22.93 | 44.15 |
| BRIO-
