Few-Shot Document-Level Relation Extraction

Code and data for the NAACL 2022 paper "Few-Shot Document-Level Relation Extraction".

Accessing the data (FREDo benchmark tasks)
Dependencies
Reproducing Results (+ link to trained models)
Relevant Code per Section in Paper
Bibtex for Citing

Accessing the data

Download Sampled Episodes

If you are only interested in the raw data for the tasks, you can download the sampled episodes directly: https://drive.google.com/drive/folders/1PuJSJxqZP4ijxFSBZZ6Fmc0SgR2S8pYU?usp=sharing

The folder contains three archives:

test_episodes.zip (~120 MB download, ~550 MB on disk)
train_episodes_single.zip (~330 MB download, ~1.4 GB on disk)
train_episodes_schema.zip (~330 MB download, ~1.4 GB on disk)

Training and dev/validation episodes are available in two versions: "single" and "schema". In "single", each episode contains annotations for only a single relation type. In "schema", each episode contains annotations for multiple relation types, sampled via the same procedure as the test episodes (see Section 4.3 of the paper).

In our paper and code we evaluate macro F1 scores across relation types.
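
As a rough illustration of the metric, the sketch below computes macro F1 as the unweighted mean of per-relation-type F1 scores, which matches the "macro" rows in the expected outputs further down. The function names and the (tp, fp, fn) input format are assumptions made for this example and are not taken from the repository; in particular, how the repository handles types with no predictions may differ.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for a single relation type; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-type F1 scores (every type counts equally)."""
    if not counts:
        return 0.0
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Made-up counts (tp, fp, fn) for two placeholder relation types.
example = {"REL_A": (2, 2, 0), "REL_B": (1, 0, 1)}
print(f"macro F1: {macro_f1(example):.4f}")  # prints "macro F1: 0.6667"
```

Because every relation type contributes equally, rare types weigh as much as frequent ones such as P17.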

Training and Development Data

data/[train, dev, test_docred, test_scierc].json contain all annotated documents used for training and testing.
data/*_indices.json contain sampled episodes (only the indices of the documents and which relations are to be annotated/extracted).
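
Before wiring these files into your own pipeline, it can help to inspect their top-level structure. The sketch below makes no assumption about the schema of the files: it only reports whether each JSON file in data/ is a list or a dict and how large it is. The helper name `summarize` is hypothetical, and the script assumes it is run from the repository root.

```python
import json
from pathlib import Path
from typing import Any


def summarize(data: Any) -> str:
    """Describe the top-level shape of a parsed JSON value."""
    if isinstance(data, list):
        return f"list with {len(data)} items"
    if isinstance(data, dict):
        preview = ", ".join(list(data)[:5])
        return f"dict with {len(data)} keys (first keys: {preview})"
    return type(data).__name__


if __name__ == "__main__":
    data_dir = Path("data")  # assumption: current directory is the repo root
    for json_file in sorted(data_dir.glob("*.json")):
        with json_file.open(encoding="utf-8") as fh:
            print(f"{json_file.name}: {summarize(json.load(fh))}")
```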

You can either use our pipeline or export all episodes as json files to use in your own pipeline.

Test episode sampling is invoked in train.py (lines 72 and 73).

Export Episodes to JSON

If you are evaluating your own model, please use the test episodes as sampled by us. You can run export_test_episodes.py to export all episodes for use with your own pipeline.

Dependencies

The code depends on the following packages (versions in parentheses):

torch (1.8.0)
transformers (4.18.0)
tqdm (4.64.0)
wandb (0.12.16)

python3 -m venv venv
source venv/bin/activate
pip install wheel
pip install -U setuptools
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.18.0 tqdm==4.64.0 wandb==0.12.16
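
To confirm that the environment matches the versions listed above, a small check along the following lines may be useful. It is a sketch rather than part of the repository: the helper names are made up for this example, and it relies on `importlib.metadata`, which requires Python 3.8 or newer. Local build tags such as `+cu111` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions as listed in this README.
EXPECTED: Dict[str, str] = {
    "torch": "1.8.0",
    "transformers": "4.18.0",
    "tqdm": "4.64.0",
    "wandb": "0.12.16",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check(expected: Dict[str, str]) -> Dict[str, str]:
    """Compare installed versions against the expected ones."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # drop local tags such as +cu111
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found}, listed {wanted})"
    return report


if __name__ == "__main__":
    for name, status in check(EXPECTED).items():
        print(f"{name}: {status}")
```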

Reproducing Results

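To reproduce the results from the paper, download the trained models here: https://drive.google.com/drive/folders/1eulLgrGiOwSZawoOGytA6fJZ-b3Rcrob?usp=sharing

The commands below load checkpoints from the checkpoints/ directory (for example, checkpoints/best_dlmnav.pt).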

The following commands reproduce the results in the paper. Click on "Expected output" for detailed results.
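
If you want to compare your own run against an expected output programmatically, the evaluation tables can be parsed with a few lines of Python. The sketch below is not part of the repository; `parse_eval_rows` is a hypothetical helper that assumes the whitespace-separated row layout shown in the expected outputs (type, precision, recall, f1, and an optional support column). Pass it one table at a time, since the "macro" row of a second table would overwrite the first.

```python
from typing import Dict, Optional, Union

Row = Dict[str, Optional[Union[float, int]]]


def parse_eval_rows(text: str) -> Dict[str, Row]:
    """Parse the rows of one evaluation table into a dict keyed by type."""
    rows: Dict[str, Row] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) not in (4, 5):
            continue  # run IDs, blank lines, and other log output
        name, *values = parts
        try:
            precision, recall, f1 = (float(v) for v in values[:3])
            support = int(values[3]) if len(values) == 4 else None
        except ValueError:
            continue  # header row, separator row, or section banner
        rows[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return rows


# Rows copied from the BASELINE 1-DOC in-domain table.
sample = """\
type         precision      recall          f1    support
P361              0.20        1.79        0.37    893
-                    -           -           -    -
macro             0.36        9.71        0.60
"""
parsed = parse_eval_rows(sample)
print(parsed["P361"]["f1"], parsed["macro"]["f1"])
```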

BASELINE 1-DOC

python train.py --support_docs_eval 1 --model dlmnav+sie+sbn --num_epochs 0 --use_markers False

<details> <summary>Expected output</summary>

3nX3120z6X
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P361              0.20        1.79        0.37    893       
P279              0.02        5.15        0.03    136       
P102              0.42       14.49        0.82    428       
P17               3.76       11.08        5.61    59312     
P35               0.29       12.61        0.56    222       
P495              0.41       10.08        0.79    1081      
P463              0.09        1.62        0.18    493       
P3373             0.18       16.43        0.36    645       
P118              0.13        8.30        0.25    253       
P39               0.03        2.63        0.07    38        
P140              0.12       13.97        0.24    458       
P272              0.07       10.00        0.13    160       
P674              0.28        7.76        0.54    348       
P25               0.16       19.28        0.31    83        
P364              0.03       16.92        0.06    65        
P1001             0.14       10.13        0.27    375       
P194              0.07        8.56        0.14    222       
P582              0.07        3.96        0.14    101       
-                    -           -           -    -         
macro             0.36        9.71        0.60
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          3.60        3.82        3.71    42142     
PART-OF           0.46        1.90        0.74    2529      
CONJUNCTION        1.92        5.24        2.81    6414      
EVALUATE-FOR        1.22        2.79        1.70    4223      
HYPONYM-OF        1.30        3.64        1.92    4482      
COMPARE           0.54        2.49        0.89    2328      
FEATURE-OF        0.33        1.30        0.53    2465      
-                    -           -           -    -         
macro             1.34        3.03        1.76

</details>

BASELINE 3-DOC

python train.py --support_docs_eval 3 --model dlmnav+sie+sbn --num_epochs 0 --use_markers False

<details> <summary>Expected output</summary>

G7BwlKpec6
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P17               5.19        6.11        5.61    33229     
P361              0.50        1.80        0.78    887       
P272              0.14       20.00        0.27    55        
P495              0.69        6.35        1.24    850       
P674              0.45        8.59        0.86    128       
P35               0.32        2.70        0.56    111       
P102              0.76       15.77        1.45    279       
P140              0.23        2.34        0.42    171       
P364              0.32       19.23        0.63    78        
P463              0.34        3.11        0.61    193       
P3373             0.37        7.88        0.71    241       
P194              0.21        8.62        0.41    116       
P1001             0.20        7.78        0.39    167       
P279              0.04        8.47        0.08    59        
P25               0.03        5.26        0.05    19        
P118              0.38       13.83        0.75    94        
P582              0.14        5.56        0.27    36        
P39               0.42       50.00        0.84    2         
-                    -           -           -    -         
macro             0.60       10.75        0.89
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
PART-OF           0.61        1.65        0.89    1335      
USED-FOR          3.98        3.02        3.44    14548     
HYPONYM-OF        2.16        1.56        1.81    2311      
CONJUNCTION        3.07        4.53        3.66    3332      
COMPARE           1.09        3.36        1.64    1162      
EVALUATE-FOR        1.29        1.89        1.53    2382      
FEATURE-OF        0.68        1.21        0.88    1235      
-                    -           -           -    -         
macro             1.84        2.46        1.98

</details>

DLMNAV 1-DOC

python train.py --support_docs_eval 1 --model dlmnav --num_epochs 0 --load_checkpoint checkpoints/best_dlmnav.pt

<details> <summary>Expected output</summary>

38BE1Nj8Pu
loading model from checkpoints/best_dlmnav.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P279              0.37        2.94        0.66    136       
P17              30.40        2.38        4.41    59312     
P463              3.82        8.11        5.19    493       
P495             11.43       16.47       13.50    1081      
P674              4.45       23.56        7.49    348       
P361              3.33        5.04        4.01    893       
P364              1.76       36.92        3.36    65        
P39               0.27       15.79        0.52    38        
P35               5.06       27.48        8.54    222       
P3373             7.12       21.24       10.67    645       
P118             15.54       62.06       24.86    253       
P1001             1.93        6.40        2.96    375       
P194              3.90       30.63        6.92    222       
P25               2.03       45.78        3.89    83        
P102             10.61       14.72       12.33    428       
P272              4.47       47.50        8.17    160       
P582              3.13       27.72        5.62    101       
P140              9.24       10.04        9.62    458       
-                    -           -           -    -         
macro             6.60       22.49        7.37
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          2.54        0.04        0.08    42142     
PART-OF           0.59        0.36        0.44    2529      
FEATURE-OF        0.32        0.16        0.22    2465      
CONJUNCTION        1.06        0.25        0.40    6414      
EVALUATE-FOR        0.91        0.19        0.31    4223      
HYPONYM-OF        6.25        2.57        3.64    4482      
COMPARE           1.31        0.69        0.90    2328      
-                    -           -           -    -         
macro             1.85        0.61        0.86

</details>

DLMNAV 3-DOC

python train.py --support_docs_eval 3 --model dlmnav --num_epochs 0 --load_checkpoint checkpoints/best_dlmnav.pt

<details> <summary>Expected output</summary>

eASm3VD2Bv
loading model from checkpoints/best_dlmnav.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P17              32.74        1.66        3.17    33229     
P361              4.51        4.51        4.51    887       
P674              5.35       31.25        9.14    128       
P495             11.91       14.94       13.26    850       
P140              7.08        8.77        7.83    171       
P39               0.00        0.00        0.00    2         
P463              3.45        6.22        4.44    193       
P194              6.65       31.03       10.96    116       
P35               4.64       27.03        7.93    111       
P364             10.21       62.82       17.56    78        
P272              4.
