# Autoregressive Structured Prediction with Language Models

This repository contains the PyTorch implementation and pre-trained models for ASP, described in [Autoregressive Structured Prediction with Language Models](https://arxiv.org/abs/2210.14698) (EMNLP 2022).

## Setup
1. Clone this repo:

```bash
git clone https://github.com/lyutyuh/ASP.git
cd ASP
export ASP=$PWD  # set the ASP environment variable
```
2. Prepare the environment

2.1 Create a virtual environment with either:

<details> <summary> <code>pip</code> </summary>

```bash
python -m venv <path_to_venv>/asp   # create a new environment (asp)
source <path_to_venv>/asp/bin/activate
pip install -r requirements.txt
```
</details>

or

<details>
<summary> <code>conda</code> </summary>

```bash
conda env create -f environment.yml  # create a new environment (asp)
```
</details>
## Download and preprocess the datasets
<details> <summary> <code>named entity recognition</code> </summary>

CoNLL-03

```bash
wget https://polybox.ethz.ch/index.php/s/bFf8vJBonIT7sr8/download -O ./data/conll03_ner.zip
unzip ./data/conll03_ner.zip -d ./data
rm ./data/conll03_ner.zip
python ./data/conll03_ner/conll03_to_json.py
python ./data/t5minimize_ner.py ./data/conll03_ner ./data/conll03_ner
```

OntoNotes V5

Coming soon!
</details>

<details> <summary> <code>end-to-end relation extraction</code> </summary>

CoNLL-04

```bash
wget https://polybox.ethz.ch/index.php/s/Lk44AwhOeDSeZTh/download -O ./data/conll04_ere.zip
unzip ./data/conll04_ere.zip -d ./data
rm ./data/conll04_ere.zip
python ./data/t5minimize_ere.py ./data/conll04_ere/ ./data/conll04_ere
```

ACE-05

ACE-05 is not a publicly available dataset. Please follow https://github.com/luanyi/DyGIE/tree/master/preprocessing to obtain the dataset JSON files {train,dev,test}.json and copy them to ./data/ace05_ere/. Then:

```bash
python ./data/ace05_ere/ace05_to_json.py
python ./data/t5minimize_ere.py ./data/ace05_ere ./data/ace05_ere
```
</details>
<details>
<summary> <code>coreference resolution</code> </summary>

CoNLL-12 (OntoNotes)

OntoNotes is not a publicly available dataset. Please follow http://conll.cemantix.org/2012/data.html and https://catalog.ldc.upenn.edu/LDC2013T19 to obtain the files {train,dev,test}.english.v4_gold_conll and copy them to ./data/ontonotes_coref/. Then:

```bash
python ./data/t5minimize_coref.py ./data/ontonotes_coref/ ./data/ontonotes_coref/
```
</details>
## Tasks

For each task in {ner, ere, coref}, run:

```bash
python run_{task}.py <config_name> 0  # the last argument is the GPU id
```

The available `<config_name>` values are listed in the corresponding {ner,ere,coref}.conf file under configs/.
## Running on New Datasets

1. Prepare the data

- For named entity recognition and relation extraction, convert the new dataset to `<newdataset>_{train,dev,test}.json` in the following format (the `"relations"` field is not necessary for NER):

```json
[{
    "tokens": ["John", "Wilkes", "Booth", ",", "who", "assassinated", "President", "Lincoln", ",", "was", "an", "actor", "."],
    "entities": [{"type": "Peop", "start": 0, "end": 3}, {"type": "Peop", "start": 6, "end": 8}],
    "relations": [{"type": "Kill", "head": 0, "tail": 1}]
}, ...]
```
and `<newdataset>_types.json` (again, the `"relations"` field is not necessary for NER):

```json
{
    "entities": {
        "Loc": {"short": "Loc", "verbose": "Location"},
        "Org": {"short": "Org", "verbose": "Organization"},
        "Peop": {"short": "Peop", "verbose": "People"},
        "Other": {"short": "Other", "verbose": "Other"}
    },
    "relations": {
        "Work_For": {"short": "Work", "verbose": "Work for", "symmetric": false},
        "Kill": {"short": "Kill", "verbose": "Kill", "symmetric": false},
        "OrgBased_In": {"short": "OrgBI", "verbose": "Organization based in", "symmetric": false},
        "Live_In": {"short": "Live", "verbose": "Live in", "symmetric": false},
        "Located_In": {"short": "LocIn", "verbose": "Located in", "symmetric": false}
    }
}
```

and run

```bash
python ./data/t5minimize_ere.py ./data/<newdataset>/ ./data/<newdataset>/
```
- For coreference resolution, convert the new dataset to the CoNLL-12 format. Then run

```bash
python ./data/t5minimize_coref.py ./data/<newdataset>/ ./data/<newdataset>/
```
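For NER/ERE, the conversion can be sanity-checked with a short script like the one below. This is only a sketch: the dataset name `mydata` is a placeholder, entity spans are assumed to use token indices with an exclusive `end`, and relation `head`/`tail` are assumed to index into the sentence's `entities` list, as in the example above.

```python
import json
from pathlib import Path

# Toy example in the expected format (see the "John Wilkes Booth" sample above).
examples = [{
    "tokens": ["John", "Wilkes", "Booth", ",", "who", "assassinated",
               "President", "Lincoln", ",", "was", "an", "actor", "."],
    "entities": [{"type": "Peop", "start": 0, "end": 3},
                 {"type": "Peop", "start": 6, "end": 8}],
    "relations": [{"type": "Kill", "head": 0, "tail": 1}],
}]

types = {
    "entities": {"Peop": {"short": "Peop", "verbose": "People"}},
    "relations": {"Kill": {"short": "Kill", "verbose": "Kill", "symmetric": False}},
}

def check(example):
    """Sanity-check one example: spans in range, relations point at entities."""
    n = len(example["tokens"])
    for ent in example["entities"]:
        # "end" is exclusive: start 0, end 3 covers "John Wilkes Booth".
        assert 0 <= ent["start"] < ent["end"] <= n
    for rel in example.get("relations", []):
        assert 0 <= rel["head"] < len(example["entities"])
        assert 0 <= rel["tail"] < len(example["entities"])

out = Path("./data/mydata")
out.mkdir(parents=True, exist_ok=True)
for split in ("train", "dev", "test"):
    for ex in examples:
        check(ex)
    (out / f"mydata_{split}.json").write_text(json.dumps(examples, indent=2))
(out / "mydata_types.json").write_text(json.dumps(types, indent=2))
```

After this, `./data/mydata/` has the same layout as the CoNLL-04 directory and can be fed to the `t5minimize_*` scripts above.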
2. Prepare the configuration

Add a new entry in the corresponding .conf file under configs/ with the directory of the new dataset: `data_dir = ${ASP}/data/<newdataset>/`
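As an illustrative sketch only: assuming the new entry inherits from an existing experiment in the pyhocon style used by the .conf files (copy a real entry from the relevant file for the exact keys), it might look like:

```hocon
# Hypothetical names: "mydata_flant5_base" and the base entry "flant5_base"
# are placeholders; only data_dir is taken from the instructions above.
mydata_flant5_base = ${flant5_base}{
  data_dir = ${ASP}/data/mydata/
}
```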
## Pre-trained models

Use the following command to load a pre-trained model and evaluate it on the corresponding task, where `<config_name>` refers to the experiment name given in the .conf file under configs/:

```bash
python evaluate_<task>.py <config_name> <checkpoint_name> <gpu_id>
```
1. Coreference resolution

| config_name | checkpoint_name | dataset | link | params |
| ----------- | --------------- | ------- | ---- | ------ |
| flant5_base | tliu/asp-coref-flan-t5-base | CoNLL-2012 (OntoNotes) | link | 220 M |
| flant5_large | tliu/asp-coref-flan-t5-large | CoNLL-2012 (OntoNotes) | link | 770 M |
| flant5_xl | tliu/asp-coref-flan-t5-xl | CoNLL-2012 (OntoNotes) | link | 3 B |
| t0_3b | tliu/asp-coref-t0-3b | CoNLL-2012 (OntoNotes) | link | 3 B |

2. Named entity recognition (NER)

| config_name | checkpoint_name | dataset | link | params |
| ----------- | --------------- | ------- | ---- | ------ |
| flant5_base | tliu/asp-ner-flan-t5-base | CoNLL-03 NER | link | 220 M |
| flant5_large | tliu/asp-ner-flan-t5-large | CoNLL-03 NER | link | 770 M |

3. End-to-end relation extraction (ERE)

| config_name | checkpoint_name | dataset | link | params |
| ----------- | --------------- | ------- | ---- | ------ |
| flant5_base_conll04 | tliu/asp-re-flan-t5-base | CoNLL-04 RE | link | 220 M |
| flant5_large_conll04 | tliu/asp-re-flan-t5-large | CoNLL-04 RE | link | 770 M |
| flant5_xl_conll04 | tliu/asp-re-flan-t5-xl | CoNLL-04 RE | link | 3 B |
## Citation

```bibtex
@inproceedings{liu-etal-2022-autoregressive,
    title = {Autoregressive Structured Prediction with Language Models},
    author = {Tianyu Liu and Yuchen Jiang and Nicholas Monath and Ryan Cotterell and Mrinmaya Sachan},
    year = {2022},
    url = {https://arxiv.org/abs/2210.14698},
    eprint = {2210.14698},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```