SpanNER
SpanNER: Named Entity Re-/Recognition as Span Prediction
Overview | Demo | Installation | Preprocessing | Prepare Models | Running | System Combination | Bib
This repository contains the code for our paper SpanNER: Named Entity Re-/Recognition as Span Prediction (ACL 2021).
The model designed in this work has been deployed into ExplainaBoard.
Overview
We investigate the complementary advantages of systems based on two different paradigms: span prediction models and sequence labeling frameworks. We then show that a span prediction model can also serve as a system combiner that re-recognizes named entities from different systems’ outputs. We implement 154 systems on 11 datasets covering three languages. Comprehensive results show the effectiveness of span prediction models both as base NER systems and as system combiners.
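To make the span prediction paradigm concrete, here is a minimal sketch of how candidate spans can be enumerated and paired with labels. It is an illustration only, not code from this repository: the helper names `enumerate_spans` and `label_spans`, the `max_len` parameter, and the inclusive `(start, end)` span representation are all assumptions.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices (assumed representation)


def enumerate_spans(num_tokens: int, max_len: int) -> List[Span]:
    """List every contiguous span that covers at most max_len tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_len, num_tokens))
    ]


def label_spans(spans: List[Span], gold: Dict[Span, str]) -> List[Tuple[Span, str]]:
    """Pair each candidate span with its gold label, or "O" if it is not an entity."""
    return [(span, gold.get(span, "O")) for span in spans]


tokens = ["London", "is", "nice"]
gold = {(0, 0): "LOC"}  # toy annotation for illustration
candidates = enumerate_spans(len(tokens), max_len=2)
for span, label in label_spans(candidates, gold):
    print(tokens[span[0]:span[1] + 1], label)
```

A span classifier would then predict one label per candidate span, whereas a sequence labeling model predicts one tag per token.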
<!-- Two roles of span prediction models (boxes in blue): * as a base NER system * as a system combiner. -->
<div align="center"> <img src="pic/spanner.png" width = "550" alt="d" align=center /> </div>
Demo
We deploy SpanNER into the ExplainaBoard.
<div align="center"> <img src="pic/demo.gif" align=center /> </div>
Quick Installation
- python3
- PyTorch
- pytorch-lightning
Run the following command to install the dependencies:
pip3 install -r requirements.txt
Data Preprocessing
The dataset needs to be preprocessed before running the model.
We provide dataprocess/bio2spannerformat.py for reference, which uses CoNLL-2003 as an example.
First, download the datasets and convert them into the BIO2 tagging format. We provide the CoNLL-2003 dataset in BIO format in the data/conll03_bio folder, and its preprocessed version in the data/conll03 folder.
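As a rough illustration of the core step in this conversion, the sketch below collects entity spans from a BIO2 tag sequence. It is not the code in dataprocess/bio2spannerformat.py, and it does not reproduce the file layout that script writes; the function name `bio_to_spans` and the inclusive `(start, end, label)` output are assumptions made for the example.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, label) entity spans from BIO2 tags; end is inclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" closes a trailing entity
        # The current entity continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label is not None and tag[2:] == label
        if not continues:
            if label is not None:
                spans.append((start, i - 1, label))
            start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray "I-" tag is treated leniently as the start of a new entity.
                start, label = i, tag[2:]
    return spans


tokens = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
tags = ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"]
print(bio_to_spans(tags))
# [(0, 0, 'ORG'), (2, 2, 'MISC'), (6, 6, 'MISC')]
```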
The download links of the datasets used in this work are shown as follows:
Prepare Models
For English datasets, we use BERT-Large.
For Dutch and Spanish datasets, we use BERT-Multilingual-Base.
How to Run?
We use CoNLL-2003 as an example. You may need to change DATA_DIR, PRETRAINED, dataname, and n_class to your own dataset path, pre-trained model path, dataset name, and number of labels in the dataset, respectively.
./run_conll03_spanner.sh
System Combination
Base Model
We provide 12 base models (result files) for the CoNLL-2003 dataset in combination/results.
More base models (result-files) can be downloaded from ExplainaBoard-download.
Combination
Put your different base models (result-files) in the data/results folder, then run:
python comb_voting.py
We provide four system combination methods:
- SpanNER,
- Majority voting (VM),
- Weighted voting based on overall F1-score (VOF1),
- Weighted voting based on class F1-score (VCF1).
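The voting-based combiners in this list can be sketched as follows. This is a simplified illustration and not the logic of comb_voting.py: the function `combine_by_voting`, the dictionary-based prediction format, and the system names and weights are all hypothetical. Each system votes for its label on every candidate span, or for "O" if it did not predict that span, and the label with the highest total weight wins.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices (assumed representation)


def combine_by_voting(
    predictions: Dict[str, Dict[Span, str]],
    weight: Callable[[str, str], float] = lambda system, label: 1.0,
) -> Dict[Span, str]:
    """Combine span predictions; weight(system, label) is the strength of each vote."""
    candidates = {span for spans in predictions.values() for span in spans}
    combined = {}
    for span in sorted(candidates):
        scores = defaultdict(float)
        for system, spans in predictions.items():
            label = spans.get(span, "O")  # no prediction counts as a vote for "O"
            scores[label] += weight(system, label)
        best = max(scores, key=scores.get)
        if best != "O":
            combined[span] = best
    return combined


# Toy predictions from three hypothetical systems on one sentence.
predictions = {
    "sys_a": {(0, 0): "ORG", (2, 2): "MISC"},
    "sys_b": {(0, 0): "ORG", (2, 2): "PER"},
    "sys_c": {(0, 0): "LOC", (2, 2): "MISC", (6, 6): "MISC"},
}

# Majority voting: every vote has weight 1.
print(combine_by_voting(predictions))

# Weighting by overall F1; these scores are made up for illustration.
overall_f1 = {"sys_a": 0.5, "sys_b": 0.4, "sys_c": 0.95}
print(combine_by_voting(predictions, lambda system, label: overall_f1[system]))
```

Class-level weighting fits the same interface by passing a `weight` function that looks up a score per (system, label) pair. Note that this sketch does not resolve conflicts between overlapping spans.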
Results at a Glance
<div align="center"> <img src="pic/comb_res.png" width = "600" alt="d" align=center /> </div>
Bib
@article{fu2021spanner,
title={SpanNer: Named Entity Re-/Recognition as Span Prediction},
author={Fu, Jinlan and Huang, Xuanjing and Liu, Pengfei},
journal={arXiv preprint arXiv:2106.00641},
year={2021}
}
