LEBERT
Code for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"
Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"
arXiv link to the paper: https://arxiv.org/abs/2105.07148
If you have any questions, please contact: willie1206@163.com
Requirements
- Python 3.7.0
- transformers 3.4.0
- Numpy 1.18.5
- Packaging 17.1
- scikit-learn 0.23.2
- torch 1.6.0+cu92
- tqdm 4.50.2
- multiprocess 0.70.10
- tensorflow 2.3.1
- tensorboardX 2.1
- seqeval 1.2.1
Input Format
CoNLL format (the BIOES tag scheme is preferred), with one character and its label per line. Sentences are separated by a blank line.
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER

我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
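As a minimal sketch of how a file in this format could be read, the snippet below groups "character label" lines into sentences at each blank line. The function name `read_conll` and the whitespace-based splitting are assumptions for illustration; this is not the repository's own data loader.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, label) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group "char label" lines into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        if len(parts) < 2:
            raise ValueError(f"expected 'char label', got: {raw!r}")
        # First column is the character, last column is its tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


sample = [
    "美 B-LOC", "国 E-LOC", "的 O", "华 B-PER", "莱 I-PER", "士 E-PER",
    "",
    "我 O", "跟 O", "他 O", "谈 O", "笑 O", "风 O", "生 O",
]
sentences = read_conll(sample)
```

For a file on disk, the same function could be called with an open file handle, for example `read_conll(open(path, encoding="utf-8"))`.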
Chinese BERT, Chinese Word Embedding, and Checkpoints
Chinese BERT
Chinese BERT: https://huggingface.co/bert-base-chinese/tree/main
Chinese word embedding:
~~Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz~~
The original download link no longer works. The updated link is:
Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz
For more information, see Tencent AI Lab Word Embedding.
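The sketch below shows one way to stream words and vectors from the unpacked embedding file. It assumes a word2vec-style text layout (a header line with the entry count and the dimension, then one word followed by its vector per line); the function name `iter_embeddings` and the toy three-dimensional data are illustrative only.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_embeddings(handle: TextIO) -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, vector) pairs from a word2vec-style text stream."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<count> <dim>"
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split from the right so a word containing spaces stays intact.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        yield parts[0], [float(x) for x in parts[1:]]


# Toy stand-in for the real file (assumed format, 3 dimensions instead of 200).
toy = "2 3\n中国 0.1 0.2 0.3\n谈笑风生 0.4 0.5 0.6\n"
pairs = list(iter_embeddings(io.StringIO(toy)))
vocab = [word for word, _ in pairs]
```

Streaming line by line avoids holding the whole table in memory, which matters for an embedding file of this size.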
Checkpoints and Shells
- Weibo NER
- OntoNotes4 NER
- MSRA NER
- Resume NER
- CTB5 POS
- CTB6 POS
- UD1 POS
- UD2 POS
- CTB6 CWS
- MSR CWS
- PKU CWS
Directory Structure of data
- berts
  - bert
    - config.json
    - vocab.txt
    - pytorch_model.bin
- dataset, you can download from here
  - NER
    - note4
    - msra
    - resume
  - POS
    - ctb5
    - ctb6
    - ud1
    - ud2
  - CWS
    - ctb6
    - msr
    - pku
- vocab
  - tencent_vocab.txt, the vocab of the pre-trained word embedding table, download from here.
- embedding
  - word_embedding.txt
- result
  - NER
    - note4
    - msra
    - resume
  - POS
    - ctb5
    - ctb6
    - ud1
    - ud2
  - CWS
    - ctb6
    - msr
    - pku
- log
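The empty directory skeleton can be created with a few lines of Python. This is a sketch under the assumption that the nesting is berts/bert, dataset/<task>/<corpus>, result/<task>/<corpus>, plus top-level vocab, embedding, and log; the helper name `make_layout` and the temporary root are hypothetical, and the model, dataset, vocab, and embedding files still have to be downloaded separately.

```python
import tempfile
from pathlib import Path

# Task and corpus names as listed in the directory structure.
TASKS = {
    "NER": ["note4", "msra", "resume"],
    "POS": ["ctb5", "ctb6", "ud1", "ud2"],
    "CWS": ["ctb6", "msr", "pku"],
}


def make_layout(root: Path) -> None:
    """Create the (empty) directory tree under `root`."""
    (root / "berts" / "bert").mkdir(parents=True, exist_ok=True)
    # dataset/ and result/ share the same task/corpus sub-tree.
    for top in ("dataset", "result"):
        for task, corpora in TASKS.items():
            for corpus in corpora:
                (root / top / task / corpus).mkdir(parents=True, exist_ok=True)
    for single in ("vocab", "embedding", "log"):
        (root / single).mkdir(parents=True, exist_ok=True)


# A temporary directory stands in for the real data root in this example.
root = Path(tempfile.mkdtemp())
make_layout(root)
```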
Run
1. Convert the .char.bmes file to a .json file:
   python3 to_json.py
2. Run the shell script:
   sh run_demo.sh
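The repository's to_json.py is not reproduced here. As a rough illustration of what the conversion in step 1 involves, the sketch below turns blank-line-separated "character label" lines into one JSON object per sentence. The field names `text` and `label`, the one-object-per-line layout, and the function name are assumptions, not the script's documented output.

```python
import json
from typing import Iterable, Iterator, List


def bmes_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON string per sentence from "char label" lines."""
    chars: List[str] = []
    labels: List[str] = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # Blank line: emit the sentence collected so far.
            if chars:
                yield json.dumps({"text": chars, "label": labels},
                                 ensure_ascii=False)
                chars, labels = [], []
            continue
        if len(parts) < 2:
            raise ValueError(f"expected 'char label', got: {raw!r}")
        chars.append(parts[0])
        labels.append(parts[-1])
    if chars:
        yield json.dumps({"text": chars, "label": labels}, ensure_ascii=False)


records = list(bmes_to_json_lines(["我 O", "跟 O", "", "他 O"]))
```

Passing `ensure_ascii=False` keeps the Chinese characters readable in the output instead of escaping them as `\uXXXX` sequences.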
If you want to load my checkpoints, you need to make some revisions to your transformers installation.
My model was trained in distributed mode, so its checkpoints cannot be loaded directly in single-GPU mode. Follow the steps below to revise transformers before loading my checkpoints.
1. Enter the source code directory of transformers:
   cd source/transformers-master
2. Open modeling_utils.py and go to around line 995.
3. Change the code as follows:

4. Compile the revised source code and install it:
   python3 setup.py install
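A different route that is sometimes used for checkpoints saved from a model wrapped in `torch.nn.DataParallel` or `DistributedDataParallel` is to rename the state-dict keys instead of patching the library. The sketch below assumes the mismatch comes from a `module.` prefix on parameter names, which this README does not confirm. The function name and the toy keys are hypothetical, and this is not the author's procedure.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any],
                        prefix: str = "module.") -> "OrderedDict[str, Any]":
    """Return a copy of `state_dict` with a leading `prefix` removed from keys."""
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    for key, value in state_dict.items():
        # Only strip when the key actually starts with the prefix.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy stand-in for a real checkpoint; the values would normally be tensors.
toy = OrderedDict([
    ("module.bert.embeddings.weight", 1),
    ("classifier.bias", 2),
])
cleaned = strip_module_prefix(toy)
```

With PyTorch, the same function could be applied to the dictionary returned by `torch.load(path, map_location="cpu")` before passing it to `model.load_state_dict(...)`. Whether that is sufficient for these particular checkpoints is untested here.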
Cite
@inproceedings{liu-etal-2021-lexicon,
title = "Lexicon Enhanced {C}hinese Sequence Labeling Using {BERT} Adapter",
author = "Liu, Wei and
Fu, Xiyan and
Zhang, Yue and
Xiao, Wenming",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.454",
doi = "10.18653/v1/2021.acl-long.454",
pages = "5847--5858"
}
