BertWithPretrained
An implementation of the BERT model and its related downstream tasks based on the PyTorch framework, by @月来客栈. The project also includes a detailed explanation of the BERT model and of the principles behind each downstream task.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Before using this project, you should be familiar with the principles of the Transformer, as illustrated by these three examples: Translation, Classification, and Couplet Generation.
Implementations
- [x] 1. Implementing the BERT model from scratch
- [x] 2. Chinese text classification task based on the BERT pretrained model
- [x] 3. English textual entailment (MNLI) task based on the BERT pretrained model
- [x] 4. English multiple choice (SWAG) task based on the BERT pretrained model
- [x] 5. English question answering (SQuAD) task based on the BERT pretrained model
- [x] 6. Pre-training BERT from scratch on the NSP and MLM tasks
- [x] 7. Named Entity Recognition task based on the BERT pretrained model
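Task 6 pre-trains BERT from scratch with the NSP and MLM objectives. As a rough illustration of the MLM corruption step (a generic sketch, not this repository's code; the function name, `MASK_ID`, and default mask rate are assumptions), BERT masks about 15% of input tokens, replacing 80% of those with `[MASK]`, 10% with a random token, and leaving 10% unchanged:

```python
import random

MASK_ID = 103  # [MASK] id in the usual bert-base vocabularies (assumed here)

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, rng=None):
    """Return (corrupted_ids, labels); labels are -100 where no prediction is made."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_prob:
            labels.append(tid)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK_ID)                    # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.randrange(vocab_size))  # 10%: random token
            else:
                corrupted.append(tid)                        # 10%: keep unchanged
        else:
            labels.append(-100)   # conventionally ignored by the loss
            corrupted.append(tid)
    return corrupted, labels
```

The `-100` label convention matches PyTorch's `CrossEntropyLoss(ignore_index=-100)` default, so unmasked positions contribute nothing to the loss.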
Project Structure
- `bert_base_chinese` contains the `bert_base_chinese` pretrained model and configuration files.
- `bert_base_uncased_english` contains the `bert_base_uncased_english` pretrained model and configuration files.
- `data` contains all datasets used by the downstream tasks:
  - `SingleSentenceClassification`: a 15-class Chinese classification dataset from Toutiao.
  - `PairSentenceClassification`: the MNLI (Multi-Genre Natural Language Inference) dataset.
  - `MultipeChoice`: the SWAG dataset.
  - `SQuAD`: the SQuAD v1.1 dataset.
  - `WikiText`: the English Wikipedia corpus for pre-training.
  - `SongCi`: Song Ci (宋词) poetry data for Chinese pre-training.
  - `ChineseNER`: a dataset for training Chinese Named Entity Recognition.
- `model` contains the implementation of each module:
  - `BasicBert` contains the basic BERT implementation:
    - `MyTransformer.py`: the self-attention implementation.
    - `BertEmbedding.py`: the input embedding implementation.
    - `BertConfig.py`: imports the model configuration from `config.json`.
    - `Bert.py`: the implementation of BERT itself.
  - `DownstreamTasks` contains the implementations of all downstream tasks:
    - `BertForSentenceClassification.py`: sentence(-pair) classification.
    - `BertForMultipleChoice.py`: multiple choice.
    - `BertForQuestionAnswering.py`: question answering (text span).
    - `BertForNSPAndMLM.py`: NSP and MLM.
    - `BertForTokenClassification.py`: token classification.
- `Tasks` contains training and inference for each downstream task:
  - `TaskForSingleSentenceClassification.py`: single-sentence classification, e.g. text classification.
  - `TaskForPairSentence.py`: sentence-pair classification, e.g. MNLI.
  - `TaskForMultipleChoice.py`: multiple choice, e.g. SWAG.
  - `TaskForSQuADQuestionAnswering.py`: question answering (text span), e.g. SQuAD.
  - `TaskForPretraining.py`: the NSP and MLM pre-training tasks.
  - `TaskForChineseNER.py`: Chinese Named Entity Recognition.
- `test` contains test cases for each downstream task.
- `utils`:
  - `data_helpers.py`: data preprocessing and dataset construction for each downstream task.
  - `log_helper.py`: the logging module.
  - `creat_pretraining_data.py`: constructs the dataset for the BERT pre-training tasks.
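Preprocessing in the style of `data_helpers.py` typically pads each batch of token-id sequences to a common length and builds the matching attention masks. A minimal generic sketch (the function name is an assumption, not this repo's API; the repo works with torch tensors rather than plain lists):

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the batch max length and build 0/1 attention masks."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    masks = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, masks
```

For example, `pad_batch([[1, 2, 3], [4, 5]])` pads the second sequence with one `0` and marks that position as `0` in its mask so attention ignores it.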
Python Environment
Python 3.6, with the following package versions:

```
torch==1.5.0
torchtext==0.6.0
torchvision==0.6.0
transformers==4.5.1
numpy==1.19.5
pandas==1.1.5
scikit-learn==0.24.0
tqdm==4.61.0
```
Usage
Step 1. Download the Datasets
Download each dataset and the corresponding BERT pretrained model (if the directory is empty) and put them in the corresponding directories. For details, see the README.md file in each `data` subdirectory.
Step 2. Run
Go to the `Tasks` directory and run the corresponding script.
2.1 Chinese text classification task
Model structure and data processing:
<img src="imgs/21102512313.jpg" width="45%">

```
python TaskForSingleSentenceClassification.py
```
Result:
```
-- INFO: Epoch: 0, Batch[0/4186], Train loss :2.862, Train acc: 0.125
-- INFO: Epoch: 0, Batch[10/4186], Train loss :2.084, Train acc: 0.562
-- INFO: Epoch: 0, Batch[20/4186], Train loss :1.136, Train acc: 0.812
-- INFO: Epoch: 0, Batch[30/4186], Train loss :1.000, Train acc: 0.734
...
-- INFO: Epoch: 0, Batch[4180/4186], Train loss :0.418, Train acc: 0.875
-- INFO: Epoch: 0, Train loss: 0.481, Epoch time = 1123.244s
...
-- INFO: Epoch: 9, Batch[4180/4186], Train loss :0.102, Train acc: 0.984
-- INFO: Epoch: 9, Train loss: 0.100, Epoch time = 1130.071s
-- INFO: Accuracy on val 0.884
-- INFO: Accuracy on val 0.888
```
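The `Train acc` values in these logs are per-batch accuracies: the fraction of examples whose highest-scoring class matches the label. A pure-Python sketch of that computation (the repo itself computes this with torch tensors; this helper is illustrative only):

```python
def batch_accuracy(logits, labels):
    """Fraction of rows whose argmax matches the corresponding label."""
    correct = 0
    for row, label in zip(logits, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(pred == label)
    return correct / len(labels)
```

For instance, `batch_accuracy([[0.1, 0.9], [2.0, 1.0]], [1, 0])` returns `1.0`, since both argmax predictions match their labels.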
2.2 Textual Entailment (MNLI) Task
Model structure and data processing:
<img src="imgs/21103032538.jpg" width="45%">

```
python TaskForPairSentenceClassification.py
```
Result:
```
-- INFO: Epoch: 0, Batch[0/17181], Train loss :1.082, Train acc: 0.438
-- INFO: Epoch: 0, Batch[10/17181], Train loss :1.104, Train acc: 0.438
-- INFO: Epoch: 0, Batch[20/17181], Train loss :1.129, Train acc: 0.250
-- INFO: Epoch: 0, Batch[30/17181], Train loss :1.063, Train acc: 0.375
...
-- INFO: Epoch: 0, Batch[17180/17181], Train loss :0.367, Train acc: 0.909
-- INFO: Epoch: 0, Train loss: 0.589, Epoch time = 2610.604s
...
-- INFO: Epoch: 9, Batch[0/17181], Train loss :0.064, Train acc: 1.000
-- INFO: Epoch: 9, Train loss: 0.142, Epoch time = 2542.781s
-- INFO: Accuracy on val 0.827
-- INFO: Accuracy on val 0.830
```
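For sentence-pair tasks like MNLI, BERT packs both sentences into a single sequence, `[CLS] premise [SEP] hypothesis [SEP]`, and distinguishes them with segment (token-type) ids. A minimal sketch of that packing step (ids 101/102 are the usual bert-base `[CLS]`/`[SEP]` values; the function name is made up, and real code must also truncate to the maximum sequence length):

```python
CLS, SEP = 101, 102  # [CLS] and [SEP] ids in the usual bert-base vocabularies

def build_pair_input(premise_ids, hypothesis_ids):
    """Pack two token-id lists as [CLS] A [SEP] B [SEP] with segment ids."""
    input_ids = [CLS] + premise_ids + [SEP] + hypothesis_ids + [SEP]
    # segment 0 covers [CLS] + premise + first [SEP]; segment 1 covers the rest
    segment_ids = [0] * (len(premise_ids) + 2) + [1] * (len(hypothesis_ids) + 1)
    return input_ids, segment_ids
```

The same `segment ids` pattern (a run of 0s followed by a run of 1s) appears in the SQuAD DEBUG output further below, since question/context pairs are packed the same way.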
2.3 Multiple Choice (SWAG) Task
Model structure and data processing:
<img src="imgs/21110834330.jpg" width="50%"> <img src="imgs/21110819453.jpg" width="50%"> <img src="imgs/21110839843.jpg" width="50%">

```
python TaskForMultipleChoice.py
```
Result:
```
[2021-11-11 21:32:50] - INFO: Epoch: 0, Batch[0/4597], Train loss :1.433, Train acc: 0.250
[2021-11-11 21:32:58] - INFO: Epoch: 0, Batch[10/4597], Train loss :1.277, Train acc: 0.438
[2021-11-11 21:33:01] - INFO: Epoch: 0, Batch[20/4597], Train loss :1.249, Train acc: 0.438
......
[2021-11-11 21:58:34] - INFO: Epoch: 0, Batch[4590/4597], Train loss :0.489, Train acc: 0.875
[2021-11-11 21:58:36] - INFO: Epoch: 0, Batch loss :0.786, Epoch time = 1546.173s
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, squats alongside flies side to side with his gun. ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, throws a dart at a dartboard. ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, collapses and falls to the floor. ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, is standing next to him. ## True
[2021-11-11 21:30:52] - INFO: Accuracy on val 0.794
```
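For SWAG, the model scores each of the four (context, ending) pairs, and the highest score wins; that is how the `## True` / `## False` lines above are produced. A hypothetical sketch of that final selection step (not the repo's code; the function name and line format mimic the log above):

```python
def label_endings(context, endings, scores):
    """Mark the highest-scoring ending True and the rest False, one line each."""
    best = max(range(len(scores)), key=scores.__getitem__)  # argmax over endings
    return [f"{context} {e} ## {i == best}" for i, e in enumerate(endings)]
```

For example, with four endings and scores `[0.1, 0.2, 0.05, 0.9]`, only the fourth line ends in `## True`.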
2.4 Question Answering (SQuAD) Task
Model structure and data processing:
<img src="imgs/22010353470.jpg" width="50%"> <img src="imgs/22010226560.jpg" width="50%"> <img src="imgs/22010215402.jpg" width="50%"> <img src="imgs/22010251228.jpg" width="50%">

```
python TaskForSQuADQuestionAnswering.py
```
Result:
```
[2022-01-02 14:42:17] Cache file ~/BertWithPretrained/data/SQuAD/dev-v1_128_384_64.pt not found; reprocessing and caching!
[2022-01-02 14:42:17] - DEBUG: <<<<<<<< entering a new example >>>>>>>>>
[2022-01-02 14:42:17] - DEBUG: ## preprocessing data utils.data_helpers is_training = False
[2022-01-02 14:42:17] - DEBUG: ## question id: 56be5333acb8001400a5030d
[2022-01-02 14:42:17] - DEBUG: ## original question text: Which performers joined the headliner during the Super Bowl 50 halftime show?
[2022-01-02 14:42:17] - DEBUG: ## original context text: CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a ....
[2022-01-02 14:42:17] - DEBUG: ## context length: 87, remaining length rest_len: 367
[2022-01-02 14:42:17] - DEBUG: ## input_tokens: ['[CLS]', 'which', 'performers', 'joined', 'the', 'headline', '##r', 'during', 'the', ...]
[2022-01-02 14:42:17] - DEBUG: ## input_ids: [101, 2029, 9567, 2587, 1996, 17653, 2099, 2076, 1996, 3565, 4605, 2753, 22589, 2265, 1029, 102, 6568, ....]
[2022-01-02 14:42:17] - DEBUG: ## segment ids: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
[2022-01-02 14:42:17] - DEBUG: ## orig_map: {16: 0, 17: 1, 18: 2, 19: 3, 20: 4, 21: 5, 22: 6, 23: 7, 24: 7, 25: 7, 26: 7, 27: 7, 28: 8, 29: 9, 30: 10, ....}
[2022-01-02 14:42:17] - DEBUG: ======================
....
[2022-01-02 15:13:50] - INFO: Epoch:0, Batch[810/7387] Train loss: 0.998, Train acc: 0.708
[2022-01-02 15:13:55] - INFO: Epoch:0, Batch[820/7387] Train loss: 1.130, Train acc: 0.708
[2022-01-02 15:13:59] - INFO: Epoch:0, Batch[830/7387] Train loss: 1.960, Train acc: 0.375
[2022-01-02 15:14:04] - INFO: Epoch:0, Batch[840/7387] Train loss: 1.933, Train acc: 0.542
......
[2022-01-02 15:15:27] - INFO: ### Question: [CLS] when was the first university in switzerland founded..
[2022-01-02 15:15:27] - INFO: ## Predicted answer: 1460
[2022-01-02 15:15:27] - INFO: ## True answer: 1460
[2022-01-02 15:15:27] - INFO: ## True answer idx: (tensor(46), tensor(47))
```
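The `True answer idx` pair above gives start/end positions in the wordpiece sequence, and `orig_map` (shown in the DEBUG output) maps each wordpiece position back to an index in the whitespace-tokenized context, so the answer text can be recovered from the predicted span. A simplified sketch of that mapping step (function name assumed; real code must also handle subword joining and invalid spans):

```python
def span_to_answer(orig_map, start_idx, end_idx, context_words):
    """Map a predicted wordpiece span [start_idx, end_idx] back to context words."""
    w_start = orig_map[start_idx]          # word index of the span start
    w_end = orig_map[end_idx]              # word index of the span end
    return " ".join(context_words[w_start:w_end + 1])
```

With `orig_map = {16: 0, 17: 1, 18: 2}` and context words `["founded", "in", "1460"]`, a predicted span of (18, 18) recovers the answer `"1460"`, matching the log above in spirit.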
