BertWithPretrained

[中文|English]

This project is an implementation of the BERT model and its related downstream tasks based on the PyTorch framework. It also includes a detailed explanation of the BERT model and the principles of each underlying task.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Before using this project, you should understand the underlying principles of the Transformer through these three examples: Translation, Classification, and Couplet Generation.

Implementations

Project Structure

  • bert_base_chinese contains the bert_base_chinese pre-trained model and configuration files.

  • bert_base_uncased_english contains the bert_base_uncased_english pre-trained model and configuration files.

  • data contains all datasets used by each downstream task:

    • SingleSentenceClassification is a 15-class Chinese text classification dataset from Toutiao.
    • PairSentenceClassification is the MNLI dataset (The Multi-Genre Natural Language Inference Corpus).
    • MultipeChoice is the SWAG dataset.
    • SQuAD is the SQuAD-V1.1 dataset.
    • WikiText is the English Wikipedia corpus for pre-training.
    • SongCi is a corpus of Song Ci (Song-dynasty lyric poetry) for Chinese model pre-training.
    • ChineseNER is a dataset for training Chinese named entity recognition.
  • model contains the implementation of each module:

    • BasicBert contains the basic BERT implementation:
      • MyTransformer.py self-attention implementation.
      • BertEmbedding.py input embedding implementation.
      • BertConfig.py loads the configuration from config.json.
      • Bert.py implementation of the BERT model.
    • DownstreamTasks contains the implementation of all downstream tasks:
      • BertForSentenceClassification.py sentence(s) classification implementation.
      • BertForMultipleChoice.py multiple choice implementation.
      • BertForQuestionAnswering.py question answering (text span) implementation.
      • BertForNSPAndMLM.py NSP and MLM implementation.
      • BertForTokenClassification.py token classification implementation.
  • Task contains the training and inference implementation for each downstream task:

    • TaskForSingleSentenceClassification.py single-sentence classification task, e.g. news classification.
    • TaskForPairSentence.py sentence-pair classification task, e.g. MNLI.
    • TaskForMultipleChoice.py multiple choice task, e.g. SWAG.
    • TaskForSQuADQuestionAnswering.py question answering (text span) task, e.g. SQuAD.
    • TaskForPretraining.py NSP and MLM pre-training tasks.
    • TaskForChineseNER.py Chinese named entity recognition task.
  • test contains test cases for each downstream task.

  • utils

    • data_helpers.py is the data preprocessing and dataset-building module for each downstream task.
    • log_helper.py is the log printing module.
    • creat_pretraining_data.py constructs the datasets for the BERT pre-training tasks.
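
Since the pre-trained directories above follow the standard layout (config.json, vocab.txt, pytorch_model.bin), they can typically be consumed with the transformers tokenizer pinned in the environment below. A minimal sketch, assuming bert_base_chinese has been downloaded into the project root:

from transformers import BertTokenizer

# Assumes bert_base_chinese/ holds vocab.txt alongside config.json and the weights.
tokenizer = BertTokenizer.from_pretrained("./bert_base_chinese")
tokens = tokenizer.tokenize("今天天气不错")            # WordPiece tokenization
input_ids = tokenizer.convert_tokens_to_ids(tokens)   # map tokens to vocabulary ids
print(tokens, input_ids)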

Python Environment

Python 3.6 with the following package versions:

torch==1.5.0
torchtext==0.6.0
torchvision==0.6.0
transformers==4.5.1
numpy==1.19.5
pandas==1.1.5
scikit-learn==0.24.0
tqdm==4.61.0
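
Assuming pip inside a Python 3.6 environment, the pinned versions above can be installed in one step:

pip install torch==1.5.0 torchtext==0.6.0 torchvision==0.6.0 transformers==4.5.1 numpy==1.19.5 pandas==1.1.5 scikit-learn==0.24.0 tqdm==4.61.0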

Usage

Step 1. Download Dataset

Download each dataset and the corresponding BERT pre-trained model (if the directory is empty) and put them in the corresponding directories. For details, see the README.md file in each data directory.

Step 2. Running

Go to the Tasks directory and run the corresponding model script.

2.1 Chinese Text Classification Task

Model structure and data processing:

<img src="imgs/21102512313.jpg" width="45%">
python TaskForSingleSentenceClassification.py

Result:

-- INFO: Epoch: 0, Batch[0/4186], Train loss :2.862, Train acc: 0.125
-- INFO: Epoch: 0, Batch[10/4186], Train loss :2.084, Train acc: 0.562
-- INFO: Epoch: 0, Batch[20/4186], Train loss :1.136, Train acc: 0.812        
-- INFO: Epoch: 0, Batch[30/4186], Train loss :1.000, Train acc: 0.734
...
-- INFO: Epoch: 0, Batch[4180/4186], Train loss :0.418, Train acc: 0.875
-- INFO: Epoch: 0, Train loss: 0.481, Epoch time = 1123.244s
...
-- INFO: Epoch: 9, Batch[4180/4186], Train loss :0.102, Train acc: 0.984
-- INFO: Epoch: 9, Train loss: 0.100, Epoch time = 1130.071s
-- INFO: Accuracy on val 0.884
-- INFO: Accuracy on val 0.888
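
Under the hood, this task only needs a linear head over BERT's pooled [CLS] vector. A minimal illustrative sketch of that idea (a stand-in, not the repo's BertForSentenceClassification code), with the 768 hidden size of bert_base_chinese and the 15 Toutiao classes:

import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    def __init__(self, hidden_size=768, num_classes=15):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, cls_embedding):
        # cls_embedding: [batch, hidden_size], e.g. BERT's pooled [CLS] output
        return self.classifier(self.dropout(cls_embedding))

model = ToyClassifier()
logits = model(torch.randn(8, 768))                     # stand-in pooled output
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 15, (8,)))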

2.2 Textual Entailment (MNLI) Task

Model structure and data processing:

<img src="imgs/21103032538.jpg" width="45%">
python TaskForPairSentenceClassification.py

Result:

-- INFO: Epoch: 0, Batch[0/17181], Train loss :1.082, Train acc: 0.438
-- INFO: Epoch: 0, Batch[10/17181], Train loss :1.104, Train acc: 0.438
-- INFO: Epoch: 0, Batch[20/17181], Train loss :1.129, Train acc: 0.250     
-- INFO: Epoch: 0, Batch[30/17181], Train loss :1.063, Train acc: 0.375
...
-- INFO: Epoch: 0, Batch[17180/17181], Train loss :0.367, Train acc: 0.909
-- INFO: Epoch: 0, Train loss: 0.589, Epoch time = 2610.604s
...
-- INFO: Epoch: 9, Batch[0/17181], Train loss :0.064, Train acc: 1.000
-- INFO: Epoch: 9, Train loss: 0.142, Epoch time = 2542.781s
-- INFO: Accuracy on val 0.827
-- INFO: Accuracy on val 0.830
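
For sentence pairs, both sentences are packed into one input ([CLS] premise [SEP] hypothesis [SEP]) and told apart by segment ids. A minimal sketch using the transformers tokenizer; the path is a placeholder for wherever bert_base_uncased_english lives, and the sentence pair is just an illustration:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("./bert_base_uncased_english")
enc = tokenizer("A soccer game with multiple males playing.",  # premise
                "Some men are playing a sport.",               # hypothesis
                return_token_type_ids=True)
print(enc["input_ids"])       # [CLS] premise [SEP] hypothesis [SEP]
print(enc["token_type_ids"])  # 0s for the first segment, 1s for the second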

2.3 Multiple Choice (SWAG) Task

Model structure and data processing:

<img src="imgs/21110834330.jpg" width="50%"> <img src="imgs/21110819453.jpg" width="50%"> <img src="imgs/21110839843.jpg" width="50%">
python TaskForMultipleChoice.py

Result:

[2021-11-11 21:32:50] - INFO: Epoch: 0, Batch[0/4597], Train loss :1.433, Train acc: 0.250
[2021-11-11 21:32:58] - INFO: Epoch: 0, Batch[10/4597], Train loss :1.277, Train acc: 0.438
[2021-11-11 21:33:01] - INFO: Epoch: 0, Batch[20/4597], Train loss :1.249, Train acc: 0.438
        ......
[2021-11-11 21:58:34] - INFO: Epoch: 0, Batch[4590/4597], Train loss :0.489, Train acc: 0.875
[2021-11-11 21:58:36] - INFO: Epoch: 0, Batch loss :0.786, Epoch time = 1546.173s
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, squats alongside flies side to side with his gun.  ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, throws a dart at a dartboard.   ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, collapses and falls to the floor.   ## False
[2021-11-11 21:30:52] - INFO: He is throwing darts at a wall. A woman, is standing next to him.    ## True
[2021-11-11 21:30:52] - INFO: Accuracy on val 0.794
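
The multiple-choice head behind this output scores each (context, choice) pair with one shared linear layer and lets the four scores compete, which is how the single "## True" line above gets picked. A minimal sketch of the idea (illustrative, not the repo's BertForMultipleChoice code):

import torch
import torch.nn as nn

batch, num_choices, hidden = 2, 4, 768
# Stand-in for BERT's pooled [CLS] output of each (context, choice) pair.
pooled = torch.randn(batch * num_choices, hidden)

scorer = nn.Linear(hidden, 1)                     # one shared score per choice
logits = scorer(pooled).view(batch, num_choices)  # [batch, num_choices]
loss = nn.CrossEntropyLoss()(logits, torch.tensor([3, 0]))  # gold choice indices
pred = logits.argmax(dim=-1)                      # the choice reported as "## True"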

2.4 Question Answering (SQuAD) Task

Model structure and data processing:

<img src="imgs/22010353470.jpg" width="50%"> <img src="imgs/22010226560.jpg" width="50%"> <img src="imgs/22010215402.jpg" width="50%"> <img src="imgs/22010251228.jpg" width="50%">
python TaskForSQuADQuestionAnswering.py

Result:

[2022-01-02 14:42:17] Cache file ~/BertWithPretrained/data/SQuAD/dev-v1_128_384_64.pt does not exist; reprocessing and caching!
[2022-01-02 14:42:17] - DEBUG: <<<<<<<<  entering a new example  >>>>>>>>>
[2022-01-02 14:42:17] - DEBUG: ## preprocessing data utils.data_helpers is_training = False
[2022-01-02 14:42:17] - DEBUG: ## question id: 56be5333acb8001400a5030d
[2022-01-02 14:42:17] - DEBUG: ## original question text: Which performers joined the headliner during the Super Bowl 50 halftime show?
[2022-01-02 14:42:17] - DEBUG: ## original context text: CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a  ....
[2022-01-02 14:42:17] - DEBUG: ## context length: 87, remaining length rest_len: 367
[2022-01-02 14:42:17] - DEBUG: ## input_tokens: ['[CLS]', 'which', 'performers', 'joined', 'the', 'headline', '##r', 'during', 'the', ...]
[2022-01-02 14:42:17] - DEBUG: ## input_ids:[101, 2029, 9567, 2587, 1996, 17653, 2099, 2076, 1996, 3565, 4605, 2753, 22589, 2265, 1029, 102, 6568, ....]
[2022-01-02 14:42:17] - DEBUG: ## segment ids:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
[2022-01-02 14:42:17] - DEBUG: ## orig_map:{16: 0, 17: 1, 18: 2, 19: 3, 20: 4, 21: 5, 22: 6, 23: 7, 24: 7, 25: 7, 26: 7, 27: 7, 28: 8, 29: 9, 30: 10,....}
[2022-01-02 14:42:17] - DEBUG: ======================
....
[2022-01-02 15:13:50] - INFO: Epoch:0, Batch[810/7387] Train loss: 0.998, Train acc: 0.708
[2022-01-02 15:13:55] - INFO: Epoch:0, Batch[820/7387] Train loss: 1.130, Train acc: 0.708
[2022-01-02 15:13:59] - INFO: Epoch:0, Batch[830/7387] Train loss: 1.960, Train acc: 0.375
[2022-01-02 15:14:04] - INFO: Epoch:0, Batch[840/7387] Train loss: 1.933, Train acc: 0.542
......
[2022-01-02 15:15:27] - INFO:  ### Question: [CLS] when was the first university in switzerland founded..
[2022-01-02 15:15:27] - INFO:    ## Predicted answer: 1460
[2022-01-02 15:15:27] - INFO:    ## True answer: 1460
[2022-01-02 15:15:27] - INFO:    ## True answer idx: (tensor(46), tensor(47))
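
The span-extraction head behind these logs gives every token a start logit and an end logit; the predicted answer is the text between the argmax start and end positions, which is why the true answer is reported as an index pair like (tensor(46), tensor(47)). A minimal sketch (illustrative, not the repo's BertForQuestionAnswering code), using the 384 max sequence length from the cache file name above:

import torch
import torch.nn as nn

seq_len, hidden = 384, 768
sequence_output = torch.randn(1, seq_len, hidden)    # stand-in for BERT token states

qa_head = nn.Linear(hidden, 2)                       # (start, end) logits per token
start_logits, end_logits = qa_head(sequence_output).split(1, dim=-1)
start_idx = start_logits.squeeze(-1).argmax(dim=-1)  # predicted answer start position
end_idx = end_logits.squeeze(-1).argmax(dim=-1)      # predicted answer end position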
