exBERT
The details of the model are described in the paper.
Pre-train an exBERT model (only the extension part)
In the command line:

```sh
python Pretraining.py -e 1 \
    -b 256 \
    -sp path_to_storage \
    -dv 0 1 2 3 \
    -lr 1e-04 \
    -str exBERT \
    -config path_to_config_file_of_the_OFF_THE_SHELF_MODEL ./config_and_vocab/exBERT/bert_config_ex_s3.json \
    -vocab ./config_and_vocab/exBERT/exBERT_vocab.txt \
    -pm_p path_to_state_dict_of_the_OFF_THE_SHELF_MODEL \
    -dp path_to_your_training_data \
    -ls 128 \
    -p 1
```
You can replace path_to_config_file_of_the_OFF_THE_SHELF_MODEL and path_to_state_dict_of_the_OFF_THE_SHELF_MODEL with any well pre-trained model in the BERT architecture.
./config_and_vocab/exBERT/bert_config_ex_s3.json defines the size of the extension module.
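As a rough illustration, such a config file is a BERT-style JSON whose fields set the module's dimensions. The keys below are standard BERT config keys and the values are made up for the sketch; they are not the actual contents of bert_config_ex_s3.json:

```python
import json

# Hypothetical extension-module config in BERT-config style.
# The exact keys and values in bert_config_ex_s3.json may differ.
example_config = {
    "hidden_size": 252,           # width of the extension module (illustrative)
    "num_hidden_layers": 12,      # depth (illustrative)
    "num_attention_heads": 12,
    "intermediate_size": 1024,
    "vocab_size": 30522,
}

# Round-trip through JSON the way a training script typically loads a config file.
text = json.dumps(example_config, indent=2)
loaded = json.loads(text)
```

Editing these size fields is how you would shrink or grow the extension module relative to the off-the-shelf backbone.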
Pre-train an exBERT model (whole model)
```sh
python Pretraining.py -e 1 \
    -b 256 \
    -sp path_to_storage \
    -dv 0 1 2 3 \
    -lr 1e-04 \
    -str exBERT \
    -config path_to_config_file_of_the_OFF_THE_SHELF_MODEL ./config_and_vocab/exBERT/bert_config_ex_s3.json \
    -vocab ./config_and_vocab/exBERT/exBERT_vocab.txt \
    -pm_p path_to_state_dict_of_the_OFF_THE_SHELF_MODEL \
    -dp path_to_your_training_data \
    -ls 128 \
    -p 1 \
    -t_ex_only ""
```
Passing -t_ex_only "" enables training of the whole model instead of only the extension module.
Pre-train an exBERT model with no vocabulary extension
```sh
python Pretraining.py -e 1 \
    -b 256 \
    -sp path_to_storage \
    -dv 0 1 2 3 \
    -lr 1e-04 \
    -str exBERT \
    -config path_to_config_file_of_the_OFF_THE_SHELF_MODEL config_and_vocab/exBERT_no_ex_vocab/bert_config_ex_s3.json \
    -vocab path_to_vocab_file_of_the_OFF_THE_SHELF_MODEL \
    -pm_p path_to_state_dict_of_the_OFF_THE_SHELF_MODEL \
    -dp path_to_your_training_data \
    -ls 128 \
    -p 1 \
    -t_ex_only ""
```
Data preparation
Input data for the pre-training script should be a .pkl file containing a single list with two elements, e.g. [list1, list2].
list1 and list2 should contain sentences in the form [CLS] sentence A [SEP] sentence B [SEP]. The only difference between list1 and list2 is whether the relationship between sentence A and sentence B is IsNext or NotNext. Please check example_data.pkl.
We also provide a simple script to generate the data from a raw text file:

```sh
python data_preprocess.py -voc path_to_vocab_file -ls 128 -dp path_to_txt_file -n_c 5 -rd 1 -sp ./your_data.pkl
```

Replace 128 with the maximum sequence length you want. For example, try:

```sh
python data_preprocess.py -voc ./exBERT_vocab.txt -ls 128 -dp ./example_raw_text.txt -n_c 5 -rd 1 -sp ./example_data.pkl
```

Alternatively, you can do your own data preparation and organize the data in the format mentioned above.
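If you prepare the data yourself, the expected .pkl layout can be produced with plain pickle. A minimal sketch, where the sentence strings and the output file name are illustrative and only the [list1, list2] structure comes from the description above:

```python
import pickle
import random

# A tiny stand-in corpus; in practice these come from your raw text.
sentences = [
    "the model extends the original vocabulary .",
    "an extension module is attached to each layer .",
    "pre-training uses masked language modelling .",
    "the batch size and sequence length are configurable .",
]

# list1: "IsNext" pairs -- sentence B actually follows sentence A.
is_next = [
    f"[CLS] {a} [SEP] {b} [SEP]"
    for a, b in zip(sentences, sentences[1:])
]

# list2: "NotNext" pairs -- sentence B is drawn from elsewhere in the corpus.
rng = random.Random(0)
not_next = []
for i, a in enumerate(sentences[:-1]):
    b = rng.choice([s for s in sentences if s != sentences[i + 1]])
    not_next.append(f"[CLS] {a} [SEP] {b} [SEP]")

# A single list with two elements, as the pre-training script expects.
with open("your_data.pkl", "wb") as f:
    pickle.dump([is_next, not_next], f)

# Reload to confirm the structure round-trips.
with open("your_data.pkl", "rb") as f:
    list1, list2 = pickle.load(f)
```

This only sketches the container format; tokenization, masking, and length truncation to -ls are handled by the training pipeline or by data_preprocess.py.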
