PolyEncoder
An unofficial implementation of Poly-encoder (Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring)
Poly-encoders
This repository is an unofficial implementation of Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring.
How to use
- Download and unzip the Ubuntu data: https://www.dropbox.com/s/2fdn26rj6h9bpvl/ubuntudata.zip?dl=0
- Prepare a pretrained BERT model (https://github.com/huggingface/transformers).
- Install the requirements:
  pip3 install -r requirements.txt
- Train a Poly-encoder:
  python3 train.py --bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture poly --poly_m 16
- Train a Bi-encoder:
  python3 train.py --bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture bi
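The `--architecture poly --poly_m 16` option selects a Poly-encoder with m = 16 context codes. As a rough illustration of the scoring described in the Poly-encoder paper (not this repository's code), the pure-Python sketch below assumes the context encoder has already produced one vector per token, the m codes are given as plain lists rather than learned parameters, and the candidate has already been encoded into a single vector. All function names are hypothetical, and the bi-encoder baseline uses the first token's vector, which is one common choice.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    """Sum of the vectors, each scaled by its matching weight."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def poly_encoder_score(context_tokens: List[Vector], codes: List[Vector], cand: Vector) -> float:
    """Score one candidate against one context, Poly-encoder style (sketch).

    context_tokens: one output vector per context token (N x d)
    codes:          the m code vectors (m x d); learned parameters in a real model
    cand:           the candidate embedding (d)
    """
    # Each code attends over the context tokens, giving m global context features.
    features = [
        weighted_sum(softmax([dot(code, h) for h in context_tokens]), context_tokens)
        for code in codes
    ]
    # The candidate attends over the m features to build one context vector.
    attn = softmax([dot(cand, f) for f in features])
    context_vec = weighted_sum(attn, features)
    # The final score is a dot product between context vector and candidate.
    return dot(context_vec, cand)


def bi_encoder_score(context_tokens: List[Vector], cand: Vector) -> float:
    """Bi-encoder baseline (assumption: first-token vector dotted with the candidate)."""
    return dot(context_tokens[0], cand)


if __name__ == "__main__":
    ctx = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    codes = [[2.0, 0.0], [0.0, 2.0]]  # m = 2 codes
    print(poly_encoder_score(ctx, codes, [1.0, 0.0]))
    print(bi_encoder_score(ctx, [1.0, 0.0]))
```

Because the candidate only interacts with the m pre-computed features, context features can be cached per context while a larger m (16, 64, 360 in the table below) gives the candidate more context views to attend over.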
Results
The experimental settings and results are as follows:
- Dataset: Ubuntu
- Device: GTX 1060 6G x1
- Pretrained model: BERT-small-uncased (https://github.com/sfzhou5678/PretrainedLittleBERTs or https://storage.googleapis.com/bert_models/2020_02_20/all_bert_models.zip)
- Batch size: 32
- max_contexts_length: 128
- max_context_cnt: 4
- max_response_length: 64
- lr: 5e-5
- Epochs: 3
| Model | R@1/10 | Training Speed (it/s) | GPU Memory (MB) |
| :---------------: | :--------: | :----------------: | :---------------------: |
| Bi-encoder | 0.6714 | 3.15 | 1969 |
| Poly-encoder 16 | 0.6938 | 3.11 | 1975 |
| Poly-encoder 64 | 0.7026 | 3.08 | 2005 |
| Poly-encoder 360 | 0.7066 | 3.05 | 2071 |
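R@1/10 is recall at 1 out of 10 candidates: each context comes with 10 candidate responses (one correct, nine distractors), and an example counts as a hit only if the correct response gets the highest score. A minimal sketch of the metric, assuming (hypothetically) that the correct response sits at index 0 of each score list; `hit_at_k` and `recall_at_k` are illustrative names, not functions from this repository.

```python
from typing import List, Sequence


def hit_at_k(scores: Sequence[float], true_index: int = 0, k: int = 1) -> bool:
    """True if the correct candidate is among the k highest-scored candidates."""
    # Indices sorted by descending score. sorted() is stable even with reverse=True,
    # so tied scores keep their original order (a tie at index 0 counts as a hit).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def recall_at_k(batch_scores: List[Sequence[float]], k: int = 1, true_index: int = 0) -> float:
    """Fraction of examples whose correct candidate is ranked in the top k."""
    hits = sum(hit_at_k(s, true_index, k) for s in batch_scores)
    return hits / len(batch_scores)


if __name__ == "__main__":
    batch = [
        [0.9] + [0.1] * 9,       # correct response scored highest: hit
        [0.2, 0.8] + [0.0] * 8,  # a distractor outranks it: miss at k=1
    ]
    print(recall_at_k(batch, k=1))  # 0.5
```

A stricter evaluation might count ties with the correct response as misses; the stable sort here is the lenient choice.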
Unlike the original paper, this experiment uses a bert-small-uncased model (from https://github.com/sfzhou5678/PretrainedLittleBERTs or https://storage.googleapis.com/bert_models/2020_02_20/all_bert_models.zip) rather than bert-base. It also uses only batch_size = 32, max_length = 128, and max_history = 4 (i.e., at most 4 context utterances are selected). These settings lower the scores but speed up training; one can modify them for better results.
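To make `max_history` and `max_length` concrete, here is a hypothetical helper (`build_context`, not part of this repository) that flattens a dialogue history. It assumes the most recent turns are the ones kept and that overflow tokens are trimmed from the front; the actual preprocessing in train.py may differ, for example in which turns it keeps or how it inserts separator tokens.

```python
from typing import List


def build_context(utterances: List[List[str]], max_history: int = 4, max_length: int = 128) -> List[str]:
    """Flatten a tokenized dialogue history into at most max_length tokens.

    Hypothetical sketch: keeps the last max_history turns and drops overflow
    tokens from the front. Separator tokens are omitted for simplicity.
    """
    # Guard against 0 or negative limits ([-0:] would otherwise keep everything).
    if max_history <= 0 or max_length <= 0:
        return []
    # Keep only the last max_history utterances.
    recent = utterances[-max_history:]
    # Concatenate their tokens in dialogue order.
    tokens = [tok for utt in recent for tok in utt]
    # Keep the last max_length tokens so the latest turn survives truncation.
    return tokens[-max_length:]


if __name__ == "__main__":
    history = [["hi"], ["my", "wifi"], ["is"], ["down"], ["try", "reboot"]]
    print(build_context(history))
    print(build_context(history, max_length=3))
```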
Some Improvements
- Thanks to @chijames, this implementation is closer to the original paper and has achieved better performance.
If you have any suggestions or questions, please feel free to reach out to me!