DRhard

SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.


Optimizing Dense Retrieval Model Training with Hard Negatives

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

🔥News 2021-10: Our full paper, Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval [code], was accepted by WSDM'22. It presents RepCONC, which achieves a state-of-the-art effectiveness-efficiency tradeoff for first-stage retrieval. Part of its training foundation lies in this repo (STAR and ADORE).
🔥News 2021-8: Our full paper, Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance [code], was accepted by CIKM'21. It presents JPQ and greatly improves the efficiency of Dense Retrieval. Part of its training foundation lies in this repo (dynamic hard negatives).

This repo provides code, retrieval results, and trained models for our SIGIR'21 full paper, Optimizing Dense Retrieval Model Training with Hard Negatives. A previous version of this work is Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently.

We achieve strong retrieval results on both the passage and the document retrieval benchmarks. The two proposed algorithms (STAR and ADORE) are also very efficient. IMHO, they are well worth trying and will most likely improve your retriever's performance by a large margin.

The following figure shows the pros and cons of different training methods. You can train an effective Dense Retrieval model in three steps. First, warm up your model using random negatives or top-ranked BM25 negatives. Second, train both the query encoder and the document encoder with our proposed STAR. Third, train the query encoder with our proposed ADORE.
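Why hard negatives help can be seen with a toy dual-encoder loss. The sketch below uses a generic softmax cross-entropy over one positive and a set of negatives, scored by inner product. This loss form and the names dot and softmax_ce_loss are assumptions for illustration only, not the exact STAR or ADORE objective. A negative that scores close to the positive produces a much larger loss, and therefore a stronger training signal, than an easy one.

```python
import math
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner-product relevance score, as used by dual-encoder dense retrievers."""
    return sum(a * b for a, b in zip(u, v))


def softmax_ce_loss(query: List[float],
                    positive: List[float],
                    negatives: List[List[float]]) -> float:
    """Negative log-likelihood of the positive among {positive} + negatives.

    Illustrative loss only (an assumption, not the repo's training code).
    `negatives` may mix hard negatives with random or in-batch negatives;
    the loss itself does not distinguish between them.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]


q = [1.0, 0.0]
pos = [0.9, 0.1]
easy_neg = [-1.0, 0.0]  # far from the query: contributes little to the loss
hard_neg = [0.8, 0.2]   # close to the query: dominates the loss

easy_loss = softmax_ce_loss(q, pos, [easy_neg])
hard_loss = softmax_ce_loss(q, pos, [hard_neg])
print(easy_loss, hard_loss)
```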

Retrieval Results and Trained Models

| Passage Retrieval | Dev MRR@10 | Dev R@100 | Test NDCG@10 | Files |
|---------------- | ------------|-------| ------- | ------ |
| Inbatch-Neg | 0.264 | 0.837 | 0.583 | Model |
| Rand-Neg | 0.301 | 0.853 | 0.612 | Model |
| STAR | 0.340 | 0.867 | 0.642 | Model, Train, Dev, TRECTest |
| ADORE (Inbatch-Neg) | 0.316 | 0.860 | 0.658 | Model |
| ADORE (Rand-Neg) | 0.326 | 0.865 | 0.661 | Model |
| ADORE (STAR) | 0.347 | 0.876 | 0.683 | Model, Train, Dev, TRECTest, Leaderboard |

| Doc Retrieval | Dev MRR@100 | Dev R@100 | Test NDCG@10 | Files |
|---------------- | ------------|-------| ------- | ------ |
| Inbatch-Neg | 0.320 | 0.864 | 0.544 | Model |
| Rand-Neg | 0.330 | 0.859 | 0.572 | Model |
| STAR | 0.390 | 0.867 | 0.605 | Model, Train, Dev, TRECTest |
| ADORE (Inbatch-Neg) | 0.362 | 0.884 | 0.580 | Model |
| ADORE (Rand-Neg) | 0.361 | 0.885 | 0.585 | Model |
| ADORE (STAR) | 0.405 | 0.919 | 0.628 | Model, Train, Dev, TRECTest, Leaderboard |
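For reference, MRR@k and R@k in these tables can be computed from a ranked list per query and a set of relevant documents per query. The sketch below assumes both are plain in-memory dicts (rankings and qrels are hypothetical names) and averages over the ranked queries; the official msmarco_eval.py may handle edge cases, such as which query set it averages over, differently.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]],
             qrels: Dict[str, Set[str]],
             k: int) -> float:
    """Mean Reciprocal Rank: 1/rank of the first relevant doc within the top k, else 0."""
    total = 0.0
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


def recall_at_k(rankings: Dict[str, List[str]],
                qrels: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of each query's relevant docs found in its top k, averaged over queries."""
    total = 0.0
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        if relevant:
            total += len(relevant & set(ranked[:k])) / len(relevant)
    return total / len(rankings)


rankings = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}}
print(mrr_at_k(rankings, qrels, 10), recall_at_k(rankings, qrels, 10))
```

Here q1 finds its relevant document at rank 2 and q2 finds none, so MRR@10 is (1/2 + 0) / 2.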

If you want to use our first-stage leaderboard runs, contact me and I will send you the file.

If any link is broken or a file is corrupted, please contact me or open an issue.

Requirements

torch
transformers
faiss-gpu 
tensorboard
boto3
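Before running anything, it can help to check that these packages are importable. A minimal sketch, assuming the import names listed in REQUIRED_MODULES (a hypothetical constant; note that faiss-gpu is imported as faiss). It only checks presence, not versions.

```python
import importlib.util
from typing import List

# Assumed import names for the requirements above (faiss-gpu installs the `faiss` module).
REQUIRED_MODULES = ["torch", "transformers", "faiss", "tensorboard", "boto3"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED_MODULES))
```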

Data Download

To download all the needed data, run:

bash download_data.sh

Data Preprocess

Run the following commands:

python preprocess.py --data_type 0; python preprocess.py --data_type 1

Note: We used Transformers 2.x to tokenize text when we conducted this research. In Transformers 3.x and 4.x, however, RobertaTokenizer behaves differently. To support REPRODUCIBILITY, we copied the RobertaTokenizer source code from 2.x into star_tokenizer.py, and preprocessing uses from star_tokenizer import RobertaTokenizer instead of from transformers import RobertaTokenizer. You need to do the same if you apply our trained models to other datasets.

Inference

With our provided trained models, you can easily replicate our reported experimental results. Minor variance may be observed due to differences in the environment.

STAR

The following commands use the provided STAR model to compute query and passage embeddings and to perform similarity search on the dev set. (Use the --faiss_gpus option to run the similarity search on GPUs, which is much faster.)

python ./star/inference.py --data_type passage --max_doc_length 256 --mode dev   
python ./star/inference.py --data_type doc --max_doc_length 512 --mode dev
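The similarity search step ranks documents by the inner product between a query embedding and every document embedding. The repo uses faiss for this; the sketch below is a dependency-free, brute-force equivalent of exact inner-product search (what an index such as faiss IndexFlatIP computes), assuming inner-product scoring. The function name top_k_inner_product is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_inner_product(query: Sequence[float],
                        doc_embeddings: Sequence[Sequence[float]],
                        k: int) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest inner-product scores."""
    scored = ((idx, sum(q * d for q, d in zip(query, doc)))
              for idx, doc in enumerate(doc_embeddings))
    # heapq.nlargest keeps only k candidates in memory while scanning the corpus
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


docs = [[0.1, 0.9], [0.8, 0.3], [0.5, 0.5]]
result = top_k_inner_product([1.0, 0.0], docs, k=2)
print(result)
```

Exact search scans every document, which is why GPU acceleration (or an approximate index) matters at MS MARCO scale.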

Run the following command to evaluate on the MSMARCO Passage dataset.

python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv ./data/passage/evaluate/star/dev.rank.tsv

Eval Started
#####################
MRR @10: 0.3404237731386721
QueriesRanked: 6980
#####################

Run the following command to evaluate on the MSMARCO Document dataset.

python ./msmarco_eval.py ./data/doc/preprocess/dev-qrel.tsv ./data/doc/evaluate/star/dev.rank.tsv 100

Eval Started
#####################
MRR @100: 0.3903422772218344
QueriesRanked: 5193
#####################

ADORE

ADORE only computes the query embeddings; the document embeddings are pre-computed by another DR model, such as STAR. The following commands use the provided ADORE(STAR) model to compute query embeddings and perform similarity search on the dev set. (Use the --faiss_gpus option to run the similarity search on GPUs, which is much faster.)

python ./adore/inference.py --model_dir ./data/passage/trained_models/adore-star --output_dir ./data/passage/evaluate/adore-star --preprocess_dir ./data/passage/preprocess --mode dev --dmemmap_path ./data/passage/evaluate/star/passages.memmap
python ./adore/inference.py --model_dir ./data/doc/trained_models/adore-star --output_dir ./data/doc/evaluate/adore-star --preprocess_dir ./data/doc/preprocess --mode dev --dmemmap_path ./data/doc/evaluate/star/passages.memmap
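Because ADORE keeps the document embeddings fixed, it only needs to read them from the file passed via --dmemmap_path. The sketch below assumes that file is a headerless, row-major float32 array (the layout numpy.memmap produces), so that with numpy it could be read as numpy.memmap(path, dtype=numpy.float32, mode="r").reshape(-1, dim). It uses the standard-library array module instead; the function names and the toy file are hypothetical, and the embedding dimension dim must be known in advance because it is not stored in the file.

```python
import array
import os
import tempfile
from typing import List


def write_embeddings(path: str, rows: List[List[float]]) -> None:
    """Write row-major float32 embeddings to a headerless binary file."""
    flat = array.array("f", [x for row in rows for x in row])
    with open(path, "wb") as f:
        flat.tofile(f)


def read_embeddings(path: str, dim: int) -> List[List[float]]:
    """Read the flat float32 file back and reshape it into rows of length `dim`."""
    flat = array.array("f")
    n_floats = os.path.getsize(path) // flat.itemsize
    with open(path, "rb") as f:
        flat.fromfile(f, n_floats)
    return [list(flat[i:i + dim]) for i in range(0, len(flat), dim)]


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.memmap")  # hypothetical toy file
    write_embeddings(toy_path, [[0.5, 1.0], [2.0, -1.0]])
    restored = read_embeddings(toy_path, dim=2)

print(restored)
```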

Evaluate the ADORE(STAR) model on the dev passage dataset:

python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv ./data/passage/evaluate/adore-star/dev.rank.tsv

You will get

Eval Started
#####################
MRR @10: 0.34660697230181425
QueriesRanked: 6980
#####################

Evaluate the ADORE(STAR) model on the dev document dataset:

python ./msmarco_eval.py ./data/doc/preprocess/dev-qrel.tsv ./data/doc/evaluate/adore-star/dev.rank.tsv 100

You will get

Eval Started
#####################
MRR @100: 0.4049777020859768
QueriesRanked: 5193
#####################

Convert QID/PID Back

Our data preprocessing assigns new IDs to every query and document, so you may want to convert the IDs back to the official ones. We provide a script for this.

The following commands convert ADORE-STAR's ranking results on the dev passage dataset back to the official IDs and evaluate them against the official qrels.

python ./cvt_back.py --input_dir ./data/passage/evaluate/adore-star/ --preprocess_dir ./data/passage/preprocess --output_dir ./data/passage/official_runs/adore-star --mode dev --dataset passage
python ./msmarco_eval.py ./data/passage/dataset/qrels.dev.small.tsv ./data/passage/official_runs/adore-star/dev.rank.tsv

You will get

Eval Started
#####################
MRR @10: 0.34660697230181425
QueriesRanked: 6980
#####################
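Conceptually, converting back is an inverse lookup from the internal offsets assigned during preprocessing to the original IDs. A minimal sketch, assuming the two mappings are already loaded as in-memory dicts named offset2qid and offset2pid (hypothetical names; how cvt_back.py stores and loads them is not shown here), and that run files use the qid, pid, rank tab-separated layout that msmarco_eval.py reads. The IDs in the example are made up.

```python
from typing import Dict, List, Tuple


def convert_run(run: List[Tuple[int, int, int]],
                offset2qid: Dict[int, str],
                offset2pid: Dict[int, str]) -> List[Tuple[str, str, int]]:
    """Map (internal_qid, internal_pid, rank) rows back to official IDs."""
    return [(offset2qid[q], offset2pid[p], rank) for q, p, rank in run]


def to_tsv(rows: List[Tuple[str, str, int]]) -> str:
    """Serialize rows as qid<TAB>pid<TAB>rank lines."""
    return "".join(f"{q}\t{p}\t{r}\n" for q, p, r in rows)


offset2qid = {0: "1001"}             # made-up official query ID
offset2pid = {0: "5001", 1: "5002"}  # made-up official passage IDs
rows = convert_run([(0, 1, 1), (0, 0, 2)], offset2qid, offset2pid)
print(to_tsv(rows), end="")
```

Since only the IDs change, the converted run scores exactly the same MRR@10 as before conversion.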

Train

In the following instructions, we show how to replicate our experimental results on MSMARCO Passage Retrieval task.

STAR

We use the same warmup model as ANCE, the most competitive baseline, to enable a fair comparison. Please download it and extract it to ./data/passage/warmup.

Next, we use this warmup model to extract static hard negatives, which will be utilized by STAR.

python
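Conceptually, static hard negatives are mined once before training: the warmup model retrieves a ranked list for each training query, and non-relevant documents near the top of that list become negatives that stay fixed while STAR trains (unlike ADORE's dynamic hard negatives, which are re-retrieved with the current query encoder). The sketch below illustrates only that idea; the function name, the top_n and num_neg parameters, and uniform sampling are assumptions, not the repo's script or its defaults.

```python
import random
from typing import Dict, List, Set


def static_hard_negatives(rankings: Dict[str, List[str]],
                          qrels: Dict[str, Set[str]],
                          top_n: int,
                          num_neg: int,
                          seed: int = 0) -> Dict[str, List[str]]:
    """Sample up to num_neg non-relevant docs per query from its top_n retrieved docs."""
    rng = random.Random(seed)  # fixed seed: the negatives are computed once and reused
    negatives: Dict[str, List[str]] = {}
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        # Exclude known positives so they are never used as negatives
        candidates = [pid for pid in ranked[:top_n] if pid not in relevant]
        negatives[qid] = rng.sample(candidates, min(num_neg, len(candidates)))
    return negatives


warmup_rankings = {"q1": ["d1", "d2", "d3", "d4", "d5"]}  # toy warmup-model ranking
train_qrels = {"q1": {"d2"}}
neg = static_hard_negatives(warmup_rankings, train_qrels, top_n=4, num_neg=2)
print(neg)
```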
