DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

📄 Chinese (中文) | 📖 arXiv | 🤗 HF Papers | 🚀 DIVER Models | 💬 WeChat

While retrieval-augmented generation (RAG) excels at direct knowledge retrieval, it falters on complex queries that require abstract or multi-step reasoning. To bridge this gap, we developed DIVER, a retrieval pipeline engineered for these reasoning-intensive tasks. DIVER integrates four stages: document pre-processing, iterative LLM-driven query expansion, a specialized retriever fine-tuned on complex synthetic data, and a novel reranker that merges retrieval scores with LLM-generated helpfulness ratings. On the BRIGHT benchmark, DIVER sets a new state of the art, significantly outperforming other reasoning-aware models (average NDCG of 45.8). These results underscore the effectiveness of integrating deep reasoning into retrieval for solving complex, real-world problems. See the DIVER paper for more details.

Diver-Pipeline
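The four stages can be sketched as a composable pipeline. The function names (`expand`, `retrieve`, `rerank`) below are hypothetical stand-ins, not the repository's actual API; a real deployment would back them with an LLM, the DIVER-Retriever, and the DIVER reranker respectively.

```python
# Illustrative sketch of DIVER's four-stage flow (not the real API).
from typing import Callable, List, Tuple

Scored = List[Tuple[int, float]]  # (doc_index, score) pairs


def diver_pipeline(
    query: str,
    docs: List[str],
    expand: Callable[[str], str],
    retrieve: Callable[[str, List[str]], Scored],
    rerank: Callable[[str, List[str], Scored], Scored],
    top_k: int = 3,
) -> List[str]:
    clean = [d.strip() for d in docs]        # Stage 1: document pre-processing
    q = expand(query)                        # Stage 2: LLM-driven query expansion
    candidates = retrieve(q, clean)          # Stage 3: reasoning-enhanced retrieval
    ranked = rerank(q, clean, candidates)    # Stage 4: merged reranking
    return [clean[i] for i, _ in ranked[:top_k]]


# Toy stand-ins: keyword-overlap retrieval and pass-through reranking.
docs = ["Light bends near massive objects.", "Cats sleep a lot.  "]
hits = diver_pipeline(
    "gravity and light",
    docs,
    expand=lambda q: q + " massive objects",
    retrieve=lambda q, ds: sorted(
        ((i, sum(w in d.lower() for w in q.lower().split()))
         for i, d in enumerate(ds)),
        key=lambda t: -t[1],
    ),
    rerank=lambda q, ds, cands: cands,
    top_k=1,
)
```

With the toy components above, `hits` contains the physics document, since the expanded query shares the most keywords with it.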

Key Features

1. LLM-Driven Query Expansion: uses an LLM to iteratively refine the search query.

2. Reasoning-Enhanced Retriever: a model fine-tuned on complex synthetic data to capture multi-step relationships.

3. Merged Reranker: combines traditional retrieval scores with LLM-generated "helpfulness" ratings for superior ranking.
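The third feature can be illustrated with a simple score-fusion sketch. The min-max normalization and the `alpha` weight here are illustrative assumptions, not DIVER's published fusion formula:

```python
from typing import List


def merge_scores(retrieval: List[float], helpfulness: List[float],
                 alpha: float = 0.5) -> List[float]:
    """Linearly fuse retriever scores with LLM helpfulness ratings.

    Both lists are min-max normalized to [0, 1] first so the two scales
    are comparable; `alpha` weights the retriever side. (Illustrative
    only -- not DIVER's actual fusion formula.)
    """
    def norm(xs: List[float]) -> List[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.5 for x in xs]

    r, h = norm(retrieval), norm(helpfulness)
    return [alpha * a + (1 - alpha) * b for a, b in zip(r, h)]


# A document with a middling retrieval score but a high LLM helpfulness
# rating can overtake the retriever's top pick:
fused = merge_scores([12.0, 9.5, 3.0], [1.0, 5.0, 2.0], alpha=0.4)
```

Here document 1 ends up ranked first even though the retriever preferred document 0, which is the point of blending the two signals.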

🎉 Update

TODO List

  • ⬜ Release DIVER-VL-Embedding and DIVER-VL-Reranker (source code and models)
  • ✅ Release DIVER-Reranker (source code and models)

Model Downloads

The following table lists the available models and their parameters so you can pick one for your use case. If you are located in mainland China, we also provide the models on ModelScope.cn to speed up downloads.

| Model | #Total Params | Context Length | Download | BRIGHT |
| :---: | :---: | :---: | :---: | :---: |
| Diver-GroupRank-7B | 7B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-7B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-7B) | |
| Diver-GroupRank-32B | 32B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-32B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-32B) | 46.8 |
| DIVER-Retriever-4B-1020 | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B-1020) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B-1020) | 31.9 |
| DIVER-Retriever-4B | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B) | 28.9 |
| DIVER-Retriever-1.7B | 1.7B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-1.7B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-1.7B) | 27.3 |
| DIVER-Retriever-0.6B | 0.6B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-0.6B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-0.6B) | 25.2 |

Evaluation

Overall Evaluation

Performance comparisons with competitive baselines on the BRIGHT leaderboard. The best result for each dataset is highlighted in bold.

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Rank-R1-14B | 20.5 | 31.2 | 38.5 | 21.2 | 26.4 | 22.6 | 18.9 | 27.5 | 9.2 | 20.2 | 9.7 | 11.9 | 9.2 |
| Qwen1.5-7B with InteRank-3B | 27.4 | 51.2 | 51.4 | 22.4 | 31.9 | 17.3 | 26.6 | 22.4 | 24.5 | 23.1 | 13.5 | 19.3 | 25.5 |
| GPT4 with Rank1-32B | 29.4 | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 |
| ReasonIR with QwenRerank | 36.9 | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 |
| ReasonIR with Rank-R1-32B | 38.8 | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 |
| RaDeR with QwenRerank | 39.2 | 58.0 | 59.2 | 33.0 | 49.4 | 31.8 | 39.0 | 36.4 | 33.5 | 33.3 | 10.8 | 34.2 | 51.6 |
| XRR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
| ReasonRank | 40.8 | 62.72 | 55.53 | 36.7 | 54.64 | 35.69 | 38.03 | 44.81 | 29.46 | 25.56 | 14.38 | 41.99 | 50.06 |
| DIVER | 41.6 | 62.2 | 58.7 | 34.4 | 52.9 | 35.6 | 36.5 | 42.9 | **38.9** | 25.4 | 18.3 | 40.0 | 53.1 |
| BGE Reasoner | 45.2 | 66.5 | **63.7** | 39.4 | 50.3 | 37 | 42.9 | 43.7 | 35.1 | **44.3** | 17.2 | 44.2 | **58.5** |
| DIVER V2 | **45.8** | **68** | 62.5 | **42.0** | **58.2** | **41.5** | **44.3** | **49.2** | 34.8 | 32.9 | **19.1** | **44.3** | 52.6 |
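The scores above are NDCG values on BRIGHT (typically reported at cutoff 10). For reference, a minimal implementation of the metric:

```python
import math
from typing import List


def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """NDCG@k for a ranked list of graded relevance labels,
    using the DCG form sum(rel_i / log2(i + 2))."""
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; placing a relevant document lower in the list discounts its contribution logarithmically by position.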

Diver Retriever Evaluation

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th style="text-align:right">Avg.</th>
      <th style="text-align:right">Bio.</th>
      <th style="text-align:right">Earth.</th>
      <th style="text-align:right">Econ.</th>
      <th style="text-align:right">Psy.</th>
      <th style="text-align:right">Rob.</th>
      <th style="text-align:right">Stack.</th>
      <th style="text-align:right">Sus.</th>
      <th style="text-align:right">Leet.</th>
      <th style="text-align:right">Pony</th>
      <th style="text-align:right">AoPS</th>
      <th style="text-align:right">TheoQ.</th>
      <th style="text-align:right">TheoT.</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="14" style="text-align:center"><strong>Evaluate Retriever with Original Query</strong></td>
    </tr>
    <tr>
      <td>BM25</td>
      <td style="text-align:right">14.5</td>
      <td style="text-align:right">18.9</td>
      <td style="text-align:right">27.2</td>
      <td style="text-align:right">14.9</td>
      <td style="text-align:right">12.5</td>
      <td style="text-align:right">13.6</td>
      <td style="text-align:right">18.4</td>
      <td style="text-align:right">15.0</td>
      <td style="text-align:right">24.4</td>
      <td style="text-align:right">7.9</td>
      <td style="text-align:right">6.2</td>
      <td style="text-align:right">10.4</td>
      <td style="text-align:right">4.9</td>
    </tr>
    <tr>
      <td>SBERT</td>
      <td style="text-align:right">14.9</td>
      <td style="text-align:right">15.1</td>
      <td style="text-align:right">20.4</td>
      <td style="text-align:right">16.6</td>
      <td style="text-align:right">22.7</td>
      <td style="text-align:right">8.2</td>
      <td style="text-align:right">11.0</td>
      <td style="text-align:right">15.3</td>
      <td style="text-align:right">26.4</td>
      <td style="text-align:right">7.0</td>
      <td style="text-align:right">5.3</td>
      <td style="text-align:right"></td>
      <td style="text-align:right"></td>
    </tr>
  </tbody>
</table>