DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

📄 Chinese (中文) | 📖 arXiv | 🤗 HF Papers | 🚀 DIVER Models | 💬 WeChat

While retrieval-augmented generation (RAG) excels at direct knowledge retrieval, it falters on complex queries that require abstract or multi-step reasoning. To bridge this gap, we developed DIVER, a retrieval pipeline engineered for these reasoning-intensive tasks. DIVER integrates four stages: document pre-processing, iterative LLM-driven query expansion, a specialized retriever fine-tuned on complex synthetic data, and a novel reranker that merges retrieval scores with LLM-generated helpfulness ratings. On the BRIGHT benchmark, DIVER sets a new state of the art, significantly outperforming other reasoning-aware models (average NDCG of 45.8). These results underscore the effectiveness of integrating deep reasoning into retrieval for solving complex, real-world problems. See the DIVER paper for more details.

Diver-Pipeline
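The four stages can be sketched as a composable pipeline. The function names (`expand`, `retrieve`, `rerank`) below are hypothetical stand-ins, not the repository's actual API; a real deployment would back them with an LLM, the DIVER-Retriever, and the DIVER reranker respectively.

```python
# Illustrative sketch of DIVER's four-stage flow (not the real API).
from typing import Callable, List, Tuple

Scored = List[Tuple[int, float]]  # (doc_index, score) pairs


def diver_pipeline(
    query: str,
    docs: List[str],
    expand: Callable[[str], str],
    retrieve: Callable[[str, List[str]], Scored],
    rerank: Callable[[str, List[str], Scored], Scored],
    top_k: int = 3,
) -> List[str]:
    clean = [d.strip() for d in docs]        # Stage 1: document pre-processing
    q = expand(query)                        # Stage 2: LLM-driven query expansion
    candidates = retrieve(q, clean)          # Stage 3: reasoning-enhanced retrieval
    ranked = rerank(q, clean, candidates)    # Stage 4: merged reranking
    return [clean[i] for i, _ in ranked[:top_k]]


# Toy stand-ins: keyword-overlap retrieval and pass-through reranking.
docs = ["Light bends near massive objects.", "Cats sleep a lot.  "]
hits = diver_pipeline(
    "gravity and light",
    docs,
    expand=lambda q: q + " massive objects",
    retrieve=lambda q, ds: sorted(
        ((i, sum(w in d.lower() for w in q.lower().split()))
         for i, d in enumerate(ds)),
        key=lambda t: -t[1],
    ),
    rerank=lambda q, ds, cands: cands,
    top_k=1,
)
```

With the toy components above, `hits` contains the physics document, since the expanded query shares the most keywords with it.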

Key Features

1. LLM-Driven Query Expansion: uses an LLM to iteratively refine the search query.

2. Reasoning-Enhanced Retriever: a model fine-tuned on complex synthetic data to capture multi-step relationships.

3. Merged Reranker: combines traditional retrieval scores with LLM-generated "helpfulness" ratings for superior ranking.
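The third feature can be illustrated with a simple score-fusion sketch. The min-max normalization and the `alpha` weight here are illustrative assumptions, not DIVER's published fusion formula:

```python
from typing import List


def merge_scores(retrieval: List[float], helpfulness: List[float],
                 alpha: float = 0.5) -> List[float]:
    """Linearly fuse retriever scores with LLM helpfulness ratings.

    Both lists are min-max normalized to [0, 1] first so the two scales
    are comparable; `alpha` weights the retriever side. (Illustrative
    only -- not DIVER's actual fusion formula.)
    """
    def norm(xs: List[float]) -> List[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.5 for x in xs]

    r, h = norm(retrieval), norm(helpfulness)
    return [alpha * a + (1 - alpha) * b for a, b in zip(r, h)]


# A document with a middling retrieval score but a high LLM helpfulness
# rating can overtake the retriever's top pick:
fused = merge_scores([12.0, 9.5, 3.0], [1.0, 5.0, 2.0], alpha=0.4)
```

Here document 1 ends up ranked first even though the retriever preferred document 0, which is the point of blending the two signals.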

🎉 Update

TODO List

  • ⬜ Release DIVER-VL-Embedding and DIVER-VL-Reranker (source code and models)
  • ✅ Release DIVER-Reranker (source code and models)

Model Downloads

The following table lists the available models and their parameters so you can pick one for your use case. If you are located in mainland China, we also provide the models on ModelScope.cn to speed up downloads.

| Model | #Total Params | Context Length | Download | BRIGHT |
| :---: | :---: | :---: | :---: | :---: |
| Diver-GroupRank-7B | 7B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-7B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-7B) | |
| Diver-GroupRank-32B | 32B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-32B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-32B) | 46.8 |
| DIVER-Retriever-4B-1020 | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B-1020) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B-1020) | 31.9 |
| DIVER-Retriever-4B | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B) | 28.9 |
| DIVER-Retriever-1.7B | 1.7B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-1.7B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-1.7B) | 27.3 |
| DIVER-Retriever-0.6B | 0.6B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-0.6B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-0.6B) | 25.2 |

Evaluation

Overall Evaluation

Performance comparisons with competitive baselines on the BRIGHT leaderboard. The best result for each dataset is highlighted in bold.

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Rank-R1-14B | 20.5 | 31.2 | 38.5 | 21.2 | 26.4 | 22.6 | 18.9 | 27.5 | 9.2 | 20.2 | 9.7 | 11.9 | 9.2 |
| Qwen1.5-7B with InteRank-3B | 27.4 | 51.2 | 51.4 | 22.4 | 31.9 | 17.3 | 26.6 | 22.4 | 24.5 | 23.1 | 13.5 | 19.3 | 25.5 |
| GPT4 with Rank1-32B | 29.4 | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 |
| ReasonIR with QwenRerank | 36.9 | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 |
| ReasonIR with Rank-R1-32B | 38.8 | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 |
| RaDeR with QwenRerank | 39.2 | 58.0 | 59.2 | 33.0 | 49.4 | 31.8 | 39.0 | 36.4 | 33.5 | 33.3 | 10.8 | 34.2 | 51.6 |
| XRR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
| ReasonRank | 40.8 | 62.72 | 55.53 | 36.7 | 54.64 | 35.69 | 38.03 | 44.81 | 29.46 | 25.56 | 14.38 | 41.99 | 50.06 |
| DIVER | 41.6 | 62.2 | 58.7 | 34.4 | 52.9 | 35.6 | 36.5 | 42.9 | **38.9** | 25.4 | 18.3 | 40.0 | 53.1 |
| BGE Reasoner | 45.2 | 66.5 | **63.7** | 39.4 | 50.3 | 37 | 42.9 | 43.7 | 35.1 | **44.3** | 17.2 | 44.2 | **58.5** |
| DIVER V2 | **45.8** | **68** | 62.5 | **42.0** | **58.2** | **41.5** | **44.3** | **49.2** | 34.8 | 32.9 | **19.1** | **44.3** | 52.6 |
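The scores above are NDCG values on BRIGHT (typically reported at cutoff 10). For reference, a minimal implementation of the metric:

```python
import math
from typing import List


def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """NDCG@k for a ranked list of graded relevance labels,
    using the DCG form sum(rel_i / log2(i + 2))."""
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; placing a relevant document lower in the list discounts its contribution logarithmically by position.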

Diver Retriever Evaluation

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th style="text-align:right">Avg.</th>
      <th style="text-align:right">Bio.</th>
      <th style="text-align:right">Earth.</th>
      <th style="text-align:right">Econ.</th>
      <th style="text-align:right">Psy.</th>
      <th style="text-align:right">Rob.</th>
      <th style="text-align:right">Stack.</th>
      <th style="text-align:right">Sus.</th>
      <th style="text-align:right">Leet.</th>
      <th style="text-align:right">Pony</th>
      <th style="text-align:right">AoPS</th>
      <th style="text-align:right">TheoQ.</th>
      <th style="text-align:right">TheoT.</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="14" style="text-align:center"><strong>Evaluate Retriever with Original Query</strong></td>
    </tr>
    <tr>
      <td>BM25</td>
      <td style="text-align:right">14.5</td>
      <td style="text-align:right">18.9</td>
      <td style="text-align:right">27.2</td>
      <td style="text-align:right">14.9</td>
      <td style="text-align:right">12.5</td>
      <td style="text-align:right">13.6</td>
      <td style="text-align:right">18.4</td>
      <td style="text-align:right">15.0</td>
      <td style="text-align:right">24.4</td>
      <td style="text-align:right">7.9</td>
      <td style="text-align:right">6.2</td>
      <td style="text-align:right">10.4</td>
      <td style="text-align:right">4.9</td>
    </tr>
    <tr>
      <td>SBERT</td>
      <td style="text-align:right">14.9</td>
      <td style="text-align:right">15.1</td>
      <td style="text-align:right">20.4</td>
      <td style="text-align:right">16.6</td>
      <td style="text-align:right">22.7</td>
      <td style="text-align:right">8.2</td>
      <td style="text-align:right">11.0</td>
      <td style="text-align:right">15.3</td>
      <td style="text-align:right">26.4</td>
      <td style="text-align:right">7.0</td>
      <td style="text-align:right">5.3</td>
      <td style="text-align:right"></td>
      <td style="text-align:right"></td>
    </tr>
  </tbody>
</table>