HEAVEN

Official Repository of "Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy"


HEAVEN: Hybrid-Vector Retrieval for Visually Rich Documents

🔥News

[2025/11] ViMDoc is now available on Hugging Face🤗!

ViMDoc Benchmark

ViMDoc (Visually-rich Long Multi-Document Retrieval Benchmark) is a benchmark for evaluating visual document retrieval in both multi-document and long-document settings. It can be loaded from the Hugging Face Hub:

from datasets import load_dataset
dataset = load_dataset("kaistdata/ViMDoc", split="ViMDoc")

Format

Sample datasets are available in benchmark/{ViMDoc,OpenDocVQA,ViDoSeek,M3DocVQA}. Each folder contains sample_query.json, which lists queries with their ground-truth document IDs:

{
    "id": "<query_id>",
    "query": "<query_text>",
    "doc_ids": ["<document_id>"]
}

Sample document pages are stored in sample_pages/.

Note: Full datasets for other benchmarks are available from their original sources: OpenDocVQA | ViDoSeek | M3DocVQA
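A minimal sketch of reading sample_query.json and checking a ranked list against its doc_ids, assuming the file holds a JSON list of records in the format above. The helper names (index_queries, load_queries, recall_at_k) are hypothetical, and Recall@k is used only as an illustration; it is not necessarily the metric the repository's scripts report.

```python
import json
from typing import Dict, List


def index_queries(records: List[dict]) -> Dict[str, dict]:
    """Key query records ({"id", "query", "doc_ids"}) by their query id."""
    return {r["id"]: r for r in records}


def load_queries(path: str) -> Dict[str, dict]:
    """Read sample_query.json, assumed to be a JSON list of query records."""
    with open(path, encoding="utf-8") as f:
        return index_queries(json.load(f))


def recall_at_k(ranked_doc_ids: List[str], gold_doc_ids: List[str], k: int) -> float:
    """Fraction of ground-truth doc_ids found among the top-k ranked IDs."""
    if not gold_doc_ids:
        return 0.0
    top_k = set(ranked_doc_ids[:k])
    return sum(d in top_k for d in gold_doc_ids) / len(gold_doc_ids)
```

For example, `load_queries("benchmark/ViMDoc/sample_query.json")` would return a dict mapping each query id to its record, whose `doc_ids` can be passed to `recall_at_k` together with a retriever's ranking.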

Indexing

(1) Encoding (Query/Document)

cd indexing/encode

# Visual encoders
python encoder.py --encoder_type dse --folder ViMDoc
python encoder.py --encoder_type colqwen25 --folder ViMDoc

# Textual encoders (OCR text is extracted first with ocr.py)
python ocr.py --device 0 --folder ViMDoc
python encoder.py --encoder_type nvembedv2 --folder ViMDoc
python encoder.py --encoder_type bge_m3_multi --folder ViMDoc

Available Encoders

| Encoder | Modality | Type | HF Checkpoint |
|---------|----------|------|---------------|
| colpali | Visual | Multi-Vector | vidore/colpali-v1.3 |
| colqwen2 | Visual | Multi-Vector | vidore/colqwen2-v1.0 |
| colqwen25 | Visual | Multi-Vector | vidore/colqwen2.5-v0.2 |
| gme | Visual | Single-Vector | Alibaba-NLP/gme-Qwen2-VL-2B-Instruct |
| dse | Visual | Single-Vector | MrLight/dse-qwen2-2b-mrl-v1 |
| visret | Visual | Single-Vector | openbmb/VisRAG-Ret |
| bge_m3_multi | Textual (OCR) | Multi-Vector | BAAI/bge-m3 |
| bge_m3 | Textual (OCR) | Single-Vector | BAAI/bge-m3 |
| nvembedv2 | Textual (OCR) | Single-Vector | nvidia/NV-Embed-v2 |
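The Type column determines how query and page embeddings are compared. As a generic illustration (not this repository's code): single-vector encoders produce one embedding per query and per page, commonly compared with cosine similarity, while multi-vector encoders such as ColPali and ColQwen produce one embedding per token or image patch and are usually scored with late-interaction MaxSim. The sketch assumes plain Python lists as embeddings; real pipelines would use normalized tensors and batched dot products.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Single-vector score: cosine similarity of one query and one page embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def maxsim(query_tokens: List[Vector], page_tokens: List[Vector]) -> float:
    """Multi-vector (late-interaction) score: for each query token take the
    best dot product over all page tokens, then sum over query tokens."""
    return sum(
        max(sum(q * p for q, p in zip(qt, pt)) for pt in page_tokens)
        for qt in query_tokens
    )
```

Single-vector scoring costs one dot product per page, whereas MaxSim costs (query tokens × page tokens) dot products per page, which is the efficiency/accuracy trade-off the hybrid design targets.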

(2) VS-Page Construction


cd indexing/vs-page

# Step 1: Document Layout Analysis
python DLA.py --dataset ViMDoc --device 0

# Step 2: Assemble & VS-page Encoding
python assemble.py \
    --dataset ViMDoc \
    --encoder_type dse \
    --reduction_factor 15 \
    --device 0
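The role of --reduction_factor is not spelled out here. One plausible reading, assumed purely for illustration, is that it sets how many original pages are merged into one VS-page, so a 32-page document with --reduction_factor 15 would yield ceil(32 / 15) = 3 VS-pages. The hypothetical helper group_pages below shows only that grouping arithmetic; the actual layout-based assembly is handled by assemble.py.

```python
from typing import List


def group_pages(page_ids: List[str], reduction_factor: int) -> List[List[str]]:
    """Hypothetical: split a document's pages into consecutive groups of at most
    `reduction_factor` pages; each group would become one VS-page."""
    if reduction_factor < 1:
        raise ValueError("reduction_factor must be >= 1")
    return [
        page_ids[i:i + reduction_factor]
        for i in range(0, len(page_ids), reduction_factor)
    ]
```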

Retrieval - HEAVEN

Run the complete HEAVEN pipeline (Stage 1 + Stage 2):

cd retrieval/heaven

python heaven.py \
    --folder ViMDoc \
    --stage1_model dse \
    --stage2_model colqwen25 \
    --device 0 \
    --preprocess

Stage 1 only:

python stage1.py --folder ViMDoc --model dse --alpha 0.1 --filter_ratio 0.5
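The exact meaning of --alpha and --filter_ratio in stage1.py is not documented here. As a hypothetical sketch only: Stage 1 keeps the top filter_ratio fraction of VS-pages by single-vector score, then ranks the pages inside them by alpha × (VS-page score) + (1 − alpha) × (page score). Both the blending formula and the function stage1_rank are assumptions, not the authors' implementation.

```python
from typing import Dict, List


def stage1_rank(page_scores: Dict[str, float],
                vs_scores: Dict[str, float],
                page_to_vs: Dict[str, str],
                alpha: float = 0.1,
                filter_ratio: float = 0.5) -> List[str]:
    """Hypothetical Stage 1: filter VS-pages, then rank their member pages."""
    # Keep the highest-scoring fraction of VS-pages (at least one).
    n_keep = max(1, int(len(vs_scores) * filter_ratio))
    kept = set(sorted(vs_scores, key=vs_scores.get, reverse=True)[:n_keep])
    # Blend each surviving page's own score with its VS-page's score.
    blended = {
        p: alpha * vs_scores[page_to_vs[p]] + (1 - alpha) * s
        for p, s in page_scores.items()
        if page_to_vs[p] in kept
    }
    return sorted(blended, key=blended.get, reverse=True)
```

Pages belonging to filtered-out VS-pages are never scored, which is where this kind of coarse-to-fine filtering saves work.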

Stage 2 only:

# Preprocess queries first
python preprocess.py --folder ViMDoc --model colqwen25

# Run Stage 2
python stage2.py --folder ViMDoc --model colqwen25 --stage1_model dse --k 200 --filter_ratio 0.25
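A hypothetical sketch of Stage 2 under stated assumptions: --k is taken to be the number of top Stage 1 candidates that are reranked, and --filter_ratio the share of query tokens kept for multi-vector scoring, chosen by per-token importance weights that the caller supplies (standing in for whatever preprocess.py computes). Reranking uses late-interaction MaxSim. stage2_rerank and its parameters are illustrative names, not the repository's API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def maxsim(query_tokens: List[Vector], page_tokens: List[Vector]) -> float:
    """Late-interaction score: best-matching page token per query token, summed."""
    return sum(max(sum(a * b for a, b in zip(q, p)) for p in page_tokens)
               for q in query_tokens)


def stage2_rerank(stage1_ranking: List[str],
                  page_tokens: Dict[str, List[Vector]],
                  query_tokens: List[Vector],
                  token_importance: List[float],
                  k: int = 200,
                  filter_ratio: float = 0.25) -> List[str]:
    """Hypothetical Stage 2: rerank the top-k Stage 1 candidates with MaxSim,
    using only the most important `filter_ratio` share of query tokens."""
    n_keep = max(1, int(len(query_tokens) * filter_ratio))
    # Indices of the highest-importance query tokens.
    keep = sorted(range(len(query_tokens)),
                  key=lambda i: token_importance[i], reverse=True)[:n_keep]
    kept_tokens = [query_tokens[i] for i in keep]
    candidates = stage1_ranking[:k]
    scores = {p: maxsim(kept_tokens, page_tokens[p]) for p in candidates}
    return sorted(candidates, key=scores.get, reverse=True)
```

Restricting MaxSim to fewer query tokens and fewer candidates is what keeps the multi-vector stage affordable: its cost scales with (query tokens kept) × (candidates) × (page tokens).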

Structure

HEAVEN/
│
├── benchmark/                    
│   ├── ViMDoc/                  
│   ├── OpenDocVQA/            
│   ├── ViDoSeek/                
│   └── M3DocVQA/
│       
├── indexing/                      
│   ├── encode/                  
│   └── vs-page/
│               
├── retrieval/                    
│   ├── baseline/
│   └── heaven/
│                
└── run.sh

Citation

@article{kim2025hybrid,
  title={Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy},
  author={Kim, Juyeon and Lee, Geon and Choi, Dongwon and Kim, Taeuk and Shin, Kijung},
  journal={arXiv preprint arXiv:2510.22215},
  year={2025}
}
