LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
This repository is the official implementation of LamRA.
🏡 Project Page | 📄 Paper | 🤗 LamRA-Ret-Pretrained | 🤗 LamRA-Ret | 🤗 LamRA-Rank | 🤗 Dataset
Installation
conda create -n lamra python=3.10 -y
conda activate lamra
pip install --upgrade pip # enable PEP 660 support
pip install -r requirements.txt
pip install ninja
pip install flash-attn --no-build-isolation
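After the install steps above, a quick sanity check can confirm that the key packages are visible to the new environment. The following is a minimal sketch, not part of the repository: `check_modules` is a hypothetical helper, and the module list assumes that `torch` and `transformers` are pulled in by `requirements.txt` (`flash_attn` is the import name of the `flash-attn` package).

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module_name: bool}, True when the top-level module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumption: torch and transformers are listed in requirements.txt.
    # flash_attn is the import name of the flash-attn package installed above.
    status = check_modules(["torch", "transformers", "flash_attn"])
    for name, ok in status.items():
        print(f"{name:15s} {'found' if ok else 'MISSING'}")
    print("python", sys.version.split()[0])
```

The check only locates the modules and does not import them, so it will not catch a flash-attn build that installs but fails to load on the current GPU or CUDA version.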
New Version
A version built on Qwen2.5-VL is available in the qwen2.5vl branch.
Quickstart
Please refer to demo.py.
Data Preparation
Download Qwen2-VL-7B and place it in ./checkpoints/hf_models/Qwen2-VL-7B-Instruct
For the pre-training dataset, please refer to link
For the multimodal instruction tuning dataset, please refer to M-BEIR
For the LamRA evaluation data, please refer to LamRA_Eval
After downloading everything, organize the data under ./data as follows:
├── M-BEIR
├── nli_for_simcse.csv
├── rerank_data_for_training
├── flickr
├── coco
├── sharegpt4v
├── Urban1K
├── circo
├── genecis
├── vist
├── visdial
├── ccneg
├── sugar-crepe
├── MSVD
└── msrvtt
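Before launching training or evaluation, it can help to verify that the layout above is in place. The sketch below is illustrative and not part of the repository: `missing_entries` is a hypothetical helper, and it assumes every name in the listing sits directly under `./data` (if some entries are nested in your copy, adjust the list accordingly).

```python
from pathlib import Path

# Entry names copied from the listing above.
# Assumption: each one is a direct child of the data root.
EXPECTED_ENTRIES = [
    "M-BEIR", "nli_for_simcse.csv", "rerank_data_for_training", "flickr", "coco",
    "sharegpt4v", "Urban1K", "circo", "genecis", "vist", "visdial", "ccneg",
    "sugar-crepe", "MSVD", "msrvtt",
]


def missing_entries(data_root, expected=EXPECTED_ENTRIES):
    """Return the expected names that do not exist directly under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("./data")
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected entries are present under ./data.")
```

The check covers only the presence of each entry, not its contents, so a partially downloaded dataset will still pass.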
Training & Evaluation for LamRA-Ret
Pre-training
sh scripts/lamra_ret/pretrain.sh
# Evaluation
sh scripts/eval/eval_pretrained.sh
# Merge LoRA for multimodal instruction tuning stage
sh scripts/merge_lora.sh
Multimodal instruction tuning
sh scripts/lamra_ret/finetune.sh
# Evaluation
sh scripts/eval/eval_mbeir.sh # eval under local pool setting
sh scripts/eval/eval_mbeir_global.sh # eval under global pool setting
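The LamRA-Ret steps above can also be chained from a small driver so that a failure in one stage stops the rest. This is a minimal sketch under stated assumptions: `run_steps` is a hypothetical helper, the script paths are copied from the commands above, and the order simply follows the listing (the evaluation steps may be optional in your workflow).

```python
import subprocess

# Script paths copied from the commands above, in the order they are listed.
LAMRA_RET_STEPS = [
    "scripts/lamra_ret/pretrain.sh",
    "scripts/eval/eval_pretrained.sh",
    "scripts/merge_lora.sh",
    "scripts/lamra_ret/finetune.sh",
    "scripts/eval/eval_mbeir.sh",         # local pool setting
    "scripts/eval/eval_mbeir_global.sh",  # global pool setting
]


def run_steps(steps, dry_run=False):
    """Run each script with `sh`, stopping at the first failure.

    Returns the commands that were run (or would be run when dry_run=True).
    """
    commands = []
    for script in steps:
        cmd = ["sh", script]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the plan only; call run_steps(LAMRA_RET_STEPS) to actually execute it.
    for cmd in run_steps(LAMRA_RET_STEPS, dry_run=True):
        print(" ".join(cmd))
```

Run it from the repository root so the relative script paths resolve.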
Training & Evaluation for LamRA-Rank
You can either use the data we provide or run the following commands to generate the data for reranking training.
# Collecting data for reranking training
sh scripts/lamra_rank/get_train_data.sh
sh scripts/lamra_rank/merge_train_data.sh
# training for reranking
sh scripts/lamra_rank/train_rerank.sh
# pointwise reranking
sh scripts/eval/eval_rerank_mbeir_pointwise.sh
# listwise reranking
sh scripts/eval/eval_rerank_mbeir_listwise.sh
# Get the reranking results on M-BEIR
sh scripts/eval/get_rerank_results_mbeir.sh
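For readers unfamiliar with the two reranking modes named above, the toy example below contrasts them in general terms. It is a generic sketch and not LamRA-Rank's implementation: the scoring functions are hypothetical stand-ins (word overlap instead of a multimodal model), and the listwise loop that repeatedly picks the best remaining candidate is one common way to turn a "pick the best" answer into a full ranking.

```python
def pointwise_rerank(query, candidates, score_fn):
    """Score every (query, candidate) pair independently, then sort by score (descending)."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)


def listwise_rerank(query, candidates, pick_best_fn):
    """Show all remaining candidates at once and repeatedly take the one picked as best."""
    remaining = list(candidates)
    ranked = []
    while remaining:
        best_index = pick_best_fn(query, remaining)
        ranked.append(remaining.pop(best_index))
    return ranked


# Hypothetical stand-ins for a model: relevance = number of words shared with the query.
def toy_score(query, candidate):
    return len(set(query.split()) & set(candidate.split()))


def toy_pick_best(query, candidates):
    scores = [toy_score(query, c) for c in candidates]
    return scores.index(max(scores))


if __name__ == "__main__":
    query = "a dog on the beach"
    top_k = ["a cat on a sofa", "a dog running on the beach", "the beach at sunset"]
    print(pointwise_rerank(query, top_k, toy_score))
    print(listwise_rerank(query, top_k, toy_pick_best))
```

Pointwise scoring needs one model call per candidate but never compares candidates directly. Listwise selection sees the candidates together, at the cost of longer inputs.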
Evaluation on other benchmarks
# evaluation results on zeroshot datasets
sh scripts/eval/eval_zeroshot.sh
# reranking the results on zeroshot datasets
sh scripts/eval/eval_rerank_zeroshot.sh
# get the final results
sh scripts/eval/get_rerank_results_zeroshot.sh
🫡 Acknowledgements
Many thanks to the code bases from lmms-finetune and E5-V.
Citation
If you use this code for your research or project, please cite:
@inproceedings{liu2025lamra,
title={Lamra: Large multimodal model as your advanced retrieval assistant},
author={Liu, Yikun and Zhang, Yajie and Cai, Jiayin and Jiang, Xiaolong and Hu, Yao and Yao, Jiangchao and Wang, Yanfeng and Xie, Weidi},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={4015--4025},
year={2025}
}
