[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

This repository is the official implementation of LamRA.

Installation

conda create -n lamra python=3.10 -y
conda activate lamra 

pip install --upgrade pip  # enable PEP 660 support 
pip install -r requirements.txt

pip install ninja
pip install flash-attn --no-build-isolation
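
After the install commands above finish, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository: the module list is an assumption (flash-attn installs the `flash_attn` module, and `torch` is assumed to come from requirements.txt), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed module names: `flash_attn` is the import name of the flash-attn
# package; `torch` is assumed to be pulled in by requirements.txt.
REQUIRED_MODULES = ("torch", "flash_attn")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it inside the activated lamra environment; any module reported as MISSING points to an install step that did not complete.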

New Version

A version based on Qwen2.5-VL is available in the qwen2.5vl branch.

Quickstart

Please refer to demo.py.

Data Preparation

Download Qwen2-VL-7B and place it in ./checkpoints/hf_models/Qwen2-VL-7B-Instruct

For the pre-training dataset, please refer to link

For the multimodal instruction tuning dataset, please refer to M-BEIR

For the evaluation data related to LamRA, please refer to LamRA_Eval
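
One possible way to fetch the base model into the checkpoint path given above is the Hugging Face Hub client. This is a sketch under assumptions: the repository id `Qwen/Qwen2-VL-7B-Instruct` is inferred from the target directory name rather than stated in this README, the `huggingface_hub` package must be installed separately, and `download_base_model` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: Hub repository id inferred from the target directory name.
REPO_ID = "Qwen/Qwen2-VL-7B-Instruct"
# Target path taken from the README (relative to the repository root).
LOCAL_DIR = Path("checkpoints/hf_models/Qwen2-VL-7B-Instruct")


def download_base_model(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the model snapshot into local_dir and return its path."""
    # Imported lazily so this file loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_base_model()` from the repository root starts the download; the weights are large, so check free disk space first.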

After downloading everything, organize the data under ./data as follows:

├── M-BEIR
├── nli_for_simcse.csv
├── rerank_data_for_training
├── flickr
├── coco
├── sharegpt4v
├── Urban1K
├── circo
├── genecis
├── vist
├── visdial
├── ccneg
├── sugar-crepe
├── MSVD
└── msrvtt
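
Before launching training, it can help to confirm that the layout above is complete. The sketch below only checks that each top-level entry from the tree exists; it does not validate the contents. `missing_entries` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path

# Top-level entries copied from the directory tree in this README.
EXPECTED_ENTRIES = [
    "M-BEIR", "nli_for_simcse.csv", "rerank_data_for_training", "flickr",
    "coco", "sharegpt4v", "Urban1K", "circo", "genecis", "vist", "visdial",
    "ccneg", "sugar-crepe", "MSVD", "msrvtt",
]


def missing_entries(data_root="./data", expected=EXPECTED_ENTRIES):
    """Return the expected names that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```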

Training & Evaluation for LamRA-Ret

Pre-training

sh scripts/lamra_ret/pretrain.sh

# Evaluation 
sh scripts/eval/eval_pretrained.sh

# Merge LoRA for multimodal instruction tuning stage
sh scripts/merge_lora.sh

Multimodal instruction tuning

sh scripts/lamra_ret/finetune.sh

# Evaluation 
sh scripts/eval/eval_mbeir.sh   # eval under local pool setting

sh scripts/eval/eval_mbeir_global.sh   # eval under global pool setting
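
The LamRA-Ret commands above run in a fixed order: pre-train, evaluate, merge LoRA, fine-tune, then evaluate under both pool settings. A small driver can chain them and stop at the first failure. This is a sketch, not a script from the repository; `run_stages` is a hypothetical helper, and it defaults to a dry run so nothing is launched by accident.

```python
import subprocess

# Script paths in the order they appear in this README.
RET_STAGES = [
    "scripts/lamra_ret/pretrain.sh",
    "scripts/eval/eval_pretrained.sh",
    "scripts/merge_lora.sh",
    "scripts/lamra_ret/finetune.sh",
    "scripts/eval/eval_mbeir.sh",
    "scripts/eval/eval_mbeir_global.sh",
]


def run_stages(stages=RET_STAGES, dry_run=True):
    """Run each script with `sh` in order; return the list of commands."""
    commands = [["sh", script] for script in stages]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never start after a failed one.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_stages(dry_run=True)
```

Pass `dry_run=False` from the repository root to execute the stages for real.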

Training & Evaluation for LamRA-Rank

You can use the data we provide, or run the following commands to generate the data for reranking training.

# Collecting data for reranking training
sh scripts/lamra_rank/get_train_data.sh

sh scripts/lamra_rank/merge_train_data.sh

# training for reranking
sh scripts/lamra_rank/train_rerank.sh

# pointwise reranking
sh scripts/eval/eval_rerank_mbeir_pointwise.sh

# listwise reranking
sh scripts/eval/eval_rerank_mbeir_listwise.sh

# Get the reranking results on M-BEIR
sh scripts/eval/get_rerank_results_mbeir.sh
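
For readers unfamiliar with the two reranking modes named above, the difference can be illustrated generically. The sketch below is not the repository's implementation: it assumes the common meaning of the terms (pointwise scores each candidate on its own, listwise sees all candidates at once), and the word-overlap functions are toy stand-ins for a model.

```python
from typing import Callable, List, Sequence


def pointwise_rerank(query: str, candidates: Sequence[str],
                     score: Callable[[str, str], float]) -> List[str]:
    """Score every (query, candidate) pair independently, then sort."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)


def listwise_rerank(query: str, candidates: Sequence[str],
                    pick_best: Callable[[str, Sequence[str]], int]) -> List[str]:
    """Show all candidates at once and move the chosen one to the front."""
    best = pick_best(query, candidates)
    return [candidates[best]] + [c for i, c in enumerate(candidates) if i != best]


def overlap_score(query: str, candidate: str) -> float:
    """Toy stand-in for a model: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(candidate.lower().split())))


def overlap_pick(query: str, candidates: Sequence[str]) -> int:
    """Toy listwise chooser: index of the candidate with the highest overlap."""
    scores = [overlap_score(query, c) for c in candidates]
    return scores.index(max(scores))


if __name__ == "__main__":
    pool = ["a red car", "a dog on grass", "a brown dog running on grass"]
    print(pointwise_rerank("dog running on grass", pool, overlap_score))
    print(listwise_rerank("dog running on grass", pool, overlap_pick))
```

Pointwise needs one scoring call per candidate and yields a full ordering; the listwise variant here makes a single call over the whole list and only promotes the top choice.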

Evaluation on other benchmarks

# evaluation results on zeroshot datasets
sh scripts/eval/eval_zeroshot.sh

# reranking the results on zeroshot datasets
sh scripts/eval/eval_rerank_zeroshot.sh

# get the final results
sh scripts/eval/get_rerank_results_zeroshot.sh

🫡 Acknowledgements

Many thanks to the code bases from lmms-finetune and E5-V.

Citation

If you use this code for your research or project, please cite:

@inproceedings{liu2025lamra,
  title={Lamra: Large multimodal model as your advanced retrieval assistant},
  author={Liu, Yikun and Zhang, Yajie and Cai, Jiayin and Jiang, Xiaolong and Hu, Yao and Yao, Jiangchao and Wang, Yanfeng and Xie, Weidi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={4015--4025},
  year={2025}
}
