MQR
McQueen: A Transformer-based multimodal query rewrite benchmark
Our code is based on the original VL-T5/VL-BART code.
Setup
# Create python environment (optional)
conda create -n MQR python=3.7
source activate MQR
# Install python dependencies
pip install -r requirements.txt
# Download language evaluation tools
https://github.com/bckim92/language-evaluation
# Download T5/BART backbone checkpoint
python download_backbones.py
# Train VL-T5
./VL-T5/
    src/
        modeling_t5.py, modeling_bart.py <= VL-T5/VL-BART model classes
        pretrain.py, pretrain_data.py, pretrain_model.py <= pretraining
        vqa.py, vqa_data.py, vqa_model.py ... <= fine-tuning on downstream tasks (e.g. VQA, GQA, NLVR2)
        multitask.py, multitask_data.py, multitask_model.py <= multitask learning on 7 downstream tasks
        param.py <= (argparse) configuration
        tokenization.py <= custom tokenizer
        utils.py, dist_utils.py <= utility functions
    snap/ <= store weight checkpoints
    scripts/ <= bash scripts for pretraining and fine-tuning
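Before training, it can help to confirm that the checkout matches the layout listed above. The following is a minimal sketch, not part of the repository: the `EXPECTED` list and the `missing_entries` helper are hypothetical names, and the list only covers a subset of the files named in the tree.

```python
from pathlib import Path
from typing import List

# Paths relative to ./VL-T5/, taken from the directory listing in this README.
# This is an illustrative subset, not an exhaustive manifest.
EXPECTED = [
    "src/modeling_t5.py",
    "src/modeling_bart.py",
    "src/param.py",
    "src/tokenization.py",
    "src/utils.py",
    "src/dist_utils.py",
    "snap",
    "scripts",
]


def missing_entries(root: Path) -> List[str]:
    """Return the expected files/directories that do not exist under root."""
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries(Path("VL-T5"))` from the repository root returns an empty list when every listed entry is present. Note that `snap/` only exists once the checkpoints have been downloaded.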
Dataset
The image files (anno_images) can be found in link.
The textual files (McQR_data) can be found in link.
Image feature extraction code can be found in ./feature_extraction. All the extracted image features can also be downloaded via link.
The original dataset file with image annotations can be found in link.
Download Pre-trained models / Pre-extracted features
We host model checkpoints and features on Google Drive. We recommend using gdrive to download them.
Pretrained Models
- Download snap/ from Google Drive
gdrive download 1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph --recursive
Downstream tasks
[Query Rewrite]
First, replace the generation_utils.py in your installed Hugging Face transformers package with the version provided in this repository:
mv generation_utils.py [your path]/transformers/
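If you are unsure where `[your path]` is, the package directory can be looked up from Python. The sketch below is an assumption-laden illustration rather than the authors' procedure: `find_package_dir` and `replace_module_file` are hypothetical helper names, and it copies the file and keeps a `.bak` backup of the original instead of using `mv`.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_package_dir(package: str) -> Optional[Path]:
    """Return the install directory of a package, or None if it is not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at <package dir>/__init__.py.
    return Path(spec.origin).parent


def replace_module_file(new_file: Path, package_dir: Path, backup: bool = True) -> Path:
    """Copy new_file into package_dir, optionally backing up the file it overwrites."""
    target = package_dir / new_file.name
    if backup and target.exists():
        # Keep the original next to it as e.g. generation_utils.py.bak
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(new_file, target)
    return target


# Example usage (run from the directory containing this repository's generation_utils.py):
#   pkg_dir = find_package_dir("transformers")
#   if pkg_dir is not None:
#       replace_module_file(Path("generation_utils.py"), pkg_dir)
```

Keeping the backup makes it easy to restore the stock file if the same environment is later used for other projects.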
Then start fine-tuning
# Fine-tuning with 4 GPUs
cd VL-T5/
bash scripts/QueryRewrite_VLT5.sh 4
bash scripts/QueryRewrite_VLBart.sh 4
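The two scripts can also be launched one after the other from Python. This is a minimal sketch under the assumption, suggested by the comment above, that the single positional argument is the number of GPUs; `build_commands` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as given in this README, relative to VL-T5/.
SCRIPTS = ["scripts/QueryRewrite_VLT5.sh", "scripts/QueryRewrite_VLBart.sh"]


def build_commands(num_gpus: int) -> List[List[str]]:
    """Build one 'bash <script> <num_gpus>' command per fine-tuning script."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [["bash", script, str(num_gpus)] for script in SCRIPTS]


def run_all(num_gpus: int, workdir: Path = Path("VL-T5")) -> None:
    """Run the scripts sequentially inside workdir, stopping at the first failure."""
    for cmd in build_commands(num_gpus):
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `run_all(4)` called from the repository root mirrors the two `bash` commands above; `check=True` raises `subprocess.CalledProcessError` if the first run fails, so the second is not started.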
Reference
Please cite our paper if you use the dataset or model in your work:
