# MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
<p align="center">
  <a href="https://arxiv.org/pdf/2403.02991.pdf" target="_blank">[Paper]</a>
  <a href="https://arxiv.org/abs/2403.02991" target="_blank">[ArXiv]</a>
  <a href="https://github.com/double125/MADTP" target="_blank">[Code]</a>
</p>
<p align="center">
  <img src="MADTP.png" width="800">
</p>

Official implementation of MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer.
## What's New 🥳
- **(Sep 6, 2024)** We released the implementation and scripts of MADTP. (Note that checkpoints and logs will come soon.) [Code] 🚩
- **(Feb 27, 2024)** MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer was accepted by CVPR 2024. [Paper] [ArXiv] 🎉
## Installation

The code is tested with `pytorch==1.11.0`, `cuda==11.3.1`, and `python==3.8.13`. The dependencies can be installed with:

```shell
conda env create -f environment.yml
```
## Supported Tasks, Models, and Datasets

| Type | Supported Tasks | Supported Models | Supported Datasets |
| --- | --- | :---: | :---: |
| Multi-modal | Visual Reasoning | BLIP (instructions) | NLVR2 |
| Multi-modal | Image Caption | BLIP (instructions) | COCO Caption |
| Multi-modal | Visual Question Answer | BLIP (instructions) | VQAv2 |
| Multi-modal | Image-Text Retrieval | CLIP (instructions), BLIP (instructions) | COCO, Flickr30k |
| Multi-modal | Text-Image Retrieval | CLIP (instructions), BLIP (instructions) | COCO, Flickr30k |
## Visual Reasoning on the NLVR2 Dataset
- **Dataset & Annotation**

  Download the NLVR2 dataset, unzip it under the `datasets` folder, and modify `image_root` in the config accordingly. Download the all-in-one annotations (including annotations for the Visual Reasoning, Image Caption, VQA, Image-Text Retrieval, and Text-Image Retrieval tasks) from this link, unzip them under the `annotation` folder, and modify `annotation` in the config accordingly. See here for the expected folder structure.
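As a sketch, the dataset preparation above amounts to the following shell steps. The archive names are placeholders, not the actual file names behind the download links; the folder names come from the instructions above:

```shell
# create the folders the configs and scripts expect
mkdir -p datasets annotation pretrained output

# placeholders: substitute the archives you actually downloaded
# unzip nlvr2_images.zip -d datasets/
# unzip all_in_one_annotations.zip -d annotation/

# then point the config at them, e.g. in configs/nlvr.yaml:
#   image_root: datasets/<nlvr2-image-folder>
#   annotation: annotation/
ls -d datasets annotation pretrained output
```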
- **Evaluation**

  Download the compressed checkpoints from the table below, put them under the `output` folder, and modify the `--pretrained` argument of the scripts accordingly. For example, to evaluate a compressed model with a 0.5 reduce ratio:

  ```shell
  python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --evaluate \
    --pretrained output/nlvr_nlvr2_compression_p0.5/model_base_nlvr_nlvr2_p0.5_compressed.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_p0.5
  ```
- **Compression**

  Download the uncompressed model from the table below, put it under the `pretrained` folder, and modify `pretrained` in the config accordingly. For example, to compress at a 0.5 reduce ratio on 8 A100 (80G) GPUs:

  ```shell
  python -m torch.distributed.run --nproc_per_node=8 compress_nlvr_dtp.py --p 0.5 --epoch 15 \
    --pretrained pretrained/model_base_nlvr.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_p0.5
  ```
- **Resources**

| Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
| --- | :---: | :---: | :---: | :---: | :---: |
| 0.3 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1aqiY86op26ceuWp6SFu1kaScqDnAIl1G/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1foe-c6qU97QGEz7kNC9OsGJ8OXk7OmQT/view?usp=drive_link">Download</a> | Link |
| 0.5 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1JyYypUDbZVD00ep5SSnQEc6LnOEL-ODT/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1R_TgQKlHv6Y6Fh5_ny4fRKNLAva75Frs/view?usp=drive_link">Download</a> | Link |
| 0.6 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1YB8xJee2R7B5PSjzLEJBjmQkBs5XAfIe/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1Sg_agxwV04o13d6XnJLblGby5cedtngT/view?usp=drive_link">Download</a> | Link |
| 0.7 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/11DbcbzsCjA7mH5gbJQrtrHapobIz12n-/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1qcZf5YOl1aDW8S5OEDsIH6lZN4z2UgI8/view?usp=drive_link">Download</a> | Link |
| 0.8 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/16K2WIslVVoAzqmMcwvoBWI4gTfxNc8Rv/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1l_isAhyRTr7n8qpzXaa8y6hz2BSyR95Y/view?usp=drive_link">Download</a> | Link |
## Image Caption on the COCO Caption Dataset
- **Dataset & Annotation**

  Download the COCO Caption dataset, unzip it under the `datasets` folder, and modify `image_root` in the config accordingly. Download the all-in-one annotations from this link, unzip them under the `annotation` folder, and modify `annotation` in the config accordingly. See here for the expected folder structure.
- **Evaluation**

  Download the compressed checkpoints from the table below, put them under the `output` folder, and modify the `--pretrained` argument of the scripts accordingly. For example, to evaluate a compressed model with a 0.5 reduce ratio:

  ```shell
  python -m torch.distributed.run --nproc_per_node=8 compress_caption_dtp.py --evaluate \
    --pretrained output/caption_coco_compression_p0.5/model_base_caption_capfilt_large_coco_p0.5_compressed.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_p0.5
  ```
- **Compression**

  Download the uncompressed model from the table below, put it under the `pretrained` folder, and modify `pretrained` in the config accordingly. For example, to compress at a 0.5 reduce ratio on 8 A100 (80G) GPUs:

  ```shell
  python -m torch.distributed.run --nproc_per_node=8 compress_caption_dtp.py --p 0.5 --epoch 5 \
    --pretrained pretrained/model_base_caption_capfilt_large.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_p0.5
  ```
