MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

<p align="center"> <a href="https://arxiv.org/pdf/2403.02991.pdf" target="_blank">[Paper]</a> <a href="https://arxiv.org/abs/2403.02991" target="_blank">[ArXiv]</a> <a href="https://github.com/double125/MADTP" target="_blank">[Code]</a> <img src="MADTP.png" alt="MADTP overview" width="800"> </p>

Official implementation of MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer.

What's New 🥳

  • (Sep 6, 2024) We released the implementation and scripts of MADTP. [Code] (Checkpoints and logs will come soon.) 🚩

  • (Feb 27, 2024) MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer was accepted to CVPR 2024. [Paper] [ArXiv] 🎉

Installation

The code is tested with PyTorch 1.11.0, CUDA 11.3.1, and Python 3.8.13. The dependencies can be installed by:

conda env create -f environment.yml
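After creating the environment, a quick sanity check can confirm that the interpreter and PyTorch/CUDA versions match the tested configuration. This is a minimal sketch; `check_environment` is a hypothetical helper, not part of this repository:

```python
# Sanity-check the environment against the tested configuration
# (Python 3.8, PyTorch 1.11, CUDA 11.3). check_environment is a
# hypothetical helper; versions are compared as prefixes only.
import sys

EXPECTED = {"python": "3.8", "torch": "1.11", "cuda": "11.3"}

def check_environment():
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda or "none"
    except ImportError:
        found["torch"] = found["cuda"] = "not installed"
    return {key: found[key].startswith(want) for key, want in EXPECTED.items()}

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

A mismatch is not necessarily fatal, but the versions above are the only ones the authors report testing.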

Supported Tasks, Models, and Datasets

Type | Supported Tasks | Supported Models | Supported Datasets
--- | --- | :---: | :---:
Multi-modal | Visual Reasoning | BLIP (instructions) | NLVR2
Multi-modal | Image Caption | BLIP (instructions) | COCO Caption
Multi-modal | Visual Question Answer | BLIP (instructions) | VQAv2
Multi-modal | Image-Text Retrieval | CLIP (instructions), BLIP (instructions) | COCO, Flickr30k
Multi-modal | Text-Image Retrieval | CLIP (instructions), BLIP (instructions) | COCO, Flickr30k

Visual Reasoning on the NLVR2 Dataset

  • Dataset & Annotation

    Download the NLVR2 dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations (covering the Visual Reasoning, Image Caption, VQA, Image-Text Retrieval, and Text-Image Retrieval tasks) from this link, unzip them under the annotation folder, and modify annotation in the config accordingly. See here for the expected folder structure.

  • Evaluation

    Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained argument of the scripts accordingly. For example, to evaluate a compressed model with a 0.5 reduce ratio:

    python -m torch.distributed.run --nproc_per_node=8 compress_nlvr_dtp.py --evaluate \
    --pretrained output/nlvr_nlvr2_compression_p0.5/model_base_nlvr_nlvr2_p0.5_compressed.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_p0.5
    
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to compress at a 0.5 reduce ratio on 8 A100 GPUs (80 GB):

    python -m torch.distributed.run --nproc_per_node=8 compress_nlvr_dtp.py --p 0.5 --epoch 15 \
    --pretrained pretrained/model_base_nlvr.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_p0.5
    
  • Resources

    Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script
    --- | :---: | :---: | :---: | :---: | :---:
    0.3 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1aqiY86op26ceuWp6SFu1kaScqDnAIl1G/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1foe-c6qU97QGEz7kNC9OsGJ8OXk7OmQT/view?usp=drive_link">Download</a> | Link
    0.5 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1JyYypUDbZVD00ep5SSnQEc6LnOEL-ODT/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1R_TgQKlHv6Y6Fh5_ny4fRKNLAva75Frs/view?usp=drive_link">Download</a> | Link
    0.6 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/1YB8xJee2R7B5PSjzLEJBjmQkBs5XAfIe/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1Sg_agxwV04o13d6XnJLblGby5cedtngT/view?usp=drive_link">Download</a> | Link
    0.7 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/11DbcbzsCjA7mH5gbJQrtrHapobIz12n-/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1qcZf5YOl1aDW8S5OEDsIH6lZN4z2UgI8/view?usp=drive_link">Download</a> | Link
    0.8 | <a href="https://drive.google.com/uc?export=download&id=1pcsvlNRzzoq_q6Kaku_Kkg1MFELGoIxE">Download</a> | Link | <a href="https://drive.google.com/file/d/16K2WIslVVoAzqmMcwvoBWI4gTfxNc8Rv/view?usp=drive_link">Download</a> | <a href="https://drive.google.com/file/d/1l_isAhyRTr7n8qpzXaa8y6hz2BSyR95Y/view?usp=drive_link">Download</a> | Link
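For intuition about the reduce ratio --p used above: MADTP prunes tokens dynamically per layer and per input, so no fixed formula reproduces its exact behavior, but a uniform estimate gives a feel for the token budget. retained_tokens below is a hypothetical illustration, not code from this repository:

```python
# Back-of-the-envelope token budget for a given reduce ratio p.
# This uniform estimate is only illustrative: MADTP decides which
# tokens to keep dynamically, guided by multimodal alignment.

def retained_tokens(num_tokens: int, p: float) -> int:
    """Approximate number of tokens kept when a fraction p is pruned."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return max(1, round(num_tokens * (1.0 - p)))

# e.g. a 384x384 image split into 16x16 patches yields 576 visual tokens
for p in (0.3, 0.5, 0.7):
    print(f"p={p}: ~{retained_tokens(576, p)} of 576 tokens kept")
```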

Image Caption on the COCO Caption Dataset

  • Dataset & Annotation

    Download the COCO Caption dataset, unzip it under the datasets folder, and modify image_root in the config accordingly. Download the all-in-one annotations from this link, unzip them under the annotation folder, and modify annotation in the config accordingly. See here for the expected folder structure.

  • Evaluation

    Download the compressed checkpoints from the table below, put them under the output folder, and modify the --pretrained argument of the scripts accordingly. For example, to evaluate a compressed model with a 0.5 reduce ratio:

    python -m torch.distributed.run --nproc_per_node=8 compress_caption_dtp.py --evaluate \
    --pretrained output/caption_coco_compression_p0.5/model_base_caption_capfilt_large_coco_p0.5_compressed.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_p0.5
    
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and modify pretrained in the config accordingly. For example, to compress at a 0.5 reduce ratio on 8 A100 GPUs (80 GB):

    python -m torch.distributed.run --nproc_per_node=8 compress_caption_dtp.py --p 0.5 --epoch 5 \
    --pretrained pretrained/model_base_caption_capfilt_large.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_p0.5
    
<!-- * Resources Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script --- | :---: | :---: | :---: | :---: | :---: 0.5 | <a href="https://drive.google.com/uc?export=download&id=1qW_0DpQsDc6u9g3fSfTI4g_VXYsMA5s8">Download</a> | [Link](./scripts/compress_caption_coco_p0.5.sh) | <a href="*****r">Download</a> | <a href="*****">Download</a> | [Link](./scripts/evaluate_caption_coco_p0.5_compressed.sh) 0.75 | <a href="https://drive.google.com/uc?export=download&id=1qW_0DpQsDc6u9g3fSfTI4g_VXYsMA5s8">Download</a> | [Link](./scripts/compress_caption_coco_p0.75.sh)| <a href="*****">Download</a> | <a href="*****">Download</a> | [Link](./scripts/evaluate_caption_coco_p0.75_compressed.sh) -->
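The checkpoint and output paths in the examples above follow a single naming convention. The sketch below reconstructs it; build_paths is a hypothetical helper inferred from the commands, not part of the MADTP codebase:

```python
# Reconstruct the output-directory and compressed-checkpoint naming
# convention used in the example commands. build_paths is a
# hypothetical helper inferred from the README, not repo code.
from pathlib import Path

def build_paths(task: str, dataset: str, base_ckpt: str, p: float):
    out_dir = Path("output") / f"{task}_{dataset}_compression_p{p}"
    stem = Path(base_ckpt).stem  # e.g. model_base_caption_capfilt_large
    checkpoint = out_dir / f"{stem}_{dataset}_p{p}_compressed.pth"
    return out_dir, checkpoint

out_dir, ckpt = build_paths(
    "caption", "coco", "pretrained/model_base_caption_capfilt_large.pth", 0.5
)
print(out_dir)   # output/caption_coco_compression_p0.5
print(ckpt)
```

The same scheme covers the NLVR2 examples, e.g. build_paths("nlvr", "nlvr2", "pretrained/model_base_nlvr.pth", 0.5) yields the paths used in the Visual Reasoning section.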

Visual Question Answer on the VQAv2 Dataset
