DMN

CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

This repository provides the official PyTorch implementation of our CVPR 2024 paper:

<ins>Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models</ins> (Paper)
Authors: <ins>Yabin Zhang</ins>, <ins>Wenjie Zhu</ins>, <ins>Hui Tang</ins>, <ins>Zhiyuan Ma</ins>, <ins>Kaiyang Zhou</ins>, <ins>Lei Zhang</ins>

Overview

This repository contains the implementation of DMN for image classification with a pre-trained CLIP model. We consider four task settings:

  • Zero-shot classification in a test-time adaptation manner
  • Few-shot classification
  • Training-free few-shot classification
  • Out-of-distribution generalization

<p align = "center"> <img src = "figures/acc_gflops.png"> </p>
<p align = "center"> Results on ImageNet dataset under different task settings. </p>
<p align = "center"> <img src = "figures/framework.png"> </p>
<p align = "center"> The overall framework of our DMN. </p>

Prerequisites

Hardware

This implementation targets a single-GPU configuration. All experiments can be reproduced on a GPU with more than 10 GB of memory (e.g., a 1080Ti).

Environment

The code is tested on PyTorch 1.13.1.
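
As a quick sanity check before running anything, the sketch below compares an environment against the two requirements stated above: PyTorch 1.13.1 as the tested version, and more than 10 GB of GPU memory. The function name `check_environment` and the two constants are illustrative and not part of this repository, and treating "10GB" as 10 GiB is an assumption. The PyTorch query at the bottom only runs if `torch` is installed.

```python
TESTED_TORCH = "1.13.1"          # version the README says the code is tested on
MIN_GPU_BYTES = 10 * 1024 ** 3   # "more than 10GB", interpreted here as 10 GiB (assumption)


def check_environment(torch_version, gpu_mem_bytes):
    """Return a list of human-readable warnings (empty if everything matches)."""
    warnings = []
    # Strip local build suffixes such as "+cu117" before comparing.
    base_version = torch_version.split("+")[0]
    if base_version != TESTED_TORCH:
        warnings.append(f"PyTorch {base_version} differs from tested {TESTED_TORCH}")
    if gpu_mem_bytes <= MIN_GPU_BYTES:
        gib = gpu_mem_bytes / 1024 ** 3
        warnings.append(f"GPU has {gib:.1f} GiB; more than 10 GiB is recommended")
    return warnings


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # total_memory is reported in bytes for the first visible CUDA device.
        mem = (torch.cuda.get_device_properties(0).total_memory
               if torch.cuda.is_available() else 0)
        for line in check_environment(torch.__version__, mem) or ["Environment matches the README"]:
            print(line)
```

A different PyTorch version is reported as a warning rather than an error, since the README only states which version was tested, not which versions fail.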

Datasets

We suggest downloading all datasets to a single root directory (${data_root}) and renaming each dataset directory as specified by ${ID_to_DIRNAME} in ./data/datautils.py. This allows you to evaluate multiple datasets within the same run.
If this is not feasible, you can evaluate the datasets separately and change ${data_root} accordingly in the bash script.
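
To confirm the directory layout before launching a long run, a small check along these lines may help. It is a minimal sketch: `missing_datasets` is a hypothetical helper, and `example_mapping` is a placeholder that stands in for the real `ID_to_DIRNAME` dictionary in ./data/datautils.py, whose actual keys and directory names are not reproduced here.

```python
from pathlib import Path


def missing_datasets(data_root, id_to_dirname):
    """Return the set_ids whose expected directory is absent under data_root."""
    root = Path(data_root)
    return [set_id for set_id, dirname in id_to_dirname.items()
            if not (root / dirname).is_dir()]


# Placeholder mapping for illustration only; in practice, import the real
# ID_to_DIRNAME from ./data/datautils.py instead.
example_mapping = {"set_a": "DatasetA", "set_b": "DatasetB"}

# Replace the path below with your own ${data_root}.
print(missing_datasets("/path/to/data_root", example_mapping))
```

An empty list means every dataset in the mapping has a directory with the expected name, so all of them can be evaluated in the same run.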

For zero/few-shot classification, we consider 11 datasets:

For out-of-distribution generalization, we consider 4 datasets: ImageNet-A (IN-A), ImageNet-V2 (IN-V2), ImageNet-R (IN-R), and ImageNet-Sketch (IN-Sketch).

Run DMN

We provide a simple bash script at ./scripts/run.sh. Modify the paths and other arguments in the script as needed, then reproduce all results with:

bash ./scripts/run.sh

For simplicity, we use set_id to denote different datasets. A complete list of set_id can be found in ${ID_to_DIRNAME} in ./data/datautils.py.

Main Results

Zero-shot Classification

<p align = "center"> <img src = "figures/zero-shot.png"> </p>

Few-shot Classification

<p align = "center"> <img src = "figures/few-shot.png"> </p>
<p align = "center"> Few-shot classification results on 11 datasets with a ViT-B/16 image encoder. </p>

Out-of-Distribution Generalization

<div align="center">

| Method | ImageNet(IN) | IN-A | IN-V2 | IN-R | IN-Sketch | Average | OOD Average |
|------------------|:--------:|:----------:|:-----------:|:----------:|:---------------:|:-------:|:-----------:|
| CLIP-RN50 | 58.16 | 21.83 | 51.41 | 56.15 | 33.37 | 44.18 | 40.69 |
| Ensembled prompt | 59.81 | 23.24 | 52.91 | 60.72 | 35.48 | 46.43 | 43.09 |
| CoOp | 63.33 | 23.06 | 55.40 | 56.60 | 34.67 | 46.61 | 42.43 |
| CoCoOp | 62.81 | 23.32 | 55.72 | 57.74 | 34.48 | 46.81 | 42.82 |
| TPT | 60.74 | 26.67 | 54.70 | 59.11 | 35.09 | 47.26 | 43.89 |
| DMN-ZS | 63.87 | 28.57 | 56.12 | 61.44 | 39.84 | 49.97 | 46.49 |

</div> <br />
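
The table does not spell out how the two summary columns are computed. The numbers are consistent with Average being the plain mean over all five datasets and OOD Average being the mean over the four shifted variants (IN-A, IN-V2, IN-R, IN-Sketch). The sketch below checks this reading for two rows copied from the table; the helper `averages` is illustrative and not taken from the repository.

```python
# Per-dataset accuracies copied from two rows of the table above.
rows = {
    "CLIP-RN50": {"IN": 58.16, "IN-A": 21.83, "IN-V2": 51.41, "IN-R": 56.15, "IN-Sketch": 33.37},
    "DMN-ZS": {"IN": 63.87, "IN-A": 28.57, "IN-V2": 56.12, "IN-R": 61.44, "IN-Sketch": 39.84},
}
OOD_SETS = ["IN-A", "IN-V2", "IN-R", "IN-Sketch"]


def averages(acc):
    """Return (mean over all datasets, mean over the OOD datasets), rounded to 2 decimals."""
    avg = sum(acc.values()) / len(acc)
    ood = sum(acc[name] for name in OOD_SETS) / len(OOD_SETS)
    return round(avg, 2), round(ood, 2)


for method, acc in rows.items():
    print(method, averages(acc))
# CLIP-RN50 (44.18, 40.69)
# DMN-ZS (49.97, 46.49)
```

Both results match the Average and OOD Average entries reported for these methods.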

Citation

If you find our code useful or our work relevant, please consider citing:

@inproceedings{zhang2024dual,
  title={Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models},
  author={Zhang, Yabin and Zhu, Wenjie and Tang, Hui and Ma, Zhiyuan and Zhou, Kaiyang and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2024}
}

Acknowledgements

We thank the authors of CoOp/CoCoOp and TPT for their open-source implementation and instructions on data preparation.
