BlackVIP

Official implementation for CVPR'23 paper "BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning"


BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

We provide the official PyTorch implementation of 'BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning' (CVPR 2023).

Changdae Oh, Hyeji Hwang, Hee-young Lee, YongTaek Lim, Geunyoung Jung, Jiyoung Jung, Hosik Choi, and Kyungwoo Song

Abstract

With the surge of large-scale pre-trained models (PTMs), fine-tuning these models to numerous downstream tasks becomes a crucial problem. Consequently, parameter efficient transfer learning (PETL) of large models has grasped huge attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) a sufficiently large memory capacity for the fine-tuning is equipped. However, in most real-world applications, PTMs are served as a black-box API or proprietary software without explicit parameter accessibility. Besides, it is hard to meet a large memory requirement for modern PTMs. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts the PTMs without knowledge about model architectures and parameters. BlackVIP has two components; 1) Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent image-shaped visual prompts, which improves few-shot adaptation and robustness on distribution/location shift. SPSA-GC efficiently estimates the gradient of a target model to update Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing PTMs' parameters, with minimal memory requirements.

Research Highlights

Input-Dependent Dynamic Visual Prompting: To the best of our knowledge, this is the first paper that explores input-dependent visual prompting in black-box settings. For this, we devise the Coordinator, which reparameterizes the prompt as an autoencoder to handle the input-dependent prompt with very few parameters.
New Algorithm for Black-Box Optimization: We propose a new zeroth-order optimization algorithm, SPSA-GC, which applies look-ahead corrections to the gradient estimated by SPSA and thereby improves performance.
End-to-End Black-Box Visual Prompting: Equipped with the Coordinator and SPSA-GC, BlackVIP adapts the PTM to downstream tasks without parameter access or large memory capacity.
Empirical Results: We extensively validate BlackVIP on 16 datasets and demonstrate its effectiveness regarding few-shot adaptability and robustness on distribution/object-location shift.
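
To make the look-ahead correction concrete, here is a minimal pure-Python sketch. It is not the code in this repository. It assumes a Nesterov-style update in which the two-query SPSA estimate is evaluated at the look-ahead point theta + beta * m. The function names, the toy quadratic loss, and every constant (a, c, beta, steps) are illustrative assumptions.

```python
import random


def spsa_gradient(loss, theta, c, rng):
    """Two-query SPSA estimate of the gradient of `loss` at `theta`."""
    # Rademacher perturbation: every coordinate is +1 or -1.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = loss([t + c * d for t, d in zip(theta, delta)])
    minus = loss([t - c * d for t, d in zip(theta, delta)])
    scale = (plus - minus) / (2.0 * c)
    # Dividing by delta_i equals multiplying by it because delta_i is +/-1.
    return [scale * d for d in delta]


def spsa_lookahead(loss, theta, steps=1000, a=0.002, c=0.01, beta=0.9, seed=0):
    """Minimize `loss` using only function evaluations (assumed update rule)."""
    rng = random.Random(seed)
    momentum = [0.0] * len(theta)
    for _ in range(steps):
        # Estimate the gradient at the look-ahead point, not at theta itself.
        lookahead = [t + beta * m for t, m in zip(theta, momentum)]
        grad = spsa_gradient(loss, lookahead, c, rng)
        momentum = [beta * m - a * g for m, g in zip(momentum, grad)]
        theta = [t + m for t, m in zip(theta, momentum)]
    return theta


def quadratic(x):
    """Toy stand-in for a black-box loss; its minimum is at x = (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


result = spsa_lookahead(quadratic, [0.0, 0.0, 0.0])
```

Each step queries the loss only twice regardless of the number of parameters, which is why this family of estimators suits a model that exposes nothing but its outputs.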

<hr/>

Coverage of this repository

Methods

BlackVIP (Ours)
BAR
VP (with our SPSA-GC)
VP
Zero-Shot Inference

Experiments

main performance (Tab. 2 and Tab. 3 of paper)
- two synthetic datasets - [Biased MNIST, Loc-MNIST]
- 14 transfer learning benchmarks - [Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, SVHN, EuroSAT, Resisc45, CLEVR, UCF101, ImageNet]
ablation study (Tab. 5 and Tab. 6 of paper)
- varying architectures (coordinator, target model)
- varying coordinator weights and optimizers

Setup

Run the following commands to create the environment.
- Note that we slightly modified Dassl.pytorch into my_dassl to allow more flexible experiments.

# Clone this repo
git clone https://github.com/changdaeoh/BlackVIP.git
cd BlackVIP

# Create a conda environment
conda create -y -n blackvip python=3.8

# Activate the environment
conda activate blackvip

# Install torch and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.6 -c pytorch -c conda-forge

# Install dependencies
cd my_dassl
pip install -r requirements.txt

# Install additional requirements
cd ..
pip install -r requirements.txt

Data preparation

To prepare the following 11 datasets (adopted by CoOp), please follow the instructions at https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md
- Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, EuroSAT, UCF101, and ImageNet
- We use the same few-shot splits as CoOp for the above 11 datasets.
To prepare the following three datasets (adopted by VP), follow the instructions below:
- SVHN:
  - Create a folder named svhn/ under $DATA.
  - To download the dataset, run BlackVIP/datasets/svhn_dl.py after replacing DATAPATH on line 44 with your own path.
  - Download split_mlai_SVHN.json from this link and put it under $DATA/svhn.
- Resisc45:
  - Create a folder named resisc45/ under $DATA.
  - Download NWPU-RESISC45.rar from https://onedrive.live.com/?authkey=%21AHHNaHIlzp%5FIXjs&id=5C5E061130630A68%21107&cid=5C5E061130630A68&parId=root&parQt=sharedby&o=OneUp and extract the file under $DATA/resisc45.
  - Download split_mlai_Resisc45.json from this link and put it under $DATA/resisc45.
- CLEVR:
  - Download CLEVR_v1.0.zip from https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip and extract the file under $DATA.
  - Download split_mlai_CLEVR.json from this link and put it under $DATA/CLEVR_v1.0.
- We provide fixed train/val/test splits for these three datasets to ensure reproducibility and fair comparison for future work.
To prepare our synthetic dataset, Loc-MNIST, run /datasets/mk_locmnist.py as python mk_locmnist.py --data_root [YOUR-DATAPATH] --f_size [1 or 4]
For Biased MNIST, no preparation is required.
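
Before launching experiments, it can help to confirm that the three split files ended up where the steps above place them. The following is a small sketch under that assumption. The helper name missing_splits is hypothetical, and it checks only the split files named above, not the images themselves.

```python
from pathlib import Path

# Relative paths taken from the SVHN, Resisc45, and CLEVR steps above.
EXPECTED_SPLITS = [
    "svhn/split_mlai_SVHN.json",
    "resisc45/split_mlai_Resisc45.json",
    "CLEVR_v1.0/split_mlai_CLEVR.json",
]


def missing_splits(data_root):
    """Return the expected split files that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_SPLITS if not (root / rel).is_file()]
```

For example, missing_splits(os.environ["DATA"]) returns an empty list once all three files are in place.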

Run

transfer learning benchmarks

Move to the BlackVIP/scripts/method_name directory.
For the 14 benchmark datasets and four methods, refer to this document, which contains the hyperparameter table.
For the target dataset, run the command with its dataset-specific configuration, as shown below:

# for BlackVIP, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh tl_bench.sh svhn 5000 0.9 0.2 0.005 1.0

# for BAR, specify {1:dataset, 2:epoch, 3:init_lr, 4:min_lr}
sh tl_bench.sh svhn 5000 5.0 0.1

# for VP w/ SPSA-GC, specify {1:dataset, 2:epoch, 3:moms, 4:spsa_a, 5:spsa_c}
sh tl_bench.sh svhn 5000 0.9 10.0 0.01

# for VP (white-box), specify {1:dataset, 2:epoch, 3:lr}
sh tl_bench.sh svhn 1000 40.0

# for Zero-shot CLIP inference, move to 'BlackVIP/scripts/coop' and run:
sh zeroshot_all.sh
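
Because every script takes purely positional arguments, it is easy to pass values in the wrong order. The sketch below is one hypothetical way to build the command from named values. The argument orders are copied from the comments above, while the dictionary keys (blackvip, bar, vp_spsa_gc, vp) and the function build_command are illustrative labels, not names used by the repository.

```python
# Positional argument order of tl_bench.sh for each method, from the comments above.
ARG_ORDER = {
    "blackvip": ["dataset", "epoch", "moms", "spsa_gamma", "spsa_c", "p_eps"],
    "bar": ["dataset", "epoch", "init_lr", "min_lr"],
    "vp_spsa_gc": ["dataset", "epoch", "moms", "spsa_a", "spsa_c"],
    "vp": ["dataset", "epoch", "lr"],
}


def build_command(method, **params):
    """Return the argv list for tl_bench.sh, rejecting missing or unknown names."""
    order = ARG_ORDER[method]
    if set(params) != set(order):
        raise ValueError(f"{method} expects exactly these arguments: {order}")
    return ["sh", "tl_bench.sh"] + [str(params[name]) for name in order]


cmd = build_command(
    "blackvip",
    dataset="svhn", epoch=5000, moms=0.9, spsa_gamma=0.2, spsa_c=0.005, p_eps=1.0,
)
```

The resulting list can be passed to subprocess.run from inside the corresponding method directory.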

synthetic datasets

In BlackVIP/scripts/method_name/, there are three files to reproduce the results of Biased MNIST and Loc-MNIST: synthetic_bm_easy.sh, synthetic_bm_hard.sh, and synthetic_lm.sh
- Hyperparameters are also listed in this document.

# for BlackVIP on Loc-MNIST, specify {1:fake-digit-size, 2:moms, 3:spsa_alpha, 4:spsa_a, 5:spsa_c}
sh synthetic_lm.sh 1 0.9 0.5 0.01 0.005  # 1:1 setting
sh synthetic_lm.sh 4 0.95 0.5 0.02 0.01  # 1:4 setting

# for BlackVIP on Biased MNIST, specify {1:moms, 2:spsa_alpha, 3:spsa_a, 4:spsa_c}
sh synthetic_bm_easy.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.8
sh synthetic_bm_hard.sh 0.9 0.4 0.01 0.01  # spurious correlation = 0.9

# other methods can be run similarly to the above.

ablation study

# for BlackVIP, specify {1:target_backbone, 2:spsa_alpha, 3:moms, 4:spsa_gamma, 5:spsa_c, 6:p_eps}
sh ablation_arch_rn.sh rn50 0.5 0.9 0.2 0.01 0.3
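
The spsa_a, spsa_c, spsa_alpha, and spsa_gamma arguments used in the commands above are named like the constants of the standard SPSA gain sequences. Assuming they play that role here (this is an assumption about the scripts, not something stated above), the step size and perturbation size would decay with the iteration index roughly as in this sketch. The function name spsa_gains and the stability offset are hypothetical.

```python
def spsa_gains(k, a, c, alpha, gamma, stability=0.0):
    """Standard SPSA gain sequences at iteration k (0-based).

    step size:         a_k = a / (k + 1 + stability) ** alpha
    perturbation size: c_k = c / (k + 1) ** gamma
    """
    step_size = a / (k + 1 + stability) ** alpha
    perturbation = c / (k + 1) ** gamma
    return step_size, perturbation


first = spsa_gains(0, a=0.01, c=0.01, alpha=0.4, gamma=0.2)
later = spsa_gains(99, a=0.01, c=0.01, alpha=0.4, gamma=0.2)
```

Under this reading, a larger spsa_alpha shrinks the step size faster, and a larger spsa_gamma shrinks the perturbation faster.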

<hr />

Contact

For any questions, discussions, or proposals, please contact changdae.oh@uos.ac.kr or kyungwoo.song@gmail.com

Citation

If you use our code in your research, please kindly consider citing:

@InProceedings{Oh_2023_CVPR,
    author    = {Oh, Changdae and Hwang, Hyeji and Lee, Hee-young and Lim, YongTaek and Jung, Geunyoung and Jung, Jiyoung and Choi, Hosik and Song, Kyungwoo},
    title     = {BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {24224-24235}
}

Acknowledgements

Our overall experimental pipeline is based on the CoOp and CoCoOp repositories. For baseline construction, we borrowed from and referred to the code in the VP, BAR, and AR repositories. We thank the authors (Zhou et al., Bahng et al., Tsai et al.) and Savan for sharing their code.
