<p align="center"><img src="paper_material/simulator_attack_poster.png"></p> The official implementation code of the paper CVPR 2021 paper "Simulating Unknown Target Models for Query-Efficient Black-box Attacks"

By Chen Ma, Li Chen, and Jun-Hai Yong.

[Paper] [Slides] [中文Slides] [Poster]

The procedure of the Simulator Attack, where q1 and q2 are the corresponding perturbations for generating query pairs in the attack. The queries of the first t iterations are fed into the target model to estimate the gradients. These queries and the corresponding outputs are collected to fine-tune the Simulator, which is trained without using the target model. The fine-tuned Simulator can accurately simulate the unknown target model, thereby transferring the queries and improving overall query efficiency. More details can be found in Section 3 of the paper.

Citation

We kindly ask anybody who uses this code to cite the following bibtex:

@InProceedings{ma2021simulator,
    author    = {Ma, Chen and Chen, Li and Yong, Jun-Hai},
    title     = {Simulating Unknown Target Models for Query-Efficient Black-Box Attacks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {11835-11844}
}

Abstract

Many adversarial attacks have been proposed to investigate the security issues of deep neural networks. In the black-box setting, current model stealing attacks train a substitute model to counterfeit the functionality of the target model. However, the training requires querying the target model. Consequently, the query complexity remains high, and such attacks can be defended easily. This study aims to train a generalized substitute model called "Simulator", which can mimic the functionality of any unknown target model. To this end, we build the training data with the form of multiple tasks by collecting query sequences generated during the attacks of various existing networks. The learning process uses a mean square error-based knowledge-distillation loss in the meta-learning to minimize the difference between the Simulator and the sampled networks. The meta-gradients of this loss are then computed and accumulated from multiple tasks to update the Simulator and subsequently improve generalization. When attacking a target model that is unseen in training, the trained Simulator can accurately simulate its functionality using its limited feedback. As a result, a large fraction of queries can be transferred to the Simulator, thereby reducing query complexity. Results of the comprehensive experiments conducted using the CIFAR-10, CIFAR-100, and TinyImageNet datasets demonstrate that the proposed approach reduces query complexity by several orders of magnitude compared to the baseline method.

Requirement

Pytorch 1.4.0 or above, torchvision 1.3.0 or above, bidict, pretrainedmodels 0.7.4, opencv-python

Folder structure

+-- configures
|   |-- meta_simulator_attack_conf.json  # the hyperparameters setting of simulator attack
|   |-- Bandits.json  # the hyperparameters setting of Bandits attack
|   |-- prior_RGF_attack_conf.json  # the hyperparameters setting of RGF and P-RGF attack
|   |-- meta_attack_conf.json  # the hyperparameters setting of Meta Attack
|   |-- NES_attack_conf.json  # the hyperparameters setting of NES Attack
|   |-- SWITCH_attack_conf.json  # the hyperparameters setting of SWITCH Attack
+-- dataset
|   |-- standard_model.py  # the wrapper of standard classification networks, and it converts the input image's pixels to the range of 0 to 1 before feeding.
|   |-- defensive_model.py # the wrapper of defensive networks, and it converts the input image's pixels to the range of 0 to 1 before feeding.
|   |-- dataset_loader_maker.py  # it returns the data loader class that includes 1000 attacks images for the experiments.
|   |-- npz_dataset.py  # it is the dataset class that includes 1000 attacks images for the experiments.
|   |-- meta_two_queries_dataset.py  # it is the dataset class that trains the Simulator.
|   |-- meta_img_grad_dataset.py  # it is the dataset class that trains the auto-encoder meta-learner of Meta Attack.
+-- meta_simulator_bandits
|   +-- learning
|       +-- script
|           |-- generate_bandits_training_data_script.py   # it can generate the training data that is generated by using Bandits to attack multiple pre-trained networks.
|       |-- train.py  # the main class for training the Simulator.
|       |-- meta_network.py  # the wrapper class of the meta network, and it can transform any classification network to the meta network in meta-learning.
|       |-- meta_distillation_learner.py  # it includes the main procedure of meta-learning.
|       |-- inner_loop.py  # it includes the inner update of meta-learning.
|   +-- attack
|       |-- meta_model_finetune.py  # it includes the class used for fine-tuning the Simulator in the attack.
|       |-- simulate_bandits_attack_shrink.py  # it includes the main procedure of Simulator Attack.
+-- cifar_models   # this folder includes the target models of CIFAR-10, i.e., PyramidNet-272, GDAS, WRN-28, and WRN-40 networks.
+-- tiny_imagenet_models   # this folder includes the target models of TinyImageNet, e.g., DenseNet and ResNeXT
+-- xxx_attack  # other attacks for the compared experiments in the paper.
|-- config.py   # the main configuration of Simulator Attack, remember to modify PY_ROOT to be the project's folder path in your machine environment.

How to train

Generate the training data and organize them to the format of multi-tasks.

(1) Download CIFAR-10, CIFAR-100 and TinyImageNet dataset, including training set and validation set and then extract or unzip the data files. Then, you must modify IMAGE_DATA_ROOT variable in config.py to the absolute path of the dataset folder.

(2) Generate training data.

Generate training data of CIFAR-10 dataset by performing untargeted Bandits attack under L2 norm.

python of meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset CIFAR-10 --batch-size 100 --loss cw --norm l2

Generate training data of CIFAR-10 dataset by performing untargeted Bandits attack under L∞ norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset CIFAR-10 --batch-size 100 --loss cw --norm linf

Generate training data of CIFAR-10 dataset by performing targeted Bandits attack under L2 norm.

python of meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset CIFAR-10 --batch-size 100 --loss cw --norm l2 --targeted

Generate training data of CIFAR-10 dataset by performing targeted Bandits attack under L∞ norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset CIFAR-10 --batch-size 100 --loss cw --norm linf --targeted

You can replace CIFAR-10 to CIFAR-100 in command lines to generate the training data of CIFAR-100.

Generate training data of TinyImageNet dataset by performing untargeted Bandits attack under L2 norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset TinyImageNet --batch-size 100 --loss cw --norm l2 --max-queries 10000

Generate training data of TinyImageNet dataset by performing untargeted Bandits attack under L∞ norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset TinyImageNet --batch-size 100 --loss cw --norm linf --max-queries 10000

Generate training data of TinyImageNet dataset by performing targeted Bandits attack under L2 norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset TinyImageNet --batch-size 100 --loss cw --norm l2 --targeted --max-queries 10000

Generate training data of TinyImageNet dataset by performing targeted Bandits attack under L∞ norm.

python meta_simulator_bandits/learning/script/generate_bandits_training_data_script.py --gpu 0 --dataset TinyImageNet --batch-size 100 --loss cw --norm linf --targeted --max-queries 10000

Train the Simulator.

An example of training Simulator using the data generated by performing targeted attack under L2 norm:

python meta_simulator_bandits/learning/train.py --gpu 0 --arch resnet34 --dataset CIFAR-10 --split_protocol TRAIN_I_TEST_II --load_task_mode NO_LOAD --data_loss_type cw --loss_type mse --adv_norm l2 --targeted

Explanation: --adv_norm l2 is to use the training data that is generated by performing L2 norm attack

Note that the backbone of defensive models in our paper is ResNet-50. Please add an additional --without_resnet to train the Simulator without using resnet-50 network.

Download the trained model

We provide the trained model in Tsinghua cloud disk drive so that you can download. We provide two types of L2 norm models: one is trained on training data with ε=4.6 of L2 norm, and the other is trained on the training data with ε=1.0 in CIFAR-10 and CIFAR-100 dataset, and ε=2.0 in TinyImageNet dataset.

How to attack

Option 1: attack PyramidNet-272, GDAS, WRN-28, and WRN-40 one by one with one command line.

python meta_simulator_bandits/attack/simulate_bandits_attack_shrink.py --gpu 0 --norm linf --dataset CIFAR-10 --data_loss cw --distillation_loss mse --meta_arch resnet34 --test_archs

Option

SimulatorAttack

Install / Use

README