VARS
Official code for "Visual Attention Emerges from Recurrent Sparse Reconstruction" (ICML 2022)
Visual Attention Emerges from Recurrent Sparse Reconstruction (ICML 2022)
Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
<img src="VARS.png" alt="drawing" width="600"/>

Codebase of our ICML'22 paper "Visual Attention Emerges from Recurrent Sparse Reconstruction".
Usage
Install PyTorch 1.7.0+ and torchvision 0.8.1+ from the official website.
requirements.txt lists all the dependencies:
pip install -r requirements.txt
In addition, install the magickwand library:
apt-get install libmagickwand-dev
Training
Take RVT-Ti with VARS-D as an example. We use a single node with 8 GPUs for training:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 main.py --model rvt_tiny --data-path path/to/imagenet --output_dir output/here --num_workers 8 --batch-size 128 --attention vars_d
We provide pretrained weights for VARS-D and VARS-SD.
To train models with different scales or different attention algorithms, please change the arguments --model and --attention.
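As a rough sketch of how the two arguments fit into the training command above, the helper below assembles the same command line from its parts. The function `build_train_command` and its parameter names are hypothetical and not part of this codebase; only the flags and the `rvt_tiny` / `vars_d` values come from the command shown above. Other valid values for `--model` and `--attention` are not listed here, so check `main.py` for them.

```python
import shlex


def build_train_command(model, attention, data_path, output_dir,
                        gpus=8, master_port=12345,
                        num_workers=8, batch_size=128):
    """Return the training command as an argv list.

    Hypothetical helper: it only rearranges the flags shown in the
    training command above and does not validate the values passed in.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",
        "--master_port", str(master_port),
        "main.py",
        "--model", model,
        "--data-path", data_path,
        "--output_dir", output_dir,
        "--num_workers", str(num_workers),
        "--batch-size", str(batch_size),
        "--attention", attention,
    ]


# Reproduce the RVT-Ti + VARS-D example; swap the first two arguments
# to train a different model scale or attention algorithm.
train_cmd = build_train_command("rvt_tiny", "vars_d",
                                "path/to/imagenet", "output/here")
print(shlex.join(train_cmd))
```

The list form can be passed directly to `subprocess.run(train_cmd, check=True)` if you prefer to launch training from a script.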
Testing
python main.py --model rvt_tiny --data-path path/to/imagenet --eval --resume path/to/checkpoint --attention vars_d
To enable robustness evaluation, add one of --inc_path /path/to/imagenet-c, --ina_path /path/to/imagenet-a, --inr_path /path/to/imagenet-r or --insk_path /path/to/imagenet-sketch to test on ImageNet-C, ImageNet-A, ImageNet-R or ImageNet-Sketch, respectively.
To test accuracy under adversarial attacks, add --fgsm_test or --pgd_test.
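The options above can be combined with the base evaluation command in a small script. The following is a minimal sketch, assuming a hypothetical helper `build_eval_command`; the dictionary keys (`"imagenet-c"`, `"fgsm"`, and so on) are illustrative labels, while the flags themselves are the ones listed above. Whether `main.py` accepts a robustness benchmark and an adversarial test in the same run is not stated here, and the sketch does not check for it.

```python
import shlex

# Flags as listed in this README; the keys are illustrative labels only.
ROBUSTNESS_FLAGS = {
    "imagenet-c": "--inc_path",
    "imagenet-a": "--ina_path",
    "imagenet-r": "--inr_path",
    "imagenet-sketch": "--insk_path",
}
ATTACK_FLAGS = {"fgsm": "--fgsm_test", "pgd": "--pgd_test"}


def build_eval_command(model, attention, data_path, checkpoint,
                       benchmark=None, benchmark_path=None, attack=None):
    """Return the evaluation command as an argv list (hypothetical helper)."""
    cmd = ["python", "main.py",
           "--model", model,
           "--data-path", data_path,
           "--eval",
           "--resume", checkpoint,
           "--attention", attention]
    if benchmark is not None:
        if benchmark_path is None:
            raise ValueError("benchmark_path is required with benchmark")
        # Append the dataset-specific flag followed by its path.
        cmd += [ROBUSTNESS_FLAGS[benchmark], benchmark_path]
    if attack is not None:
        # Attack flags are plain switches without a value.
        cmd.append(ATTACK_FLAGS[attack])
    return cmd


# Plain ImageNet evaluation, then an ImageNet-C run and a PGD run.
base_cmd = build_eval_command("rvt_tiny", "vars_d",
                              "path/to/imagenet", "path/to/checkpoint")
inc_cmd = build_eval_command("rvt_tiny", "vars_d",
                             "path/to/imagenet", "path/to/checkpoint",
                             benchmark="imagenet-c",
                             benchmark_path="/path/to/imagenet-c")
pgd_cmd = build_eval_command("rvt_tiny", "vars_d",
                             "path/to/imagenet", "path/to/checkpoint",
                             attack="pgd")
for cmd in (base_cmd, inc_cmd, pgd_cmd):
    print(shlex.join(cmd))
```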
Links
This codebase is built upon the official code of "Towards Robust Vision Transformer".
Citation
If you found this code helpful, please consider citing our work:
@article{shi2022visual,
  title={Visual Attention Emerges from Recurrent Sparse Reconstruction},
  author={Shi, Baifeng and Song, Yale and Joshi, Neel and Darrell, Trevor and Wang, Xin},
  journal={arXiv preprint arXiv:2204.10962},
  year={2022}
}
