TransFG
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).
TransFG: A Transformer Architecture for Fine-grained Recognition
Official PyTorch code for the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (AAAI 2022).
Framework

Dependencies:
- Python 3.7.3
- PyTorch 1.5.1
- torchvision 0.6.1
- ml_collections
Usage
1. Download Google pre-trained ViT models
- Available checkpoints include ViT-B_16, ViT-B_32... Replace {MODEL_NAME} with the checkpoint you want and download it with:
wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
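The same download can be scripted in Python. This is only a minimal sketch that assumes the URL pattern shown in the wget command above; the helper names `checkpoint_url` and `download_checkpoint` are hypothetical and are not part of this repository.

```python
import urllib.request
from pathlib import Path

# URL prefix copied from the wget command above.
BASE_URL = "https://storage.googleapis.com/vit_models/imagenet21k"


def checkpoint_url(model_name: str) -> str:
    """Return the download URL for a pre-trained ViT checkpoint (hypothetical helper)."""
    return f"{BASE_URL}/{model_name}.npz"


def download_checkpoint(model_name: str, target_dir: str = ".") -> Path:
    """Download the checkpoint into target_dir and return its local path (hypothetical helper)."""
    target = Path(target_dir) / f"{model_name}.npz"
    if not target.exists():  # skip the download if the file is already present
        urllib.request.urlretrieve(checkpoint_url(model_name), str(target))
    return target


if __name__ == "__main__":
    # Print the URL only; call download_checkpoint("ViT-B_16") to actually fetch the file.
    print(checkpoint_url("ViT-B_16"))
```

Calling `download_checkpoint("ViT-B_16")` would fetch the file into the current directory, which is equivalent to running the wget command with `MODEL_NAME=ViT-B_16`.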
2. Prepare data
In the paper, we use data from 5 publicly available datasets. Please download them from their official websites and put them in the corresponding folders.
3. Install required packages
Install dependencies with the following command:
pip3 install -r requirements.txt
4. Train
To train TransFG on the CUB-200-2011 dataset with 4 GPUs in FP16 mode for 10000 steps, run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run
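When the number of GPUs or steps changes, `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` have to change together. The sketch below assembles the command above from a few parameters. It uses only the flags that appear in that command; the helper names `visible_devices` and `build_train_command` are hypothetical, and the assumption that GPU ids are numbered consecutively from 0 is ours.

```python
from typing import List


def visible_devices(num_gpus: int) -> str:
    """Return a CUDA_VISIBLE_DEVICES value, assuming GPU ids 0..num_gpus-1."""
    return ",".join(str(i) for i in range(num_gpus))


def build_train_command(
    num_gpus: int = 4,
    dataset: str = "CUB_200_2011",
    num_steps: int = 10000,
    name: str = "sample_run",
    fp16: bool = True,
) -> List[str]:
    """Assemble the training command as an argument list (hypothetical helper)."""
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",  # one process per GPU
        "train.py",
        "--dataset", dataset,
        "--split", "overlap",
        "--num_steps", str(num_steps),
    ]
    if fp16:
        cmd.append("--fp16")
    cmd += ["--name", name]
    return cmd


if __name__ == "__main__":
    # Print the full shell line for the default 4-GPU run.
    print(f"CUDA_VISIBLE_DEVICES={visible_devices(4)} " + " ".join(build_train_command()))
```

To launch the run from Python, the list can be passed to `subprocess.run(cmd, env={**os.environ, "CUDA_VISIBLE_DEVICES": visible_devices(4)})`.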
Citation
If you find our work helpful in your research, please cite it as:
@article{he2021transfg,
title={TransFG: A Transformer Architecture for Fine-grained Recognition},
author={He, Ju and Chen, Jie-Neng and Liu, Shuai and Kortylewski, Adam and Yang, Cheng and Bai, Yutong and Wang, Changhu and Yuille, Alan},
journal={arXiv preprint arXiv:2103.07976},
year={2021}
}
Acknowledgement
Many thanks to ViT-pytorch for the PyTorch reimplementation of "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
