RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers [ICML2025] 
This is the official repository for RePaViT.

<p align="center"> <img src="img/Method_Compare.jpg" alt="Main Method" width="60%"/> </p>

(For the RePa-LV-ViT source code, please refer to this repo, as LV-ViT uses a different training framework. For dense prediction tasks, the code based on MMDetection and MMSegmentation is under construction. Pretrained model weights have been released here.)
0. Environment Setup
First, clone the repository locally:
git clone https://github.com/Ackesnal/RePaViT.git
cd RePaViT
Then, set up the environment via conda:
conda create -n repavit python=3.10 -y && conda activate repavit
conda install conda-forge::python-rocksdb -y
pip install torch torchvision torchaudio timm==1.0.3 einops ptflops wandb
[Recommended] Alternatively, you can create the environment directly from the predefined YAML file:
conda env create -f environment.yml
After completing the above installation, the repository is ready to run.
We additionally use wandb for real-time tracking and visualization of the training process. The use of wandb is optional; however, you will need to register and log in to wandb before using this functionality.
1. Dataset Preparation
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation data in the train/ and val/ folders respectively:
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
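datasets.ImageFolder infers class labels from the immediate subdirectory names. The stdlib sketch below illustrates that convention on a throwaway mock layout (list_classes is a hypothetical helper, not part of this repo):

```python
import os
import tempfile

def list_classes(split_dir):
    # datasets.ImageFolder treats each immediate subdirectory as one class
    return sorted(d for d in os.listdir(split_dir)
                  if os.path.isdir(os.path.join(split_dir, d)))

# build a tiny mock of the layout above and inspect it
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("class1", "class2"):
        os.makedirs(os.path.join(root, split, cls))

print(list_classes(os.path.join(root, "train")))  # ['class1', 'class2']
```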
We provide support for RocksDB as an alternative dataset storage solution. In certain HPC environments with strict file-count quotas, the ImageNet dataset cannot be fully extracted onto high-speed I/O disks. In this case, RocksDB enables efficient and stable ImageNet storage and loading without millions of small image files.
To insert ImageNet into a RocksDB database, simply run
python insert_rocksdb.py
(please replace tar_path_root and db_path_root in insert_rocksdb.py with your own source and target root paths).
When training the model, use the --rocksdb argument instead of --data_path to specify the database location.
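Conceptually, RocksDB replaces millions of image files with one key-value record per sample. The sketch below illustrates that idea with a plain dict standing in for the database handle; the actual key/value schema used by insert_rocksdb.py may differ.

```python
import pickle

db = {}  # stand-in for a RocksDB handle opened under db_path_root

def put_sample(split, index, image_bytes, label):
    # one record per image: the key names the sample, the value packs bytes + label
    db[f"{split}-{index:08d}".encode()] = pickle.dumps((image_bytes, label))

def get_sample(split, index):
    # a DataLoader worker can fetch any sample by key, with no per-image files
    return pickle.loads(db[f"{split}-{index:08d}".encode()])

put_sample("train", 0, b"<jpeg bytes>", 17)
print(get_sample("train", 0))  # (b'<jpeg bytes>', 17)
```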
2. Training
2.1. Training on a single node
To train RePaViT on ImageNet on a single node with 8 GPUs for 300 epochs without wandb logging, refer to the command examples below.
[RePaViT-Base]:
torchrun --nproc_per_node=8 main.py \
--model=RePaViT_Base \
--batch_size=512 \
--epochs=300 \
--dist_eval \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--data_path=/path/to/imagenet \
--output_dir=/path/to/output \
--lr=4e-3 \
--min_lr=4e-5 \
--warmup_lr=1e-6 \
--warmup_epochs=20 \
--unscale_lr \
--weight_decay=0.05 \
--opt=lamb \
--drop_path=0.1
[RePaViT-Large]:
torchrun --nproc_per_node=8 main.py \
--model=RePaViT_Large \
--batch_size=512 \
--epochs=300 \
--dist_eval \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--data_path=/path/to/imagenet \
--output_dir=/path/to/output \
--lr=1e-3 \
--min_lr=5e-5 \
--warmup_lr=1e-6 \
--warmup_epochs=20 \
--unscale_lr \
--weight_decay=0.05 \
--opt=lamb \
--drop_path=0.3
--channel_idle and --idle_ratio=0.75 control the channel-idle mechanism in FFN layers. Please note that --feature_norm=BatchNorm must be set to enable full structural reparameterization.
If compute resources are limited, you can add --accumulation_steps to train with a smaller batch size and gradient accumulation. The effective total batch size is (--accumulation_steps $\times$ --batch_size $\times$ --nproc_per_node).
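This arithmetic can be sketched as follows (hypothetical helper names, not functions from main.py):

```python
def effective_batch_size(batch_size, nproc_per_node, accumulation_steps=1):
    # total images contributing to one optimizer step
    return accumulation_steps * batch_size * nproc_per_node

def accumulation_steps_needed(target_total, batch_size, nproc_per_node):
    # how many gradient-accumulation steps reproduce a target total batch size
    per_step = batch_size * nproc_per_node
    assert target_total % per_step == 0, "target must divide evenly"
    return target_total // per_step

print(effective_batch_size(512, 8))             # 4096, as in the recipes above
print(accumulation_steps_needed(4096, 128, 8))  # 4
```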
For your convenience, we also provide a one-line command below:
torchrun --nproc_per_node=8 main.py --model=RePaViT_Large --batch_size=512 --epochs=300 --dist_eval --channel_idle --idle_ratio=0.75 --feature_norm=BatchNorm --data_path=/path/to/imagenet --output_dir=/path/to/output --lr=1e-3 --min_lr=5e-5 --warmup_lr=1e-6 --warmup_epochs=20 --unscale_lr --weight_decay=0.05 --opt=lamb --drop_path=0.3
2.2. Track your training with wandb
To train with wandb tracking and visualization, set the --wandb argument together with the WANDB_MODE environment variable. The project name defaults to the model name. In addition, --wandb_suffix can be used to append a custom suffix for distinguishing different projects on the same model.
[RePaViT-Large] with wandb:
WANDB_MODE=online torchrun --nproc_per_node=8 main.py \
--model=RePaViT_Large \
--batch_size=512 \
--epochs=300 \
--dist_eval \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--data_path=/path/to/imagenet \
--output_dir=/path/to/output \
--lr=1e-3 \
--min_lr=5e-5 \
--warmup_lr=1e-6 \
--warmup_epochs=20 \
--unscale_lr \
--weight_decay=0.05 \
--opt=lamb \
--drop_path=0.3 \
--wandb
#--wandb_entity=your-entity-name \
#--wandb_suffix=your-customized-suffix
Please note that WANDB_MODE MUST be set when using --wandb. You can choose WANDB_MODE=online for real-time tracking on the wandb dashboard, or WANDB_MODE=offline for local tracking and later synchronization.
2.3. Training on multiple nodes
Distributed multi-node multi-GPU training is available via Slurm. We provide a sample Slurm script at exec_config.sh.
A sample snippet of exec_config.sh is shown below:
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=2G
#SBATCH --job-name=train
#SBATCH --time=1-00:00:00
#SBATCH -o RePaViT_Large_out.txt
#SBATCH -e RePaViT_Large_err.txt
# Load modules if needed
# e.g., `module load miniconda3`
# Activate conda environment if needed
# e.g., `conda activate repavit`
export BATCH_SIZE=4096
export MASTER_PORT=22222
export MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
export WORLD_SIZE=$SLURM_NTASKS
export BATCH_SIZE=$(echo "scale=0; $BATCH_SIZE / $WORLD_SIZE" | bc)
export WANDB_MODE=online
srun --export=ALL python main.py \
--model=RePaViT_Large \
--batch_size=$BATCH_SIZE \
--epochs=300 \
--num_workers=20 \
--dist_eval \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--data_path=/path/to/imagenet \
--output_dir=/path/to/output \
--opt=lamb \
--lr=1e-3 \
--min_lr=5e-5 \
--warmup_lr=1e-6 \
--warmup_epochs=20 \
--unscale_lr \
--weight_decay=0.05 \
--drop_path=0.3 \
--wandb
#--wandb_entity=your-entity-name \
#--wandb_suffix=your-customized-suffix
where --nodes and --gres determine how many Slurm nodes and how many GPUs per node are used. --gres should equal --ntasks-per-node. The batch size of each parallel process is automatically computed from the world size.
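Under the hood, torch.distributed's env:// initialization reads RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT from the environment. A minimal sketch of mapping Slurm's per-task variables onto those names (main.py may derive them differently):

```python
import os

def slurm_to_dist_env():
    # map Slurm's per-task variables to the names torch.distributed expects
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    os.environ["RANK"] = str(rank)
    os.environ["WORLD_SIZE"] = str(world_size)
    return rank, world_size

# simulate one task of an 8-node x 2-GPU job (16 tasks total)
os.environ.update({"SLURM_PROCID": "3", "SLURM_NTASKS": "16"})
print(slurm_to_dist_env())  # (3, 16)
```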
3. Evaluation
3.1. Accuracy evaluation
To evaluate prediction performance, run the following command. Please ensure --idle_ratio is set to the same value used for the pretrained weights.
[RePaViT-Large] performance test:
torchrun --nproc_per_node=4 main.py \
--model=RePaViT_Large \
--batch_size=512 \
--eval \
--dist_eval \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--data_path=/path/to/imagenet \
--resume=/path/to/pretrained_weight.pth
For your convenience, we also provide a one-line command below:
torchrun --nproc_per_node=4 main.py --model=RePaViT_Large --batch_size=512 --eval --dist_eval --channel_idle --idle_ratio=0.75 --feature_norm=BatchNorm --data_path=/path/to/imagenet --resume=/path/to/pretrained_weight.pth
3.2. Inference speed test
To test inference speed, use the --test_speed and --only_test_speed arguments; it is recommended to set the number of processes to 1:
[RePaViT-Large] speed test:
torchrun --nproc_per_node=1 main.py \
--model=RePaViT_Large \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--test_speed
For your convenience, we also provide a one-line command below:
torchrun --nproc_per_node=1 main.py --model=RePaViT_Large --channel_idle --idle_ratio=0.75 --feature_norm=BatchNorm --test_speed
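For reference, latency measurement generally follows a warmup-then-measure pattern like the stdlib sketch below (not the repo's actual --test_speed implementation; on GPU you would additionally synchronize the device around the timed region):

```python
import time

def mean_latency(fn, warmup=10, iters=100):
    for _ in range(warmup):   # warm up caches and any lazy initialization
        fn()
    start = time.perf_counter()
    for _ in range(iters):    # average over many runs for stability
        fn()
    return (time.perf_counter() - start) / iters

latency = mean_latency(lambda: sum(i * i for i in range(10_000)))
print(latency > 0.0)  # True
```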
3.3. Evaluation with Structural Reparameterization
To enable inference-stage model compression via structural reparameterization, you can simply add the argument --reparam as:
[RePaViT-Large] speed test after structural reparameterization:
torchrun --nproc_per_node=1 main.py \
--model=RePaViT_Large \
--channel_idle \
--idle_ratio=0.75 \
--feature_norm=BatchNorm \
--test_speed \
--reparam
For your convenience, we also provide a one-line command below:
torchrun --nproc_per_node=1 main.py --model=RePaViT_Large --channel_idle --idle_ratio=0.75 --feature_norm=BatchNorm --test_speed --reparam
--reparam can be combined with accuracy evaluation as well. The prediction accuracy before and after reparameterization should be identical.
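Why accuracy is preserved: at inference time, a BatchNorm with frozen statistics is an affine map, so it can be folded exactly into the adjacent linear weights. A scalar sketch of the folding identity (illustrative only; the repo's reparameterization operates on full FFN weight matrices):

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # fold y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    # into the single affine map y = w_f * x + b_f
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

w, b, gamma, beta, mean, var, eps = 2.0, 1.0, 0.5, 0.1, 0.3, 4.0, 1e-5
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var, eps)

x = 3.0
original = gamma * (w * x + b - mean) / math.sqrt(var + eps) + beta
folded = w_f * x + b_f
print(abs(original - folded) < 1e-9)  # True
```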
4. Supported Models
This repo currently supports the following backbone models (and their aliases):
- RePaViT-Tiny (i.e., RePaDeiT-Tiny)
- RePaViT-Small (i.e., RePaDeiT-Small)
- RePaViT-Base (i.e., RePaDeiT-Base)