<img width="2533" height="1688" alt="image" src="https://github.com/user-attachments/assets/24b1596b-2cd7-4c82-b056-889df7c0c231" />

Images generated by SR-DiT-B/1 (a 140M-parameter diffusion model) after 400K training steps.

SpeedrunDiT (SR-DiT): Speedrunning ImageNet Diffusion

This repository contains the reference implementation for SR-DiT (Speedrun Diffusion Transformer), a framework that combines representation alignment (REG-style), token routing (SPRINT), architectural improvements, and training modifications on top of a SiT-B/1 backbone with the INVAE tokenizer.

Links:

  • Code: https://github.com/SwayStar123/SpeedrunDiT
  • Checkpoints: https://huggingface.co/SwayStar123/SpeedrunDiT/tree/main
  • W&B runs: https://wandb.ai/kagaku-ai/REG/
  • Ablations (branches): https://github.com/SwayStar123/REG

Highlights

  • ImageNet-256 (400K iters, no CFG): FID 3.49, KDD 0.319, 140M params, sampling at NFE=250
  • ImageNet-512 (400K iters, no CFG): FID 4.23, KDD 0.306, sampling at NFE=250

SR-DiT builds on top of a strong baseline (REG + INVAE) and then progressively adds:

  • Semantic latent space via E2E-INVAE
  • SPRINT token routing
  • RMSNorm, RoPE, QK normalization, value residual learning
  • Contrastive Flow Matching (CFM)
  • Time shifting and balanced label sampling (for evaluation)

Repository layout

  • train.py: training loop (Accelerate)
  • generate.py: multi-GPU sampling to .png and .npz
  • evaluations/evaluator.py: computes FID/sFID/IS/Precision/Recall from .npz
  • preprocessing/dataset_tools.py: ImageNet preprocessing + INVAE encoding
  • train.sh, eval.sh: example scripts used for our runs

Setup

Create an environment (Python 3.11) and install the dependencies:

pip install -r requirements.txt

Dataset

Training expects a directory (passed via --data-dir) containing:

dataset/
  images/            # preprocessed ImageNet images (256x256 or 512x512)
  vae-in/            # INVAE latents (.npy) + dataset.json labels

Follow the preprocessing guide in preprocessing/README.md. The minimal flow is:

# 1) Convert raw ImageNet to resized/cropped PNG dataset
python preprocessing/dataset_tools.py convert --source /path/to/imagenet/train \
  --dest dataset/images --resolution=256x256 --transform=center-crop-dhariwal

# 2) Encode images to INVAE latents
python preprocessing/dataset_tools.py encode --source dataset/images \
  --dest dataset/vae-in
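
Once both steps have finished, the layout described above can be sanity-checked before training starts. The following is a minimal sketch, not part of the repository: `check_dataset_layout` is a hypothetical helper, and it assumes only what this README states (an `images/` directory, and a `vae-in/` directory holding `.npy` latents plus `dataset.json`).

```python
from pathlib import Path


def check_dataset_layout(data_dir):
    """Return a list of problems found in the dataset directory (empty if none)."""
    root = Path(data_dir)
    problems = []
    for sub in ("images", "vae-in"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}/")
    # Per this README, vae-in/ holds .npy latents and the dataset.json labels.
    vae_dir = root / "vae-in"
    if vae_dir.is_dir():
        if not (vae_dir / "dataset.json").is_file():
            problems.append("missing file: vae-in/dataset.json")
        # Search recursively, since the latents may sit in subfolders (assumption).
        if next(vae_dir.rglob("*.npy"), None) is None:
            problems.append("no .npy latents found under vae-in/")
    return problems
```

Calling `check_dataset_layout("dataset")` and printing the result before launching a run gives a quick signal that `--data-dir` points at the right place.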

A preprocessed 256x256 dataset is also available for download:

https://huggingface.co/datasets/SwayStar123/repa-imagenet-256/blob/main/dataset.zip
https://huggingface.co/datasets/SwayStar123/repa-imagenet-256/blob/main/vae-in.zip

Unzip dataset.zip first, then unzip vae-in.zip inside the newly created dataset folder.
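
The extraction order can also be scripted. This is a rough sketch using only the Python standard library; `extract_preprocessed` is a hypothetical helper, and it assumes that dataset.zip unpacks to a top-level `dataset/` folder (as the instruction above implies) and that vae-in.zip unpacks to a top-level `vae-in/` folder.

```python
import zipfile
from pathlib import Path


def extract_preprocessed(dataset_zip, vae_zip, dest="."):
    """Extract dataset.zip into dest, then vae-in.zip into dest/dataset."""
    dest = Path(dest)
    with zipfile.ZipFile(dataset_zip) as zf:
        zf.extractall(dest)
    # Assumption: the first archive creates a folder named "dataset".
    dataset_dir = dest / "dataset"
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"expected {dataset_dir} after extracting {dataset_zip}"
        )
    with zipfile.ZipFile(vae_zip) as zf:
        zf.extractall(dataset_dir)
    return dataset_dir
```

The returned path is the directory that would then be passed to training via `--data-dir`.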

Training

An example command is provided in train.sh:

bash train.sh

Key arguments:

  • --model: use SiT-B/1 for the SR-DiT-B/1 configuration
  • --data-dir: directory containing images/ and vae-in/
  • --qk-norm: enables QK normalization
  • --cfm-coeff, --cfm-weighting: CFM settings
  • --time-shifting, --shift-base: time shifting for training

Checkpoints are written to:

exps/<exp-name>/checkpoints/<step>.pt
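
When resuming or evaluating, it is often handy to locate the most recent checkpoint under that path. The sketch below is illustrative only: `latest_checkpoint` is a hypothetical helper, and it assumes `<step>` is a plain integer in the file name (zero-padded or not).

```python
from pathlib import Path


def latest_checkpoint(exp_dir):
    """Return the checkpoint with the highest numeric step, or None if none exist."""
    ckpt_dir = Path(exp_dir) / "checkpoints"
    if not ckpt_dir.is_dir():
        return None
    # Keep only files whose stem is an integer step, e.g. 0400000.pt or 400000.pt.
    candidates = [p for p in ckpt_dir.glob("*.pt") if p.stem.isdigit()]
    # Compare as integers so zero-padded and unpadded names sort consistently.
    return max(candidates, key=lambda p: int(p.stem), default=None)
```

For example, `latest_checkpoint("exps/<exp-name>")` would return the path with the largest step, with `<exp-name>` replaced by the actual experiment name.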

Sampling and evaluation

eval.sh runs sampling (generate.py) and then computes metrics (evaluations/evaluator.py).

bash eval.sh

Notes:

  • generate.py currently supports only --mode sde; the ode branch is not implemented.
  • For metric computation, download the matching reference batch listed in evaluations/README.md.
  • Balanced label sampling can be enabled via --balanced-sampling when generating samples.
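
To illustrate the idea behind balanced label sampling, here is a small sketch that assigns class labels so every class receives the same number of samples (or differs by at most one when the total does not divide evenly). It is an assumption about what "balanced" means here, not a copy of what generate.py does; `balanced_labels` is a hypothetical name.

```python
def balanced_labels(num_samples, num_classes=1000):
    """Return num_samples class labels with per-class counts differing by at most one."""
    base, extra = divmod(num_samples, num_classes)
    labels = []
    for c in range(num_classes):
        # The first `extra` classes receive one additional sample.
        labels.extend([c] * (base + (1 if c < extra else 0)))
    return labels
```

With 50,000 samples over the 1,000 ImageNet classes, this yields exactly 50 labels per class, whereas drawing labels uniformly at random would leave the per-class counts uneven.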

Citation

If you use this repository, please cite SR-DiT:

@misc{bhanded2025speedrundit,
  title         = {Speedrunning ImageNet Diffusion},
  author        = {Bhanded, Swayam},
  year          = {2025},
  eprint        = {2512.12386},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2512.12386},
}

Contact

Please open a GitHub issue for questions or bug reports.

Acknowledgements

This codebase builds upon:

  • REG / REPA
  • SiT
  • DINOv2
  • ADM evaluations
  • NVLabs edm2 preprocessing utilities

We gratefully acknowledge support from WayfarerLabs (Open World Labs) for sponsoring compute resources used in this work.
