<div align="center">

# Transition Models (TiM)
<sup>1</sup>MMLab CUHK <sup>2</sup>Shanghai AI Lab <sup>3</sup>USYD <br> <sup>*</sup>Equal Contribution <sup>‡</sup>Project Lead <sup>†</sup>Corresponding Authors <br>
</div>

<h3 align="center">
<!-- [<a href="https://wzdthu.github.io/NiT">project page</a>] -->
[<a href="https://arxiv.org/abs/2509.04394">arXiv</a>]
[<a href="https://huggingface.co/GoodEnough/TiM-T2I">Model</a>]
[<a href="https://huggingface.co/datasets/GoodEnough/TiM-Toy-T2I-Dataset">Dataset</a>]
</h3>

<b>Highlights</b>: We propose Transition Models (TiM), a novel generative model that learns to navigate the entire generative trajectory with unprecedented flexibility.
- Our Transition Models (TiM) are trained to master arbitrary state-to-state transitions (see the sketch after these highlights). This approach allows TiM to learn the entire solution manifold of the generative process, unifying the few-step and many-step regimes within a single, powerful model.

- Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts on the GenEval benchmark. Importantly, unlike previous few-step generators, TiM exhibits monotonic quality improvement as the sampling budget increases.

- Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to $4096\times4096$.
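To make the any-step idea concrete, here is a minimal sketch of how sampling with state-to-state transitions could look. The `transition` method and its signature are purely illustrative assumptions, not the repo's actual API:

```python
# Illustrative only: a hypothetical interface for any-step sampling, where one
# network jumps from any noise level t to any earlier level s. The same model
# then covers one-step, few-step, and many-step generation.
import torch

@torch.no_grad()
def any_step_sample(model, noise, steps):
    times = torch.linspace(1.0, 0.0, steps + 1)  # t=1 is pure noise, t=0 is data
    x = noise
    for t, s in zip(times[:-1], times[1:]):
        x = model.transition(x, t=t.item(), s=s.item())  # hypothetical API
    return x

# One model, any budget:
# x_fast = any_step_sample(tim, torch.randn(1, 4, 64, 64), steps=1)
# x_best = any_step_sample(tim, torch.randn(1, 4, 64, 64), steps=128)
```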

## 🚨 News
- **2025-09-05**: We are delighted to introduce TiM, the first text-to-image generator to support any-step generation, trained entirely from scratch. We have released the code and pretrained models of TiM.
## 1. Setup
First, clone the repo:
```bash
git clone https://github.com/WZDTHU/TiM.git && cd TiM
```
### 1.1 Environment Setup
```bash
conda create -n tim_env python=3.10
conda activate tim_env
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn
pip install -r requirements.txt
pip install -e .
```
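Optionally, you can sanity-check the environment; this short snippet assumes only the packages installed above:

```python
# Verify that the CUDA build of PyTorch and flash-attn import correctly.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```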
### 1.2 Model Zoo (WIP)
#### Text-to-Image Generation
A single TiM model can perform any-step generation (one-step, few-step, and multi-step) and demonstrates monotonic quality improvement as the sampling budget increases.

| Model | Model Zoo | Model Size | VAE | 1-NFE GenEval | 8-NFE GenEval | 128-NFE GenEval |
|---------|-----------|------------|-----|---------------|---------------|-----------------|
| TiM-T2I | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-T2I) | 865M | DC-AE | 0.67 | 0.76 | 0.83 |
```bash
mkdir checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-T2I/resolve/main/t2i_model.bin" -O checkpoints/t2i_model.bin
```
#### Class-guided Image Generation
| Model | Model Zoo | Model Size | VAE | 2-NFE FID | 500-NFE FID |
|-------------|-----------|------------|--------|-----------|-------------|
| TiM-C2I-256 | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | SD-VAE | 6.14 | 1.65 |
| TiM-C2I-512 | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | DC-AE | 4.79 | 1.69 |
```bash
mkdir -p checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_256.safetensors" -O checkpoints/c2i_model_256.safetensors
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_512.safetensors" -O checkpoints/c2i_model_512.safetensors
```
## 2. Sampling
### Text-to-Image Generation
We provide sampling scripts for three benchmarks: GenEval, DPGBench, and MJHQ30K. You can specify the sampling steps, resolution, and CFG scale in the corresponding scripts.
Sampling with the TiM-T2I model on the GenEval benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_geneval.sh
```

Sampling with the TiM-T2I model on the DPGBench benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_dpgbench.sh
```

Sampling with the TiM-T2I model on the MJHQ30K benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_mjhq30k.sh
```
### Class-guided Image Generation
We provide sampling scripts for ImageNet-256 and ImageNet-512.

Sampling with the C2I model at $256\times256$ resolution:

```bash
bash scripts/sample/c2i/sample_256x256.sh
```

Sampling with the C2I model at $512\times512$ resolution:

```bash
bash scripts/sample/c2i/sample_512x512.sh
```
## 3. Evaluation
### Text-to-Image Generation

#### GenEval
Please follow the GenEval repository to set up the conda environment.

Given the directory of generated images `SAMPLING_DIR` and the folder of the object detector `OBJECT_DETECTOR_FOLDER`, run the following command:

```bash
python projects/evaluate/geneval/evaluation/evaluate_images.py $SAMPLING_DIR --outfile geneval_results.jsonl --model-path $OBJECT_DETECTOR_FOLDER
```
This produces a JSONL file with one line per image. Run the following command to obtain the GenEval score:

```bash
python projects/evaluate/geneval/evaluation/summary_scores.py geneval_results.jsonl
```
#### DPGBench

Please follow the DPGBench repository to set up the conda environment.

Given the directory of generated images `SAMPLING_DIR`, run the following command:

```bash
python projects/evaluate/dpg_bench/compute_dpg_bench.py --image-root-path $SAMPLING_DIR --res-path dpgbench_results.txt --pic-num 4
```
#### MJHQ30K

Please download the MJHQ30K dataset to serve as the reference images.

Given the reference-image directory `REFERENCE_DIR` and the directory of generated images `SAMPLING_DIR`, run the following command to calculate the FID score:

```bash
python projects/evaluate/mjhq30k/calculate_fid.py $REFERENCE_DIR $SAMPLING_DIR
```
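For reference, FID compares Gaussian fits to the Inception features of the reference and generated sets; this is the standard definition, which we assume the script follows:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the reference and generated images; lower is better.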
For the CLIP score, first compute the text features and save them to `MJHQ30K_TEXT_FEAT`:

```bash
python projects/evaluate/mjhq30k/calculate_clip.py projects/evaluate/mjhq30k/meta_data.json $MJHQ30K_TEXT_FEAT/clip_feat.safetensors --save-stats
```

Then run the following command to calculate the CLIP score:

```bash
python projects/evaluate/mjhq30k/calculate_clip.py $MJHQ30K_TEXT_FEAT/clip_feat.safetensors $SAMPLING_DIR
```
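The CLIP score is conventionally the rescaled, clipped cosine similarity between CLIP image and text embeddings, which we assume the script follows:

$$\mathrm{CLIPScore}(I, T) = \max\left(100 \cdot \cos\left(E_I, E_T\right),\, 0\right)$$

where $E_I$ and $E_T$ are the CLIP embeddings of image $I$ and prompt $T$; higher is better.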
### Class-guided Image Generation
Sampling produces a folder of images from which FID, Inception Score, and other metrics are computed.

<b>Note that we do not pack the generated samples into a `.npz` file; this does not affect the calculation of FID and other metrics.</b>

Please follow the ADM TensorFlow evaluation suite to set up the conda environment and download the reference batch.
wget -c "https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/classify_image_graph_def.pb" -O checkpoints/classify_image_graph_def.pb
Given the reference-batch directory `REFERENCE_DIR` and the directory of generated images `SAMPLING_DIR`, run the following command:

```bash
python projects/evaluate/adm_evaluator.py $REFERENCE_DIR $SAMPLING_DIR
```
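Besides FID, the ADM suite reports the Inception Score, whose standard definition is:

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x}\, D_{\mathrm{KL}}\left(p(y \mid x)\,\|\,p(y)\right)\right)$$

where $p(y \mid x)$ is the Inception classifier's label distribution for a sample $x$ and $p(y)$ is its marginal over all samples; higher is better.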
## 4. Training

### 4.1 Dataset Setup
Currently, we provide the full preprocessed dataset for ImageNet-1K. Use the following commands to download the preprocessed latents:

```bash
bash tools/download_imagenet_256x256.sh
bash tools/download_imagenet_512x512.sh
```
For text-to-image generation, we provide a toy dataset. Use the following command to download it:

```bash
bash tools/download_toy_t2i_dataset.sh
```
### 4.2 Download Image Encoder

We use RADIO-v2.5-b as the image encoder for the REPA loss.

```bash
wget -c "https://huggingface.co/nvidia/RADIO/resolve/main/radio-v2.5-b_half.pth.tar" -O checkpoints/radio-v2.5-b_half.pth.tar
```
### 4.3 Training Scripts
Specify `image_dir` in `configs/c2i/tim_b_p4.yaml` and train the base model (131M) on ImageNet-256:

```bash
bash scripts/train/c2i/train_tim_c2i_b.sh
```

Specify `image_dir` in `configs/c2i/tim_xl_p2_256.yaml` and train the XL model (664M) on ImageNet-256:

```bash
bash scripts/train/c2i/train_tim_c2i_xl_256.sh
```

Specify `image_dir` in `configs/c2i/tim_xl_p2_512.yaml` and train the XL model (664M) on ImageNet-512:

```bash
bash scripts/train/c2i/train_tim_c2i_xl_512.sh
```

Specify `root_dir` in `configs/t2i/tim_xl_p1_t2i.yaml` and train the T2I model (865M) on the Toy-T2I-Dataset:

```bash
bash scripts/train/t2i/train_tim_t2i.sh
```
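Before launching a run, it can help to confirm that the dataset path you set actually resolves. This convenience snippet assumes only the key names mentioned above (`image_dir`, `root_dir`), not where they are nested in the YAML; it requires PyYAML:

```python
# Locate a key anywhere in a config and check that its path exists.
import pathlib
import yaml

def find_key(node, key):
    """Depth-first search of nested dicts/lists for the first match of key."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

cfg = yaml.safe_load(open("configs/c2i/tim_b_p4.yaml"))
image_dir = find_key(cfg, "image_dir")
print(image_dir, "exists:", bool(image_dir) and pathlib.Path(image_dir).exists())
```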
## Citations
If you find the project useful, please kindly cite:
```bibtex
@article{wang2025transition,
  title={Transition Models: Rethinking the Generative Learning Objective},
  author={Wang, Zidong and Zhang, Yiyuan and Yue, X},
  journal={arXiv preprint arXiv:2509.04394},
  year={2025}
}
```