<div align="center">

# Transition Models (TiM)
<sup>1</sup>MMLab CUHK <sup>2</sup>Shanghai AI Lab <sup>3</sup>USYD <br> <sup>*</sup>Equal Contribution <sup>‡</sup>Project Lead <sup>†</sup>Corresponding Authors <br>
</div>

<h3 align="center">
<!-- [<a href="https://wzdthu.github.io/NiT">project page</a>] -->
[<a href="https://arxiv.org/abs/2509.04394">arXiv</a>]
[<a href="https://huggingface.co/GoodEnough/TiM-T2I">Model</a>]
[<a href="https://huggingface.co/datasets/GoodEnough/TiM-Toy-T2I-Dataset">Dataset</a>]
</h3>

<b>Highlights</b>: We propose Transition Models (TiM), a novel generative model that learns to navigate the entire generative trajectory with unprecedented flexibility.
- Our Transition Models (TiM) are trained to master arbitrary state-to-state transitions (see the sketch after these highlights). This approach allows TiM to learn the entire solution manifold of the generative process, unifying the few-step and many-step regimes within a single, powerful model.

- Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts on the GenEval benchmark. Importantly, unlike previous few-step generators, TiM exhibits monotonic quality improvement as the sampling budget increases.

- Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to $4096\times4096$.
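To make the any-step idea concrete, here is a minimal sketch of how sampling with state-to-state transitions could look. The `transition` method and its signature are purely illustrative assumptions, not the repo's actual API:

```python
# Illustrative only: a hypothetical interface for any-step sampling, where one
# network jumps from any noise level t to any earlier level s. The same model
# then covers one-step, few-step, and many-step generation.
import torch

@torch.no_grad()
def any_step_sample(model, noise, steps):
    times = torch.linspace(1.0, 0.0, steps + 1)  # t=1 is pure noise, t=0 is data
    x = noise
    for t, s in zip(times[:-1], times[1:]):
        x = model.transition(x, t=t.item(), s=s.item())  # hypothetical API
    return x

# One model, any budget:
# x_fast = any_step_sample(tim, torch.randn(1, 4, 64, 64), steps=1)
# x_best = any_step_sample(tim, torch.randn(1, 4, 64, 64), steps=128)
```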

## 🚨 News
- **2025-09-05**: We are delighted to introduce TiM, the first text-to-image generator to support any-step generation, trained entirely from scratch. We have released the code and pretrained models of TiM.
## 1. Setup
First, clone the repo:
```bash
git clone https://github.com/WZDTHU/TiM.git && cd TiM
```
### 1.1 Environment Setup
```bash
conda create -n tim_env python=3.10
conda activate tim_env
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn
pip install -r requirements.txt
pip install -e .
```
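Optionally, you can sanity-check the environment; this short snippet assumes only the packages installed above:

```python
# Verify that the CUDA build of PyTorch and flash-attn import correctly.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```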
### 1.2 Model Zoo (WIP)
#### Text-to-Image Generation
A single TiM model can perform any-step generation (one-step, few-step, and multi-step) and demonstrates monotonic quality improvement as the sampling budget increases.

| Model | Model Zoo | Model Size | VAE | 1-NFE GenEval | 8-NFE GenEval | 128-NFE GenEval |
|---------|-----------|------------|-----|---------------|---------------|-----------------|
| TiM-T2I | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-T2I) | 865M | DC-AE | 0.67 | 0.76 | 0.83 |
```bash
mkdir checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-T2I/resolve/main/t2i_model.bin" -O checkpoints/t2i_model.bin
```
#### Class-guided Image Generation
| Model | Model Zoo | Model Size | VAE | 2-NFE FID | 500-NFE FID |
|-------------|-----------|------------|--------|-----------|-------------|
| TiM-C2I-256 | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | SD-VAE | 6.14 | 1.65 |
| TiM-C2I-512 | 🤗 [HF](https://huggingface.co/GoodEnough/TiM-C2I) | 664M | DC-AE | 4.79 | 1.69 |
```bash
mkdir -p checkpoints
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_256.safetensors" -O checkpoints/c2i_model_256.safetensors
wget -c "https://huggingface.co/GoodEnough/TiM-C2I/resolve/main/c2i_model_512.safetensors" -O checkpoints/c2i_model_512.safetensors
```
## 2. Sampling
### Text-to-Image Generation
We provide sampling scripts for three benchmarks: GenEval, DPGBench, and MJHQ30K. You can specify the sampling steps, resolution, and CFG scale in the corresponding scripts.
Sampling with the TiM-T2I model on the GenEval benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_geneval.sh
```

Sampling with the TiM-T2I model on the DPGBench benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_dpgbench.sh
```

Sampling with the TiM-T2I model on the MJHQ30K benchmark:

```bash
bash scripts/sample/t2i/sample_t2i_mjhq30k.sh
```
### Class-guided Image Generation
We provide sampling scripts for ImageNet-256 and ImageNet-512.

Sampling with the C2I model at $256\times256$ resolution:

```bash
bash scripts/sample/c2i/sample_256x256.sh
```

Sampling with the C2I model at $512\times512$ resolution:

```bash
bash scripts/sample/c2i/sample_512x512.sh
```
## 3. Evaluation
### Text-to-Image Generation

#### GenEval
Please follow the GenEval repository to set up the conda environment.

Given the directory of generated images `SAMPLING_DIR` and the folder of the object detector `OBJECT_DETECTOR_FOLDER`, run the following command:

```bash
python projects/evaluate/geneval/evaluation/evaluate_images.py $SAMPLING_DIR --outfile geneval_results.jsonl --model-path $OBJECT_DETECTOR_FOLDER
```
This produces a JSONL file with one line per image. Run the following command to obtain the GenEval score:

```bash
python projects/evaluate/geneval/evaluation/summary_scores.py geneval_results.jsonl
```
#### DPGBench

Please follow the DPGBench repository to set up the conda environment.

Given the directory of generated images `SAMPLING_DIR`, run the following command:

```bash
python projects/evaluate/dpg_bench/compute_dpg_bench.py --image-root-path $SAMPLING_DIR --res-path dpgbench_results.txt --pic-num 4
```
#### MJHQ30K

Please download the MJHQ30K dataset to serve as the reference images.

Given the reference-image directory `REFERENCE_DIR` and the directory of generated images `SAMPLING_DIR`, run the following command to calculate the FID score:

```bash
python projects/evaluate/mjhq30k/calculate_fid.py $REFERENCE_DIR $SAMPLING_DIR
```
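For reference, FID compares Gaussian fits to the Inception features of the reference and generated sets; this is the standard definition, which we assume the script follows:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the reference and generated images; lower is better.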
For the CLIP score, first compute the text features and save them to `MJHQ30K_TEXT_FEAT`:

```bash
python projects/evaluate/mjhq30k/calculate_clip.py projects/evaluate/mjhq30k/meta_data.json $MJHQ30K_TEXT_FEAT/clip_feat.safetensors --save-stats
```

Then run the following command to calculate the CLIP score:

```bash
python projects/evaluate/mjhq30k/calculate_clip.py $MJHQ30K_TEXT_FEAT/clip_feat.safetensors $SAMPLING_DIR
```
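The CLIP score is conventionally the rescaled, clipped cosine similarity between CLIP image and text embeddings, which we assume the script follows:

$$\mathrm{CLIPScore}(I, T) = \max\left(100 \cdot \cos\left(E_I, E_T\right),\, 0\right)$$

where $E_I$ and $E_T$ are the CLIP embeddings of image $I$ and prompt $T$; higher is better.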
### Class-guided Image Generation
Sampling produces a folder of images from which FID, Inception Score, and other metrics are computed.

<b>Note that we do not pack the generated samples into a `.npz` file; this does not affect the calculation of FID and other metrics.</b>

Please follow the ADM TensorFlow evaluation suite to set up the conda environment and download the reference batch.
wget -c "https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/classify_image_graph_def.pb" -O checkpoints/classify_image_graph_def.pb
Given the reference-batch directory `REFERENCE_DIR` and the directory of generated images `SAMPLING_DIR`, run the following command:

```bash
python projects/evaluate/adm_evaluator.py $REFERENCE_DIR $SAMPLING_DIR
```
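Besides FID, the ADM suite reports the Inception Score, whose standard definition is:

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x}\, D_{\mathrm{KL}}\left(p(y \mid x)\,\|\,p(y)\right)\right)$$

where $p(y \mid x)$ is the Inception classifier's label distribution for a sample $x$ and $p(y)$ is its marginal over all samples; higher is better.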
## 4. Training

### 4.1 Dataset Setup
Currently, we provide the full preprocessed dataset for ImageNet-1K. Use the following commands to download the preprocessed latents:

```bash
bash tools/download_imagenet_256x256.sh
bash tools/download_imagenet_512x512.sh
```
For text-to-image generation, we provide a toy dataset. Use the following command to download it:

```bash
bash tools/download_toy_t2i_dataset.sh
```
### 4.2 Download Image Encoder

We use RADIO-v2.5-b as the image encoder for the REPA loss.

```bash
wget -c "https://huggingface.co/nvidia/RADIO/resolve/main/radio-v2.5-b_half.pth.tar" -O checkpoints/radio-v2.5-b_half.pth.tar
```
### 4.3 Training Scripts
Specify `image_dir` in `configs/c2i/tim_b_p4.yaml` and train the base model (131M) on ImageNet-256:

```bash
bash scripts/train/c2i/train_tim_c2i_b.sh
```

Specify `image_dir` in `configs/c2i/tim_xl_p2_256.yaml` and train the XL model (664M) on ImageNet-256:

```bash
bash scripts/train/c2i/train_tim_c2i_xl_256.sh
```

Specify `image_dir` in `configs/c2i/tim_xl_p2_512.yaml` and train the XL model (664M) on ImageNet-512:

```bash
bash scripts/train/c2i/train_tim_c2i_xl_512.sh
```

Specify `root_dir` in `configs/t2i/tim_xl_p1_t2i.yaml` and train the T2I model (865M) on the Toy-T2I-Dataset:

```bash
bash scripts/train/t2i/train_tim_t2i.sh
```
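Before launching a run, it can help to confirm that the dataset path you set actually resolves. This convenience snippet assumes only the key names mentioned above (`image_dir`, `root_dir`), not where they are nested in the YAML; it requires PyYAML:

```python
# Locate a key anywhere in a config and check that its path exists.
import pathlib
import yaml

def find_key(node, key):
    """Depth-first search of nested dicts/lists for the first match of key."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

cfg = yaml.safe_load(open("configs/c2i/tim_b_p4.yaml"))
image_dir = find_key(cfg, "image_dir")
print(image_dir, "exists:", bool(image_dir) and pathlib.Path(image_dir).exists())
```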
## Citations
If you find the project useful, please kindly cite:
```bibtex
@article{wang2025transition,
  title={Transition Models: Rethinking the Generative Learning Objective},
  author={Wang, Zidong and Zhang, Yiyuan and Yue, X},
  journal={arXiv preprint arXiv:2509.04394},
  year={2025}
}
```