TDM
[ICCV 2025][Few-Step Student Surpasses Teacher Diffusion] Learning Few-Step Diffusion Models by Trajectory Distribution Matching
TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching
<div align="center"> <a href="https://tdm-t2x.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github-Page&color=blue&logo=github-pages"></a>   <a href="https://arxiv.org/abs/2503.06674"><img src="https://img.shields.io/static/v1?label=Paper&message=ICCV:TDM&color=red&logo=ICCV"></a>   </div>
This is the official repository of "Learning Few-Step Diffusion Models by Trajectory Distribution Matching", by Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang.
🚀🚀🚀[Few-Step Student Surpasses the Teacher Diffusion Model in an Image/Video-Free Way!]
🔥News
- (2026/03) We release pre-trained TDM LoRA for SD-3.5M.
- (2025/07) We release the most compact demo of training TDM on PixArt-512.
- (2025/06) TDM is accepted to ICCV 2025 🎉!
User Study Time!
Which one do you think is better? Some images were generated by PixArt-α (50 NFE); the others were generated by TDM (4 NFE), distilled from PixArt-α in a data-free way with only 500 training iterations and 2 A800 hours.
Fast Text-to-Video Generation
Our proposed TDM can be easily extended to text-to-video.
<p align="center"> <img src="assets/teacher.gif" alt="Teacher" width="45%"> <img src="assets/student.gif" alt="Student" width="45%"> </p>
The video on the left was generated by CogVideoX-2B (100 NFE). In the same amount of time, TDM (4 NFE) can generate 25 videos, as shown on the right: a 25× speedup without performance degradation. (Note: the noise in the GIF is due to compression.)
Usage
TDM-SD3-LoRA
import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from diffusers.utils import make_image_grid
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights('Luo-Yihong/TDM_sd3_lora', adapter_name = 'tdm') # Load TDM-LoRA
pipe.set_adapters(["tdm"], [0.125])# IMPORTANT. Please set LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16) # Save GPU memory.
pipe.vae.config.shift_factor = 0.0
pipe = pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.scheduler.config['flow_shift'] = 6 # the flow_shift can be changed from 1 to 6.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
generator = torch.manual_seed(8888)
image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.,
    generator=generator,
).images[0]
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.set_adapters(["tdm"], [0.]) # Set the LoRA scale to 0 to disable TDM and sample from the teacher
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.,
    generator=generator,
).images[0]
make_image_grid([image,teacher_image],1,2)
The sample on the left was generated by TDM with 4 NFE; the sample on the right was generated by SD3 with 56 NFE (28 steps with classifier-free guidance). Which one do you think is better?
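The NFE figures quoted above can be reproduced with a small helper. This is a minimal sketch, assuming one network evaluation per sampling step and one extra evaluation per step whenever classifier-free guidance is active (`guidance_scale > 1`); the function name `count_nfe` is hypothetical and not part of the repository.

```python
def count_nfe(num_inference_steps: int, guidance_scale: float) -> int:
    """Count network evaluations for one sample.

    Assumption: classifier-free guidance (guidance_scale > 1) needs a
    conditional and an unconditional prediction at every step.
    """
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_inference_steps * passes_per_step


# Settings taken from the SD3 snippet above.
student_nfe = count_nfe(num_inference_steps=4, guidance_scale=1.0)
teacher_nfe = count_nfe(num_inference_steps=28, guidance_scale=7.0)
print(f"student NFE: {student_nfe}, teacher NFE: {teacher_nfe}")
```

Under this counting, the student runs without guidance, so its cost equals its step count.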
TDM-Dreamshaper-v7-LoRA
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained('lykon/dreamshaper-7', torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
).images[0]
image

TDM-CogVideoX-2B-LoRA
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing() # Save memory
pipe.vae.enable_tiling() # Save memory
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
pipe.to("cuda")
prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)
# We train the generator on timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform timestep spacing, which may give inferior results.
# TDM-LoRA still works well under 4 NFE with it.
# We will update the TDM-CogVideoX-LoRA soon for better performance!
generator = torch.manual_seed(8888)
frames = pipe(
    prompt,
    guidance_scale=1,
    num_inference_steps=4,
    num_frames=49,
    generator=generator,
).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
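The comments in the snippet above note a mismatch between the timesteps used in training and a uniformly spaced inference grid. The sketch below makes that gap concrete. It assumes 1000 training timesteps and a "trailing"-style uniform grid; the helper `uniform_trailing_timesteps` is hypothetical and only approximates what a real scheduler does.

```python
def uniform_trailing_timesteps(num_steps: int, num_train_timesteps: int = 1000) -> list:
    """Uniformly spaced timesteps, counted down from the last training timestep."""
    stride = num_train_timesteps / num_steps
    return [round(num_train_timesteps - i * stride) - 1 for i in range(num_steps)]


trained_timesteps = [999, 856, 665, 399]  # from the comment in the snippet above
uniform_timesteps = uniform_trailing_timesteps(4)

# Per-step difference between the training grid and the uniform grid.
gaps = [t - u for t, u in zip(trained_timesteps, uniform_timesteps)]
for t, u, g in zip(trained_timesteps, uniform_timesteps, gaps):
    print(f"trained={t:4d}  uniform={u:4d}  gap={g:4d}")
```

Only the first step coincides under these assumptions; the remaining three uniform timesteps sit at lower noise levels than the ones the generator was trained on.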
🔥 Pre-trained Models
We release a collection of TDM-LoRA weights. Please enjoy them!
Training
Prepare data
Training TDM is image-free: only prompts are needed. You can obtain prompts from open-source datasets such as JourneyDB.
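Because only text is required, data preparation can be as simple as building a clean list of strings. The following is a minimal sketch rather than the repository's loader: the helper names `clean_prompts` and `batch_prompts` and the example prompts are hypothetical, and the input format that the training script expects is not specified here.

```python
from typing import Iterable, Iterator, List


def clean_prompts(raw: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty lines, and remove exact duplicates (order preserved)."""
    seen = set()
    prompts = []
    for line in raw:
        text = line.strip()
        if text and text not in seen:
            seen.add(text)
            prompts.append(text)
    return prompts


def batch_prompts(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of prompts; the last batch may be smaller."""
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


# Hypothetical raw prompts, e.g. lines read from a text file.
raw_prompts = [
    "A cute panda holding a sign\n",
    "",
    "A close-up photo of a lady with sunglasses",
    "A cute panda holding a sign",
    "  A serene bamboo forest  ",
]
prompts = clean_prompts(raw_prompts)
batches = list(batch_prompts(prompts, batch_size=2))
print(batches)
```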
Run the script for training
You can run the training script as follows:
accelerate launch \
    --main_process_port 29503 \
    --num_processes=2 \
    --mixed_precision=fp16 \
    train_tdm_demo.py \
    --train_batch_size=$bsz \
    --gradient_accumulation_steps=1 \
    --gradient_checkpointing \
    --max_train_steps=10001 \
    --learning_rate=2e-05 \
    --max_grad_norm=1 \
    --enable_xformers_memory_efficient_attention \
    --use_8bit_adam \
    --cfg 4.5 \
    --total_steps 900 \
    --lr_scheduler cosine_with_restarts \
    --lr_warmup_steps 50 \
    --use_huber \
    --use_separate
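The command leaves `$bsz` for you to set. When choosing it, it can help to work out the effective batch size per optimizer step. The sketch below assumes that `--train_batch_size` is a per-process value, as is conventional in Accelerate-based training scripts; the function name and the value `bsz = 8` are hypothetical.

```python
def effective_batch_size(
    per_process_batch_size: int,
    num_processes: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Samples contributing to one optimizer step across all processes."""
    return per_process_batch_size * num_processes * gradient_accumulation_steps


bsz = 8  # hypothetical value for $bsz
total = effective_batch_size(bsz, num_processes=2, gradient_accumulation_steps=1)
print(f"effective batch size: {total}")
```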
We suggest two modes of adding noise:
- t ~ [t_k, t_{k+1}]. This fully separates the diffusion intervals across steps.
- t ~ [t_k, T]. In this mode we suggest conditioning the fake score on the step index. However, TDM also works well without this conditioning, because the last diffusion interval [0, t_1] is not shared among steps.
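The two modes differ only in the upper bound of the interval from which the noise time is drawn. The sketch below illustrates this under stated assumptions: `t` is drawn uniformly, the boundaries are an ascending list `[t_0, ..., T]`, and the function name, mode labels, and boundary values are hypothetical rather than taken from the training script.

```python
import random
from typing import Sequence


def sample_noise_time(boundaries: Sequence[float], k: int, mode: str, rng: random.Random) -> float:
    """Draw a noise time for step k.

    mode="separate": t ~ U[t_k, t_{k+1}], so intervals never overlap across steps.
    mode="shared":   t ~ U[t_k, T], so later intervals are shared by several steps.
    """
    if not 0 <= k < len(boundaries) - 1:
        raise ValueError("k must index an interval between two boundaries")
    low = boundaries[k]
    if mode == "separate":
        high = boundaries[k + 1]
    elif mode == "shared":
        high = boundaries[-1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return rng.uniform(low, high)


boundaries = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical t_0 < t_1 < ... < T
rng = random.Random(0)
separate_samples = [sample_noise_time(boundaries, 1, "separate", rng) for _ in range(1000)]
shared_samples = [sample_noise_time(boundaries, 1, "shared", rng) for _ in range(1000)]
print(max(separate_samples), max(shared_samples))
```

In the shared mode, a noise time above `t_{k+1}` can come from more than one step, which is the situation where conditioning the fake score on the step index helps disambiguate.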
Contact
Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.
Bibtex
@misc{luo2025tdm,
  title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching},
  author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
  year={2025},
  eprint={2503.06674},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.06674},
}