
TurboDiffusion

<div align="center"> <img src="assets/TurboDiffusion_Logo.png" width="30%"/> </div>

This repository provides the official implementation of TurboDiffusion, a video generation acceleration framework that speeds up end-to-end diffusion generation by $100 \sim 200\times$ on a single RTX 5090 while maintaining video quality.
TurboDiffusion primarily uses SageAttention and SLA (Sparse-Linear Attention) for attention acceleration, and rCM for timestep distillation.
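The `--sla_topk` ratio used at inference time controls what fraction of attention interactions SLA keeps. As a rough intuition, here is a minimal, framework-free sketch of per-row top-k masking; this is purely illustrative and is not the actual SLA kernel, which also has a linear-attention branch and runs as a fused GPU kernel:

```python
import math

def topk_attention_mask(scores, ratio=0.1):
    """Keep the top `ratio` fraction of entries per row, mask the rest.

    `scores` is a list of rows of raw attention logits (illustrative only).
    Masked entries are set to -inf so a subsequent softmax zeroes them.
    """
    masked = []
    for row in scores:
        k = max(1, math.ceil(len(row) * ratio))
        threshold = sorted(row, reverse=True)[k - 1]
        masked.append([s if s >= threshold else float("-inf") for s in row])
    return masked
```

With `ratio=0.1` (the default `--sla_topk`), only the strongest ~10% of interactions per query survive, which is where most of the attention speedup comes from.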

Paper: TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Note: The current models are only trained on long English prompts. If you use other types of prompts, please augment them to get better performance.

The checkpoints and paper are not finalized and will be updated later to improve quality.

<div align="center"> <img src="assets/TurboDiffusion_speedup.png" width="99%"/> </div> <div align="center"> <img src="assets/acceleration_decomposition.png" width="93%"/> </div> <div align="center"> <table> <tr> <td align="center" style="border: 2px solid #000; padding: 10px;"> <div style="font-size: 1.1em;">Original, E2E Time: 184s</div> <div><img src="assets/videos/original/1.3B/11.gif" width="387"/></div> </td> <td align="center" style="border: 2px solid #000; padding: 10px;"> <div style="font-size: 1.1em;">TurboDiffusion, E2E Time: <b>1.9s</b></div> <div><img src="assets/videos/turbodiffusion/1.3B/11.gif" width="387"/></div> </td> </tr> </table> An example of a <b>5-second video</b> generated by Wan-2.1-T2V-1.3B-480P on a single <b>RTX 5090</b>. </div>

Available Models

| Model Name | Checkpoint Link | Best Resolution |
| :-----------------------------------: | :----------------------------------------------------------: | :-------------: |
| TurboWan2.2-I2V-A14B-720P | Huggingface Model | 720p |
| TurboWan2.1-T2V-1.3B-480P | Huggingface Model | 480p |
| TurboWan2.1-T2V-14B-480P | Huggingface Model | 480p |
| TurboWan2.1-T2V-14B-720P | Huggingface Model | 720p |

Note: All checkpoints support generating videos at 480p or 720p. The "Best Resolution" column indicates the resolution at which the model provides the best video quality.

Installation

Base environment: python>=3.9, torch>=2.7.0. torch==2.8.0 is recommended, as higher versions may cause OOM.
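Since torch builds outside the 2.7–2.8 range can cause problems, it can be worth checking the installed version before running anything heavy. A minimal sketch (the helper names here are illustrative, not part of TurboDiffusion):

```python
def version_tuple(v: str) -> tuple:
    # "2.8.0+cu128" -> (2, 8, 0); strip any local build suffix first
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def torch_ok(v: str) -> bool:
    # TurboDiffusion needs torch >= 2.7.0; the 2.8.x line is recommended
    return version_tuple(v) >= (2, 7, 0)

# At runtime you would check the real install, e.g.:
# import torch; assert torch_ok(torch.__version__)
```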

Install TurboDiffusion by pip:

conda create -n turbodiffusion python=3.12
conda activate turbodiffusion

pip install turbodiffusion --no-build-isolation

Or compile from source:

git clone https://github.com/thu-ml/TurboDiffusion.git
cd TurboDiffusion
git submodule update --init --recursive
pip install -e . --no-build-isolation

To enable SageSLA, a fast SLA forward pass based on SageAttention, install SpargeAttn first:

pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation

Inference

For GPUs with more than 40GB of memory, e.g., H100, please use the unquantized checkpoints (without -quant) and remove --quant_linear from the command. For RTX 5090, RTX 4090, or similar GPUs, please use the quantized checkpoints (with -quant) and add --quant_linear to the command.
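The checkpoint/flag choice above can be automated from the GPU's memory size. A small sketch, using the 1.3B T2V checkpoints as the example; the function and the 40 GB cutoff follow the guidance above but are illustrative, not part of TurboDiffusion:

```python
def pick_checkpoint(total_mem_gb: float):
    """Return (checkpoint filename, extra CLI flags) for a given GPU size."""
    if total_mem_gb > 40:
        # e.g. H100 (80 GB): unquantized checkpoint, no --quant_linear
        return "TurboWan2.1-T2V-1.3B-480P.pth", []
    # e.g. RTX 5090 / 4090: quantized checkpoint plus --quant_linear
    return "TurboWan2.1-T2V-1.3B-480P-quant.pth", ["--quant_linear"]
```

At runtime you could feed it the actual device size, e.g. `torch.cuda.get_device_properties(0).total_memory / 1024**3`.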

  1. Download the VAE (applicable for both Wan2.1 and Wan2.2) and umT5 text encoder checkpoints:

    mkdir checkpoints
    cd checkpoints
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/Wan2.1_VAE.pth
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
    
  2. Download our quantized model checkpoints (For RTX 5090 or similar GPUs):

    # For Wan2.1-T2V-1.3B
    wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P-quant.pth
    
    # For Wan2.2-I2V-14B
    wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-high-720P-quant.pth
    wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-low-720P-quant.pth
    

    Or download our unquantized model checkpoints (For H100 or similar GPUs):

    # For Wan2.1-T2V-1.3B
    wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P.pth
    
    # For Wan2.2-I2V-14B
    wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-high-720P.pth
    wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-low-720P.pth
    
  3. Use the inference script for the T2V models:

    export PYTHONPATH=turbodiffusion
    
    # Arguments:
    # --dit_path            Path to the finetuned TurboDiffusion checkpoint
    # --model               Model to use: Wan2.1-1.3B or Wan2.1-14B (default: Wan2.1-1.3B)
    # --num_samples         Number of videos to generate (default: 1)
    # --num_steps           Sampling steps, 1–4 (default: 4)
    # --sigma_max           Initial sigma for rCM (default: 80); larger choices (e.g., 1600) reduce diversity but may enhance quality
    # --vae_path            Path to Wan2.1 VAE (default: checkpoints/Wan2.1_VAE.pth)
    # --text_encoder_path   Path to umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
    # --num_frames          Number of frames to generate (default: 81)
    # --prompt              Text prompt for video generation
    # --resolution          Output resolution: "480p" or "720p" (default: 480p)
    # --aspect_ratio        Aspect ratio in W:H format (default: 16:9)
    # --seed                Random seed for reproducibility (default: 0)
    # --save_path           Output file path including extension (default: output/generated_video.mp4)
    # --attention_type      Attention module to use: original, sla or sagesla (default: sagesla)
    # --sla_topk            Top-k ratio for SLA/SageSLA attention (default: 0.1), we recommend using 0.15 for better video quality
    # --quant_linear        Enable quantization for linear layers, pass this if using a quantized checkpoint
    # --default_norm        Use the original LayerNorm and RMSNorm of Wan models
    
    python turbodiffusion/inference/wan2.1_t2v_infer.py \
        --model Wan2.1-1.3B \
        --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
        --resolution 480p \
        --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
        --num_samples 1 \
        --num_steps 4 \
        --quant_linear \
        --attention_type sagesla \
        --sla_topk 0.1
    

    Or the script for the I2V model:

    export PYTHONPATH=turbodiffusion
    
    # --image_path              Path to the input image
    # --high_noise_model_path   Path to the high noise TurboDiffusion checkpoint
    # --low_noise_model_path    Path to the low noise TurboDiffusion checkpoint
    # --boundary                Timestep boundary for switching from high to low noise model (default: 0.9)
    # --model                   Model to use: Wan2.2-A14B (default: Wan2.2-A14B)
    # --num_samples             Number of videos to generate (default: 1)
    # --num_steps               Sampling steps, 1–4 (default: 4)
    # --sigma_max               Initial sigma for rCM (default: 200); larger choices (e.g., 1600) reduce diversity but may enhance quality
    # --vae_path                Path to Wan2.2 VAE (default: checkpoints/Wan2.2_VAE.pth)
    # --text_encoder_path       Path to umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
    # --num_frames              Number of frames to generate (default: 81)
    # --prompt                  Text prompt for video generation
    # --resolution              Output resolution: "480p" or "720p" (default: 720p)
    # --aspect_ratio            Aspect ratio in W:H format (default: 16:9)
    # --adaptive_resolution     Enable adaptive resolution based on input image size
    # --ode                     Use ODE for sampling (sharper but less robust than SDE)
    # --seed                    Random seed for reproducibility (default: 0)
    # --save_path               Output file path including extension (default: output/generated_video.mp4)
    # --attention_type          Attention module to use: original, sla or sagesla (default: sagesla)
    # --sla_topk                Top-k ratio for SLA/SageSLA attention (default: 0.1), we recommend using 0.15 for better video quality
    # --quant_linear            Enable quantization for linear layers, pass this if using a quantized checkpoint
    # --default_norm            Use the original LayerNorm and RMSNorm of Wan models
    
    python turbodiffusion/inference/wan2.2_i2v_infer.py \
        --model Wan2.2-A14B \
        --low_noise_model_path checkpoints/TurboWan2.2-I2V-A14B-low-720P-quant.pth \
        --high_noise_model_path checkpoints/TurboWan2.2-I2V-A14B-high-720P-quant.pth \
        --resolution 720p \
        --adaptive_resolution \
        --image_path assets/i2v_inputs/i2v_input_0.jpg \
        --prompt "POV selfie video, ultra-messy and ex..."
    
