Wan2.2

<p align="center"> <img src="assets/logo.png" width="400"/> </p> <p align="center"> 💜 <a href="https://wan.video"><b>Wan</b></a> &nbsp;|&nbsp; 🖥️ <a href="https://github.com/Wan-Video/Wan2.2">GitHub</a> &nbsp;|&nbsp; 🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a> &nbsp;|&nbsp; 🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a> &nbsp;|&nbsp; 📑 <a href="https://arxiv.org/abs/2503.20314">Paper</a> &nbsp;|&nbsp; 📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a> &nbsp;|&nbsp; 💬 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a> <br> 📕 <a href="https://alidocs.dingtalk.com/i/nodes/jb9Y4gmKWrx9eo4dCql9LlbYJGXn6lpz">User Guide (Chinese)</a> &nbsp;|&nbsp; 📘 <a href="https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y">User Guide (English)</a> &nbsp;|&nbsp; 💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat</a> </p>

Wan: Open and Advanced Large-Scale Video Generative Models

We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations:

  • 👍 Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By assigning different denoising timesteps to specialized, powerful expert models, it enlarges the overall model capacity while maintaining the same per-step computational cost.

  • 👍 Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.

  • 👍 Complex Motion Generation: Compared to Wan2.1, Wan2.2 is trained on a significantly larger dataset, with +65.6% more images and +83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among open-source and closed-source models.

  • 👍 Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. The model supports both text-to-video and image-to-video generation at 720P resolution and 24fps, and can run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available, serving both industrial and academic users.
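The MoE idea above can be sketched as simple timestep-based routing; this is an illustrative sketch, not Wan2.2's actual implementation (the expert names and the 0.5 routing boundary are assumptions for the example):

```python
def select_expert(t, total_steps, boundary=0.5, experts=None):
    """Route a denoising step to one expert by its position in the schedule.

    Early (high-noise) timesteps go to one expert, late (low-noise)
    timesteps to the other. Only the chosen expert runs at each step,
    so per-step compute matches a single dense model even though total
    capacity is roughly doubled.
    """
    high_noise_expert, low_noise_expert = experts
    return high_noise_expert if t / total_steps >= boundary else low_noise_expert

experts = ("expert_high_noise", "expert_low_noise")
print(select_expert(45, 50, experts=experts))  # early, noisy step
print(select_expert(5, 50, experts=experts))   # late, nearly clean step
```

In the real model, each "expert" would be a full diffusion transformer; the routing itself is deterministic in the timestep, which is what keeps inference cost flat.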
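To see what the 16×16×4 compression ratio means in practice, here is a rough latent-shape calculation (the frame count and the channel/padding handling are simplified assumptions, not the VAE's exact internals):

```python
def latent_shape(frames: int, height: int, width: int,
                 t_ratio: int = 4, s_ratio: int = 16) -> tuple:
    # Wan2.2-VAE compresses the temporal axis by 4x
    # and each spatial axis by 16x
    return (frames // t_ratio, height // s_ratio, width // s_ratio)

# A ~4-second 720P clip at 24 fps: 96 frames of 720x1280 pixels
print(latent_shape(96, 720, 1280))  # → (24, 45, 80)
```

Working in this much smaller latent space is what lets a 5B model generate 720P@24fps video on a single consumer GPU.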

Video Demos

<div align="center"> <video src="https://github.com/user-attachments/assets/b63bfa58-d5d7-4de6-a1a2-98970b06d9a7" width="70%" poster=""> </video> </div>

🔥 Latest News!!

Community Works

If your research or project builds on Wan2.1 or Wan2.2 and you would like more people to see it, please let us know.

  • Prompt Relay, a plug-and-play, inference-time method for temporal control in video generation. Prompt Relay improves video quality and gives users precise control over what happens at each moment in the video. Visit their webpage for more details.
  • Helios, a breakthrough video generation model based on Wan2.1 that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
  • LightX2V, a lightweight and efficient video generation framework that integrates Wan2.1 and Wan2.2, supporting multiple engineering acceleration techniques for fast inference. LightX2V-HuggingFace offers a variety of Wan-based step-distillation models, quantized models, and lightweight VAE models.
  • HuMo proposed a unified, human-centric framework based on Wan to produce high-quality, fine-grained, and controllable human videos from multimodal inputs—including text, images, and audio. Visit their webpage for more details.
  • FastVideo includes distilled Wan models with sparse attention that significantly speed up inference.
  • Cache-dit offers Fully Cache Acceleration support for Wan2.2 MoE with DBCache, TaylorSeer and Cache CFG. Visit their example for more details.
  • Kijai's ComfyUI WanVideoWrapper is an alternative implementation of Wan models for ComfyUI. Thanks to its Wan-only focus, it's on the frontline of getting cutting edge optimizations and hot research features, which are often hard to integrate into ComfyUI quickly due to its more rigid structure.
  • DiffSynth-Studio provides comprehensive support for Wan 2.2, including low-GPU-memory layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training.

📑 Todo List

  • Wan2.2 Text-to-Video
    • [x] Multi-GPU Inference code of the A14B and 14B models
    • [x] Checkpoints of the A14B and 14B models
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2 Image-to-Video
    • [x] Multi-GPU Inference code of the A14B model
    • [x] Checkpoints of the A14B model
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2 Text-Image-to-Video
    • [x] Multi-GPU Inference code of the 5B model
    • [x] Checkpoints of the 5B model
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2-S2V Speech-to-Video
    • [x] Inference code of Wan2.2-S2V
    • [x] Checkpoints of Wan2.2-S2V-14B
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2-Animate Character Animation and Replacement
    • [x] Inference code of Wan2.2-Animate
    • [x] Checkpoints of Wan2.2-Animate
    • [x] ComfyUI integration
    • [x] Diffusers integration

Run Wan2.2

Installation

Clone the repo:

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2

Install dependencies:

# Ensure torch >= 2.4.0
# If installing `flash_attn` fails, install the other packages first and `flash_attn` last
pip install -r requirements.txt
# To use CosyVoice for speech synthesis in Speech-to-Video generation, also install:
pip install -r requirements_s2v.txt
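You can verify the torch requirement before installing the remaining dependencies. The version-parsing helper below is a hypothetical convenience for illustration, not part of this repo:

```python
def version_at_least(version: str, minimum: tuple) -> bool:
    # Drop any local build tag such as "+cu121" before comparing
    core = version.split("+")[0]
    parts = tuple(int(p) for p in core.split(".")[:len(minimum)])
    return parts >= minimum

# With torch installed, the check would look like:
# import torch
# assert version_at_least(torch.__version__, (2, 4, 0)), "torch >= 2.4.0 required"
print(version_at_least("2.4.1+cu121", (2, 4, 0)))  # → True
```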

Model Download

| Models   | Download Links               | Description                                   |
|----------|------------------------------|-----------------------------------------------|
| T2V-A14B | 🤗 Huggingface 🤖 ModelScope | Text-to-Video MoE model, supports 480P & 720P |
| I2V-A14B | 🤗 [Huggingface](https://huggi
