# Wan2.2
<p align="center"> <img src="assets/logo.png" width="400"/> <p> <p align="center"> 💜 <a href="https://wan.video"><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.2">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="https://arxiv.org/abs/2503.20314">Paper</a>    |    📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a>    |    💬 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>   <br> 📕 <a href="https://alidocs.dingtalk.com/i/nodes/jb9Y4gmKWrx9eo4dCql9LlbYJGXn6lpz">使用指南(中文)</a>   |    📘 <a href="https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y">User Guide(English)</a>   |   💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat(微信)</a>   <br>Wan: Open and Advanced Large-Scale Video Generative Models <be>
We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations:
- 👍 **Effective MoE Architecture**: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps between specialized, powerful expert models, Wan2.2 enlarges overall model capacity while keeping the per-step computational cost unchanged (see the sketch after this list).
- 👍 **Cinematic-level Aesthetics**: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.
- 👍 **Complex Motion Generation**: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with +65.6% more images and +83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among all open-source and closed-source models.
- 👍 **Efficient High-Definition Hybrid TI2V**: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24fps, and can run on consumer-grade graphics cards like the RTX 4090. It is one of the fastest 720P@24fps models currently available, capable of serving both industrial and academic needs.
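To make the timestep-wise MoE routing above concrete, here is a minimal, illustrative Python sketch: two denoising experts share one sampling loop, and a noise-level boundary decides which expert runs at each step. All names here (`moe_denoise`, `scheduler_step`, the boundary value, the toy latent shape) are placeholders for illustration, not Wan2.2's actual implementation.

```python
import torch

def scheduler_step(latents, noise_pred, t, num_steps=50):
    # Toy Euler-style update; a real sampler (e.g. UniPC) replaces this.
    return latents - noise_pred / num_steps

def moe_denoise(latents, timesteps, high_noise_expert, low_noise_expert,
                boundary, cond):
    """Run the denoising loop, routing each timestep to one expert."""
    for t in timesteps:
        # High-noise steps shape global structure; low-noise steps refine
        # detail. Only one expert runs per step, so per-step compute (and
        # activated parameters) match a single non-MoE model.
        expert = high_noise_expert if t >= boundary else low_noise_expert
        noise_pred = expert(latents, t, cond)
        latents = scheduler_step(latents, noise_pred, t)
    return latents

# Toy usage with stand-in "experts" (the boundary value is illustrative):
if __name__ == "__main__":
    latents = torch.randn(1, 16, 4, 8, 8)         # (B, C, T, H, W) toy latent
    timesteps = torch.linspace(999, 0, 50)        # descending noise levels
    expert = lambda x, t, c: torch.zeros_like(x)  # placeholder network
    out = moe_denoise(latents, timesteps, expert, expert, boundary=875, cond=None)
    print(out.shape)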
## Video Demos
<div align="center"> <video src="https://github.com/user-attachments/assets/b63bfa58-d5d7-4de6-a1a2-98970b06d9a7" width="70%" poster=""> </video> </div>🔥 Latest News!!
- Nov 13, 2025: 👋 Wan2.2-Animate-14B has been integrated into Diffusers (PR, Weights). Thanks to all community contributors. Enjoy!
- Sep 19, 2025: 💃 We introduce **Wan2.2-Animate-14B**, a unified model for character animation and replacement with holistic movement and expression replication. We have released the model weights and inference code, and you can try it on wan.video, ModelScope Studio, or HuggingFace Space!
- Sep 5, 2025: 👋 We added text-to-speech synthesis support with CosyVoice for the Speech-to-Video generation task.
- Aug 26, 2025: 🎵 We introduce **Wan2.2-S2V-14B**, an audio-driven cinematic video generation model, including inference code, model weights, and a technical report! You can try it on wan.video, ModelScope Gradio, or HuggingFace Gradio!
- Jul 28, 2025: 👋 We have opened an HF space using the TI2V-5B model. Enjoy!
- Jul 28, 2025: 👋 Wan2.2 has been integrated into ComfyUI (CN | EN). Enjoy!
- Jul 28, 2025: 👋 Wan2.2's T2V, I2V, and TI2V have been integrated into Diffusers (T2V-A14B | I2V-A14B | TI2V-5B). Feel free to give it a try; a minimal usage sketch follows this list.
- Jul 28, 2025: 👋 We've released the inference code and model weights of Wan2.2.
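As a concrete starting point for the Diffusers integration mentioned above, here is a minimal sketch for the TI2V-5B checkpoint. The model id and generation defaults follow the linked Diffusers model cards at the time of writing; treat them as assumptions and check the cards for current values.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Model id assumed from the Diffusers model card linked in the news item above.
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae",
                                       torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae,
                                   torch_dtype=torch.bfloat16).to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=704, width=1280,   # 720P-class resolution
    num_frames=121,           # ~5 s at 24 fps
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "t2v_out.mp4", fps=24)
```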
## Community Works
If your research or project builds on Wan2.1 or Wan2.2 and you would like more people to see it, please let us know.
- Prompt Relay, a plug-and-play, inference-time method for temporal control in video generation. Prompt Relay improves video quality and gives users precise control over what happens at each moment in the video. Visit their webpage for more details.
- Helios, a breakthrough video generation model based on Wan2.1 that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
- LightX2V, a lightweight and efficient video generation framework that integrates Wan2.1 and Wan2.2, supporting multiple engineering acceleration techniques for fast inference. LightX2V-HuggingFace, offers a variety of Wan-based step-distillation models, quantized models, and lightweight VAE models.
- HuMo proposes a unified, human-centric framework based on Wan that produces high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. Visit their webpage for more details.
- FastVideo includes distilled Wan models with sparse attention that significantly speed up inference.
- Cache-dit offers full cache acceleration for Wan2.2 MoE with DBCache, TaylorSeer, and Cache CFG. Visit their example for more details.
- Kijai's ComfyUI WanVideoWrapper is an alternative implementation of Wan models for ComfyUI. Thanks to its Wan-only focus, it is often first to adopt cutting-edge optimizations and hot research features that are hard to integrate quickly into ComfyUI due to its more rigid structure.
- DiffSynth-Studio provides comprehensive support for Wan2.2, including low-GPU-memory layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training.
## 📑 Todo List
- Wan2.2 Text-to-Video
    - [x] Multi-GPU Inference code of the A14B and 14B models
    - [x] Checkpoints of the A14B and 14B models
    - [x] ComfyUI integration
    - [x] Diffusers integration
- Wan2.2 Image-to-Video
    - [x] Multi-GPU Inference code of the A14B model
    - [x] Checkpoints of the A14B model
    - [x] ComfyUI integration
    - [x] Diffusers integration
- Wan2.2 Text-Image-to-Video
    - [x] Multi-GPU Inference code of the 5B model
    - [x] Checkpoints of the 5B model
    - [x] ComfyUI integration
    - [x] Diffusers integration
- Wan2.2-S2V Speech-to-Video
    - [x] Inference code of Wan2.2-S2V
    - [x] Checkpoints of Wan2.2-S2V-14B
    - [x] ComfyUI integration
    - [x] Diffusers integration
- Wan2.2-Animate Character Animation and Replacement
    - [x] Inference code of Wan2.2-Animate
    - [x] Checkpoints of Wan2.2-Animate
    - [x] ComfyUI integration
    - [x] Diffusers integration
## Run Wan2.2

### Installation
Clone the repo:
```sh
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
```
Install dependencies:
```sh
# Ensure torch >= 2.4.0
# If the installation of `flash_attn` fails, try installing the other packages first and install `flash_attn` last
pip install -r requirements.txt

# If you want to use CosyVoice to synthesize speech for Speech-to-Video generation, additionally install:
pip install -r requirements_s2v.txt
```
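As a quick sanity check that the environment satisfies the torch requirement above, something along these lines can be run before inference (assumes the `packaging` helper, which most pip-based environments already include):

```python
# Sanity check for the torch >= 2.4.0 requirement.
from packaging import version  # usually present in pip-based environments
import torch

installed = version.parse(torch.__version__.split("+")[0])  # drop local tags like +cu121
assert installed >= version.parse("2.4.0"), f"torch {installed} < 2.4.0"
print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```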
### Model Download
| Models   | Download Links               | Description                                   |
|----------|------------------------------|-----------------------------------------------|
| T2V-A14B | 🤗 Huggingface 🤖 ModelScope | Text-to-Video MoE model, supports 480P & 720P |
| I2V-A14B | 🤗 [Huggingface](https://huggi
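The checkpoints can also be fetched programmatically with `huggingface_hub`; a small sketch is below. The repo id matches the T2V-A14B entry in the table above, and the local directory name is just a convention: point your inference command's checkpoint path at wherever you download it.

```python
# Download a checkpoint with huggingface_hub (pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.2-T2V-A14B",
    local_dir="./Wan2.2-T2V-A14B",  # assumed directory name; adjust as needed
)
```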