StableAnimator

[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses.

StableAnimator [CVPR2025]

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu¹, Zhen Xing¹, Xintong Han³, Zhi-Qi Cheng⁴, Qi Dai², Chong Luo², Zuxuan Wu¹

¹Fudan University; ²Microsoft Research Asia; ³Huya Inc; ⁴Carnegie Mellon University

<img src="assets/figures/case-47.gif" width="256" /> <img src="assets/figures/case-61.gif" width="256" /> <img src="assets/figures/case-45.gif" width="256" /> <img src="assets/figures/case-46.gif" width="256" /> <img src="assets/figures/case-5.gif" width="256" /> <img src="assets/figures/case-17.gif" width="256" />

Pose-driven human image animations generated by StableAnimator, showing its ability to synthesize high-fidelity, ID-preserving videos. All animations are synthesized directly by StableAnimator without any face-related post-processing tools, such as the face-swapping tool FaceFusion or face restoration models like GFP-GAN and CodeFormer.

<img src="assets/figures/case-35.gif" width="384" /> <img src="assets/figures/case-42.gif" width="384" /> <img src="assets/figures/case-18.gif" width="384" /> <img src="assets/figures/case-24.gif" width="384" />

Comparison results between StableAnimator and state-of-the-art (SOTA) human image animation models, highlighting StableAnimator's superior performance in high-fidelity, identity-preserving human image animation.

Overview

<img src="assets/figures/framework.jpg" alt="model architecture" width="1280"/>

Overview of the StableAnimator framework.

Current diffusion models for human image animation struggle to ensure identity (ID) consistency. This paper presents StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses. Building upon a video diffusion model, StableAnimator contains carefully designed modules for both training and inference, aimed at identity consistency. In particular, StableAnimator begins by computing image and face embeddings with off-the-shelf image and face extractors, respectively, and the face embeddings are then refined by interacting with the image embeddings through a global content-aware Face Encoder. Next, StableAnimator introduces a novel distribution-aware ID Adapter that prevents interference caused by temporal layers while preserving ID via alignment. During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance face quality. We demonstrate that solving the HJB equation can be integrated into the diffusion denoising process, and the resulting solution constrains the denoising path and thus benefits ID preservation. Experiments on multiple benchmarks show the effectiveness of StableAnimator both qualitatively and quantitatively.
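As a rough illustration of the Face Encoder idea above, the sketch below lets face embeddings attend to image embeddings through a single cross-attention step (face tokens as queries, image tokens as keys and values) followed by a residual update. This is a toy built on assumptions, not StableAnimator's released Face Encoder: the function name refine_face_embeddings, the projection matrices w_q, w_k and w_v, the token counts and the embedding width are all hypothetical, and the real module is trained and may stack several blocks.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def refine_face_embeddings(face_emb, image_emb, w_q, w_k, w_v):
    """Single-head cross-attention: face tokens are queries, image tokens are keys/values.

    face_emb:  (n_face, d) face-ID tokens (hypothetical shapes)
    image_emb: (n_img, d)  global image tokens
    w_q, w_k, w_v: (d, d)  projection matrices (hypothetical, untrained here)
    Returns refined face tokens (n_face, d) and the attention map (n_face, n_img).
    """
    q = face_emb @ w_q
    k = image_emb @ w_k
    v = image_emb @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each face token spreads attention over image tokens
    refined = face_emb + attn @ v            # residual update keeps the original ID signal
    return refined, attn


rng = np.random.default_rng(0)
face = rng.normal(size=(4, 8))
image = rng.normal(size=(16, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
refined, attn = refine_face_embeddings(face, image, wq, wk, wv)
print(refined.shape, attn.shape)  # (4, 8) (4, 16)
```

Setting the value projection to zero returns the face embeddings unchanged, which shows that the residual path carries the original ID signal while attention only adds image context.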

News

[2025-3-10]:🔥 The code for HJB-based face optimization is released!
[2025-2-27]:🔥 StableAnimator is accepted by CVPR2025🎉🎉🎉. The code for HJB-based face optimization will be released in March. Stay tuned!
[2024-12-13]:🔥 The training code and training tutorial are released! You can train/finetune StableAnimator on your own collected datasets! Other code will be released very soon. Stay tuned!
[2024-12-10]:🔥 The Gradio interface is released! Many thanks to @gluttony-10 for his contribution! Other code will be released very soon. Stay tuned!
[2024-12-6]:🔥 All data pre-processing code (human skeleton extraction and human face mask extraction) is released! The training code and a detailed training tutorial will be released before 2024.12.13. Stay tuned!
[2024-12-4]:🔥 We are thrilled to release an interesting dance demo (🔥🔥APT Dance🔥🔥)! The generated video can be seen on YouTube and Bilibili.
[2024-11-28]:🔥 The data pre-processing code (human skeleton extraction) is available! Other code will be released very soon. Stay tuned!
[2024-11-26]:🔥 The project page, code, technical report and a basic model checkpoint are released. Further training code, data pre-processing code, the evaluation dataset and StableAnimator-pro will be released very soon. Stay tuned!

To-Do List

[x] StableAnimator-basic
[x] Inference Code
[x] Evaluation Samples
[x] Data Pre-Processing Code (Skeleton Extraction)
[x] Data Pre-Processing Code (Human Face Mask Extraction)
[x] Training Code
[x] Inference Code with HJB-based Face Optimization
[ ] StableAnimator-pro

Quickstart

The basic model checkpoint supports generating videos at 576x1024 or 512x512 resolution. If you run out of GPU memory, reduce the number of animated frames.
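To see why trimming frames helps, the sketch below counts the elements in the video latent that the denoiser operates on. It assumes an SVD-style VAE with 8x spatial downsampling and 4 latent channels, and reads 576x1024 as width 576 by height 1024; latent_elements is a hypothetical helper. Total GPU memory is dominated by model weights and UNet activations, so treat this only as an indication of how the workload grows with the number of frames.

```python
def latent_elements(num_frames: int, height: int, width: int,
                    latent_channels: int = 4, downsample: int = 8) -> int:
    """Number of elements in a video latent of shape (frames, channels, H/8, W/8).

    Assumes an SVD-style VAE with 8x spatial downsampling and 4 latent channels.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return num_frames * latent_channels * (height // downsample) * (width // downsample)


for frames in (16, 8):
    print(frames, latent_elements(frames, height=1024, width=576))
# 16 589824
# 8 294912
```

Halving the frame count halves the latent, which is why reducing animated frames is the first thing to try when memory runs out.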

Environment setup

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install torch==2.5.1+cu124 xformers --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
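After installation, a quick check such as the one below (standard library only; check_versions is a hypothetical helper) reports whether the pinned PyTorch packages are present. It strips a local build tag such as +cu124 before comparing, since wheels from the CUDA index report their versions in that form.

```python
from importlib import metadata

# Versions pinned by the pip commands above.
EXPECTED = {"torch": "2.5.1", "torchvision": "0.20.1", "torchaudio": "2.5.1"}


def check_versions(expected=EXPECTED):
    """Map each package to (installed_version_or_None, matches_expected)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        # Drop a local build tag such as "+cu124" before comparing.
        ok = have is not None and have.split("+")[0] == want
        report[pkg] = (have, ok)
    return report


if __name__ == "__main__":
    for pkg, (have, ok) in check_versions().items():
        print(f"{pkg:12} {have or 'not installed':20} {'OK' if ok else 'MISMATCH'}")
```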

Download weights

If you encounter connection issues with Hugging Face, you can use the mirror endpoint by setting the environment variable: export HF_ENDPOINT=https://hf-mirror.com. Download the weights manually as follows:

cd StableAnimator
git lfs install
git clone https://huggingface.co/FrancisRing/StableAnimator checkpoints

All the weights should be organized as follows, and the overall file structure of this project should look like this:

StableAnimator/
├── DWPose
├── animation
├── checkpoints
│   ├── DWPose
│   │   ├── dw-ll_ucoco_384.onnx
│   │   └── yolox_l.onnx
│   ├── Animation
│   │   ├── pose_net.pth
│   │   ├── face_encoder.pth
│   │   └── unet.pth
│   ├── SVD
│   │   ├── feature_extractor
│   │   ├── image_encoder
│   │   ├── scheduler
│   │   ├── unet
│   │   ├── vae
│   │   ├── model_index.json
│   │   ├── svd_xt.safetensors
│   │   └── svd_xt_image_decoder.safetensors
│   └── inference.zip
├── models
│   └── antelopev2
│       ├── 1k3d68.onnx
│       ├── 2d106det.onnx
│       ├── genderage.onnx
│       ├── glintr100.onnx
│       └── scrfd_10g_bnkps.onnx
├── app.py
├── command_basic_infer.sh
├── inference_basic.py
└── requirements.txt
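Once the weights are in place, a small standard-library script can confirm the layout before you launch inference. The file list below is a subset copied from the tree above, and missing_files is a hypothetical helper; run it from the StableAnimator root.

```python
from pathlib import Path

# A subset of the files from the tree above; extend as needed.
REQUIRED_FILES = [
    "checkpoints/DWPose/dw-ll_ucoco_384.onnx",
    "checkpoints/DWPose/yolox_l.onnx",
    "checkpoints/Animation/pose_net.pth",
    "checkpoints/Animation/face_encoder.pth",
    "checkpoints/Animation/unet.pth",
    "checkpoints/SVD/model_index.json",
    "models/antelopev2/glintr100.onnx",
    "models/antelopev2/scrfd_10g_bnkps.onnx",
]


def missing_files(project_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist as files under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All expected weights found." if not missing else "Missing:\n  " + "\n  ".join(missing))
```

If only the antelopev2 entries are reported missing, check for the nested models/antelopev2/antelopev2 directory described below.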

Note that the automatic download of Antelopev2 has a bug, which produces the following error:

Traceback (most recent call last):
  File "/home/StableAnimator/inference_normal.py", line 243, in <module>
    face_model = FaceModel()
  File "/home/StableAnimator/animation/modules/face_model.py", line 11, in __init__
    self.app = FaceAnalysis(
  File "/opt/conda/lib/python3.10/site-packages/insightface/app/face_analysis.py", line 43, in __init__
    assert 'detection' in self.models
AssertionError

This issue is caused by an incorrect Antelopev2 path: the model is automatically downloaded into the models/antelopev2/antelopev2 directory, whereas the correct path is models/antelopev2. Run the following commands to fix it:

cd StableAnimator
mv ./models/antelopev2/antelopev2 ./models/tmp
rm -rf ./models/antelopev2
mv ./models/tmp ./models/antelopev2
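If you prefer to apply the fix from Python (for example at the top of your own inference script), the sketch below mirrors the three shell commands step by step. flatten_antelopev2 is a hypothetical helper; like the shell version, it deletes anything else left inside models/antelopev2 once the nested folder has been moved out, and it refuses to run if models/tmp already exists.

```python
import shutil
from pathlib import Path


def flatten_antelopev2(models_dir="models") -> bool:
    """Python equivalent of the three shell commands above.

    Moves models/antelopev2/antelopev2 to models/antelopev2.
    Returns True if the layout was fixed, False if no nested directory was found.
    """
    models = Path(models_dir)
    outer = models / "antelopev2"
    nested = outer / "antelopev2"
    if not nested.is_dir():
        return False  # already correct (or not downloaded yet)
    tmp = models / "tmp"
    if tmp.exists():
        raise FileExistsError(f"{tmp} already exists; move it away first")
    shutil.move(str(nested), str(tmp))  # mv ./models/antelopev2/antelopev2 ./models/tmp
    shutil.rmtree(outer)                # rm -rf ./models/antelopev2
    tmp.rename(outer)                   # mv ./models/tmp ./models/antelopev2
    return True


if __name__ == "__main__":
    print("fixed" if flatten_antelopev2() else "nothing to fix")
```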

Evaluation Samples

The evaluation samples presented in the paper can be downloaded from OneDrive or taken from inference.zip in checkpoints. Please download the evaluation samples manually as follows:

cd StableAnimator
mkdir inference

All the evaluation samples should be organized as follows:

inference/
├── case-1
│   ├── poses
│   ├── faces
│   └── reference.png
├── case-2
│   ├── poses
│   ├── faces
│   └── reference.png
├── case-3
│   ├── poses
│   ├── faces
│   └── reference.png
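Before running inference on the evaluation samples, a short check like the one below (check_cases is a hypothetical helper) can flag case folders that are missing poses/, faces/ or reference.png. It only checks for presence and does not validate image contents or frame counts.

```python
from pathlib import Path


def check_cases(inference_dir="inference"):
    """Split case folders into complete ones and ones missing poses/, faces/ or reference.png."""
    complete, incomplete = [], {}
    for case in sorted(p for p in Path(inference_dir).iterdir() if p.is_dir()):
        missing = [d for d in ("poses", "faces") if not (case / d).is_dir()]
        if not (case / "reference.png").is_file():
            missing.append("reference.png")
        if missing:
            incomplete[case.name] = missing
        else:
            complete.append(case.name)
    return complete, incomplete


if __name__ == "__main__" and Path("inference").is_dir():
    ok, bad = check_cases()
    print(f"{len(ok)} complete case(s)")
    for name, missing in bad.items():
        print(f"{name}: missing {', '.join(missing)}")
```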

Human Skeleton Extraction

We use the pre-trained DWPose to extract human skeletons. When initializing DWPose, configure the paths to the pre-trained weights in /DWPose/dwpose_utils/wholebody.py:

onnx_det = 'path/checkpoints/DWPose/yolox_l.onnx'
onnx_pose = 'path/checkpoints/DWPose/dw-ll_ucoco_384.onnx'

Given the target image folder containing multiple .png files, you can use the following command to obtain the corresponding human skeleton images:

python DWPose/skeleton_extraction.py --target_image_folder_path="path/test/target_images" --ref_image_path="path/test/reference.png" --poses_folder_path="path/test/poses"

Note that the .png files in the target image folder must be named in the format frame_i.png, e.g., frame_0.png, frame_1.png, and so on. --ref_image_path is the path to the given reference image. The extracted human skeleton images are saved in path/test/poses. It is particularly significant that the target skeleton images should be aligned with the reference image regardi
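Since the target frames follow the frame_i.png naming, they should be handled in numeric order. The sketch below (ordered_frames is a hypothetical helper) sorts by the integer index rather than by string, because a plain string sort puts frame_10.png before frame_2.png. Requiring the indices to run contiguously from 0 is an assumption drawn from the frame_0.png, frame_1.png example, not a documented constraint of skeleton_extraction.py.

```python
import re
from pathlib import Path

FRAME_NAME = re.compile(r"frame_(\d+)\.png")


def ordered_frames(folder):
    """Return frame_i.png paths sorted by their numeric index i.

    Plain string sorting would place frame_10.png before frame_2.png.
    Raises ValueError if the indices are not exactly 0, 1, ..., n-1.
    """
    indexed = []
    for path in Path(folder).iterdir():
        match = FRAME_NAME.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    indexed.sort()
    indices = [i for i, _ in indexed]
    if indices != list(range(len(indices))):
        raise ValueError(f"frame indices are not contiguous from 0: {indices}")
    return [path for _, path in indexed]
```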
