Champ
[ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
https://github.com/fudan-generative-vision/champ/assets/82803297/b4571be6-dfb0-4926-8440-3db229ebd4aa
Framework

News
- 2024/05/05: 🎉🎉🎉Sample training data on HuggingFace released.
- 2024/05/02: 🌟🌟🌟Training source code released #99.
- 2024/04/28: 👏👏👏Smooth SMPLs in Blender method released #96.
- 2024/04/26: 🚁Great Blender add-on CEB Studios for various SMPL processing!
- 2024/04/12: ✨✨✨SMPL & Rendering scripts released! Champ your dance videos now💃🤸♂️🕺. See docs.
- 2024/03/30: 🚀🚀🚀Amazing ComfyUI wrapper by the community. Here is the video tutorial. Thanks to @kijai🥳
- 2024/03/27: Cool demo on Replicate🌟. Thanks to @camenduru👏
- 2024/03/27: Visit our roadmap🕒 to preview the future of Champ.
Installation
- System requirements: Ubuntu 20.04 / Windows 11, CUDA 12.1
- Tested GPUs: A100, RTX 3090
Create conda environment:
conda create -n champ python=3.10
conda activate champ
Install packages with pip
pip install -r requirements.txt
Install packages with poetry
If you want to run this project on a Windows device, we strongly recommend using
poetry.
poetry install --no-root
Inference
The inference entrypoint script is ${PROJECT_ROOT}/inference.py. Before testing your own cases, two preparations need to be completed:
Download pretrained models
You can easily get all pretrained models required for inference from our HuggingFace repo.
Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the commands below:
git lfs install
git clone https://huggingface.co/fudan-generative-ai/champ pretrained_models
Or you can download them separately from their source repos:
- Champ ckpts: consist of the denoising UNet, guidance encoders, reference UNet, and motion module.
- StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
- sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
- image_encoder: Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embedding rather than text embeddings. (Thanks to lambdalabs)
Finally, these pretrained models should be organized as follows:
./pretrained_models/
|-- champ
| |-- denoising_unet.pth
| |-- guidance_encoder_depth.pth
| |-- guidance_encoder_dwpose.pth
| |-- guidance_encoder_normal.pth
| |-- guidance_encoder_semantic_map.pth
| |-- reference_unet.pth
| `-- motion_module.pth
|-- image_encoder
| |-- config.json
| `-- pytorch_model.bin
|-- sd-vae-ft-mse
| |-- config.json
| |-- diffusion_pytorch_model.bin
| `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
|-- feature_extractor
| `-- preprocessor_config.json
|-- model_index.json
|-- unet
| |-- config.json
| `-- diffusion_pytorch_model.bin
`-- v1-inference.yaml
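Before running inference, it can save a failed run to confirm the download is complete. The sketch below simply checks for a representative subset of the files in the layout above (the file list is copied from that tree; nothing is downloaded here):

```shell
# Report any expected checkpoint/config file missing under a models root.
check_models () {
  for f in champ/denoising_unet.pth champ/guidance_encoder_depth.pth \
           champ/guidance_encoder_dwpose.pth champ/guidance_encoder_normal.pth \
           champ/guidance_encoder_semantic_map.pth champ/reference_unet.pth \
           champ/motion_module.pth image_encoder/config.json \
           image_encoder/pytorch_model.bin sd-vae-ft-mse/config.json \
           stable-diffusion-v1-5/model_index.json; do
    [ -f "$1/$f" ] || echo "missing: $f"
  done
}
check_models pretrained_models
```

No output means all listed files are present.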
Prepare your guidance motions
Guidance motion data, produced via SMPL & Rendering, is required for inference.
You can download our pre-rendered samples from our HuggingFace repo and place them into the ${PROJECT_ROOT}/example_data directory:
git lfs install
git clone https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example example_data
Or you can follow the SMPL & Rendering doc to produce your own motion data.
Finally, ${PROJECT_ROOT}/example_data should look like this:
./example_data/
|-- motions/ # One motion per subfolder
| |-- motion-01/ # A motion sample
| | |-- depth/ # Depth frame sequence
| | |-- dwpose/ # DWPose frame sequence
| | |-- mask/ # Mask frame sequence
| | |-- normal/ # Normal map frame sequence
| | `-- semantic_map/ # Semantic map frame sequence
| |-- motion-02/
| | |-- ...
| | `-- ...
| `-- motion-N/
| |-- ...
| `-- ...
`-- ref_images/ # Reference image samples (optional)
|-- ref-01.png
|-- ...
`-- ref-N.png
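A quick way to sanity-check a motion folder is to count the frames in each guidance modality; all five directories from the layout above should report the same number. A minimal sketch:

```shell
# Print the frame count for each guidance modality of one motion folder.
count_frames () {
  for d in depth dwpose mask normal semantic_map; do
    n=$(ls -1 "$1/$d" 2>/dev/null | wc -l)
    printf '%s: %s frames\n' "$d" "$((n))"
  done
}
count_frames example_data/motions/motion-01
```

Mismatched counts usually mean an interrupted download or an incomplete rendering run.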
Run inference
Now we have all the prepared models and motions in ${PROJECT_ROOT}/pretrained_models and ${PROJECT_ROOT}/example_data, respectively.
Here is the command for inference:
python inference.py --config configs/inference/inference.yaml
If using poetry, the command is:
poetry run python inference.py --config configs/inference/inference.yaml
Animation results will be saved in ${PROJECT_ROOT}/results folder. You can change the reference image or the guidance motion by modifying inference.yaml.
The default motion-02 in inference.yaml has about 250 frames and requires ~20GB of VRAM.
Note: If your VRAM is insufficient, you can switch to a shorter motion sequence or cut a segment out of a long one. inference.yaml provides a frame range selector, which you can set to a [min_frame_index, max_frame_index] list to conveniently cut a segment from the sequence.
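For example, restricting inference to the first 100 frames might look like the fragment below. The key name is illustrative; check your copy of configs/inference/inference.yaml for the exact field:

```yaml
# Hypothetical fragment of configs/inference/inference.yaml.
# The selector takes a [min_frame_index, max_frame_index] list.
frame_range: [0, 99]   # keep only the first 100 frames to reduce VRAM usage
```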
Train the Model
The training process consists of two distinct stages. For more information, refer to the Training Section in the paper on arXiv.
Prepare Datasets
Prepare your own training videos containing human motion (or use our sample training data on HuggingFace) and modify the data.video_folder value in the training config yaml.
All training videos need to be processed into SMPL & DWPose format. Refer to the Data Process doc.
The directory structure will be like this:
/training_data/
|-- video01/ # One video's processed data
| |-- depth/ # Depth frame sequence
| |-- dwpose/ # DWPose frame sequence
| |-- mask/ # Mask frame sequence
| |-- normal/ # Normal map frame sequence
| `-- semantic_map/ # Semantic map frame sequence
|-- video02/
| |-- ...
| `-- ...
`-- videoN/
|-- ...
`-- ...
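Before launching training, it may help to flag any video folder that is missing one of the five guidance modalities from the layout above. A small sketch (directory names taken from that tree):

```shell
# List every video folder under a training root that lacks a modality.
check_training_data () {
  for v in "$1"/*/; do
    [ -d "$v" ] || continue
    for d in depth dwpose mask normal semantic_map; do
      [ -d "$v$d" ] || echo "missing: $v$d"
    done
  done
}
check_training_data /training_data
```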
Select another small batch of data as the validation set, and modify the validation.ref_images and validation.guidance_folders roots in the training config yaml.
Run Training Scripts
To train the Champ model, use the following command:
# Run training script of stage1
accelerate launch train_s1.py --config configs/train/stage1.yaml
# Modify the `stage1_ckpt_dir` value in yaml and run training script of stage2
accelerate launch train_s2.py --config configs/train/stage2.yaml
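As a convenience, stage 2 can be guarded so it only launches once `stage1_ckpt_dir` (the key mentioned in the comment above) points at an existing directory. A minimal sketch, assuming the key sits at the top level of the yaml:

```shell
# Read stage1_ckpt_dir from the stage-2 config and launch only if it exists.
cfg=configs/train/stage2.yaml
ckpt=$(sed -n 's/^ *stage1_ckpt_dir: *//p' "$cfg")
if [ -d "$ckpt" ]; then
  accelerate launch train_s2.py --config "$cfg"
else
  echo "stage1_ckpt_dir ('$ckpt') not found; edit $cfg first"
fi
```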
Datasets
| Type | HuggingFace | ETA |
| :-------: | :--------------------------- | :-------------: |
| Inference | SMPL motion samples | Thu Apr 18 2024 |
| Training | Sample datasets for Training | Sun May 05 2024 |
Roadmap
| Status | Milestone
