
MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.


<p align="center"> <h2 align="center">MotionDirector: Motion Customization of Text-to-Video Diffusion Models</h2> <p align="center"> <a href="https://ruizhaocv.github.io/"><strong>Rui Zhao</strong></a> · <a href="https://ycgu.site/"><strong>Yuchao Gu</strong></a> · <a href="https://zhangjiewu.github.io/"><strong>Jay Zhangjie Wu</strong></a> · <a href="https://junhaozhang98.github.io//"><strong>David Junhao Zhang</strong></a> · <a href="https://jia-wei-liu.github.io/"><strong>Jia-Wei Liu</strong></a> · <a href="https://weijiawu.github.io/"><strong>Weijia Wu</strong></a> · <a href="https://www.jussikeppo.com/"><strong>Jussi Keppo</strong></a> · <a href="https://sites.google.com/view/showlab"><strong>Mike Zheng Shou</strong></a> <br> <br> <a href="https://arxiv.org/abs/2310.08465"><img src='https://img.shields.io/badge/arXiv-2310.08465-b31b1b.svg'></a> <a href='https://showlab.github.io/MotionDirector'><img src='https://img.shields.io/badge/Project_Page-MotionDirector-blue'></a> <a href='https://huggingface.co/spaces/ruizhaocv/MotionDirector'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow'></a> <a href='https://www.youtube.com/watch?v=Wq93zi8bE3U'><img src='https://img.shields.io/badge/Demo_Video-MotionDirector-red'></a> <br> <b>Show Lab, National University of Singapore</b> </p> <p align="center"> <img src="https://github.com/showlab/MotionDirector/blob/page/assets/teaser.gif" width="1080px"/> <br> <em>MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.</em> </p>

Task Definition

Motion Customization of Text-to-Video Diffusion Models: </br> Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate diverse videos with this motion.

Demos

Demo Video: [MotionDirector demo on YouTube](https://www.youtube.com/watch?v=Wq93zi8bE3U)

Customize both Appearance and Motion: <a name="Customize_both_Appearance_and_Motion"></a>

<table class="center"> <tr> <td style="text-align:center;"><b>Reference images or videos</b></td> <td style="text-align:center;" colspan="3"><b>Videos generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/customized_appearance_results/reference_images.png></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_horse_through_an_ancient_battlefield_1455028.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_playing_golf_in_front_of_the_Great_Wall_5804477.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_walking_cross_the_ancient_army_captured_with_a_reverse_follow_cinematic_shot_653658.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference images for appearance customization: "A Terracotta Warrior on a pure color background."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a horse through an ancient battlefield."</br> seed: 1455028</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is playing golf in front of the Great Wall." </br> seed: 5804477</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is walking cross the ancient army captured with a reverse follow cinematic shot." 
</br> seed: 653658</td> </tr> <tr> <td><img src=assets/multi_videos_results/reference_videos.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_bicycle_past_an_ancient_Chinese_palace_166357.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_lifting_weights_in_front_of_the_Great_Wall_5635982.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_skateboarding_9033688.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference videos for motion customization: "A person is riding a bicycle."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a bicycle past an ancient Chinese palace."</br> seed: 166357.</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is lifting weights in front of the Great Wall." </br> seed: 5635982</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is skateboarding." </br> seed: 9033688</td> </tr> </table>

News

ToDo

  • [x] Gradio Demo
  • [ ] More trained weights of MotionDirector

Model List

| Type | Training Data | Descriptions | Link |
|:---:|:---:|:---:|:---:|
| MotionDirector for Sports | Multiple videos for each model. | Learn motion concepts of sports, e.g. lifting weights, riding a horse, playing golf, etc. | Link |
| MotionDirector for Cinematic Shots | A single video for each model. | Learn motion concepts of cinematic shots, e.g. dolly zoom, zoom in, zoom out, etc. | Link |
| MotionDirector for Image Animation | A single image for the spatial path, and a single video or multiple videos for the temporal path. | Animate the given image with learned motions. | Link |
| MotionDirector with Customized Appearance | A single image or multiple images for the spatial path, and a single video or multiple videos for the temporal path. | Customize both appearance and motion in video generation. | Link |

Setup

Requirements

# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt

Weights of Foundation Models

git lfs install
## You can choose ModelScopeT2V, ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/

Weights of trained MotionDirector <a name="download_weights"></a>

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs

# More and better-trained MotionDirector weights are released in a new repo:
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
# The usage is slightly different, which will be updated later.
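Before training, it can help to confirm the weights actually landed where the configs expect them. A minimal sketch — the paths below simply mirror the `git clone` destinations above; adjust them if you cloned elsewhere:

```python
from pathlib import Path

# These paths mirror the clone commands above; change them if you used other destinations.
EXPECTED_DIRS = [
    "./models/zeroscope_v2_576w",  # ZeroScope foundation model
    "./outputs",                   # trained MotionDirector weights
]

def missing_weight_dirs(dirs):
    """Return the subset of expected weight directories that do not exist yet."""
    return [d for d in dirs if not Path(d).is_dir()]

if __name__ == "__main__":
    missing = missing_weight_dirs(EXPECTED_DIRS)
    if missing:
        print("Missing (re-run the git clone commands above):", missing)
    else:
        print("All weight directories found.")
```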

Usage

Training

Train MotionDirector on multiple videos:

python MotionDirector_train.py --config ./configs/config_multi_videos.yaml

Train MotionDirector on a single video:

python MotionDirector_train.py --config ./configs/config_single_video.yaml

Note:

  • Before running the above commands, make sure you replace the paths to the foundation model weights and the training data with your own in the config files config_multi_videos.yaml or config_single_video.yaml.
  • Training on multiple 16-frame videos typically takes 300~500 steps, about 9~16 minutes on a single A5000 GPU. Training on a single video takes 50~150 steps, about 1.5~4.5 minutes on a single A5000 GPU. Training requires around 14GB of VRAM.
  • Reduce n_sample_frames if your GPU memory is limited.
  • Reduce the learning rate and increase the training steps for better performance.
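When trading off the step count against wall-clock time (e.g. after lowering the learning rate and raising the steps), a back-of-envelope estimate from the A5000 numbers above can help: 300~500 steps in 9~16 minutes works out to roughly 1.8~1.9 seconds per step.

```python
# Rough per-step cost derived from the A5000 figures quoted above
# (300~500 steps in 9~16 minutes ≈ 1.8~1.9 s/step); your GPU will differ.
SECONDS_PER_STEP = 1.9

def estimated_minutes(steps, seconds_per_step=SECONDS_PER_STEP):
    """Rough wall-clock estimate for a given number of training steps."""
    return steps * seconds_per_step / 60.0
```

For instance, doubling a 400-step run to 800 steps roughly doubles the estimate from about 13 to about 25 minutes on the same hardware.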

Inference

python MotionDirector_inference.py --model /path/to/the/foundation/model  --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.

Note:

  • Replace /path/to/the/foundation/model with your own path to the foundation model, like ZeroScope.
  • The value of `checkpoint_index` selects which saved training checkpoint to load — here, the one saved at step 300.
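When sweeping several prompts against one trained checkpoint, it can be convenient to assemble this command in Python. A small sketch — the flag names mirror the inference command above, while the example paths are hypothetical placeholders:

```python
import subprocess

def build_inference_cmd(model, prompt, ckpt_folder, ckpt_index=300, noise_prior=0.0):
    """Assemble the MotionDirector_inference.py invocation shown above.

    Flag names are copied verbatim from the command in this README;
    the model/checkpoint paths you pass in are your own.
    """
    return [
        "python", "MotionDirector_inference.py",
        "--model", model,
        "--prompt", prompt,
        "--checkpoint_folder", ckpt_folder,
        "--checkpoint_index", str(ckpt_index),
        "--noise_prior", str(noise_prior),
    ]

# Example sweep over prompts (hypothetical paths):
# for p in ["A bear is riding a bicycle.", "A lion is lifting weights."]:
#     subprocess.run(build_inference_cmd("./models/zeroscope_v2_576w", p, "./outputs/train/my_motion"))
```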