
MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.


<p align="center"> <h2 align="center">MotionDirector: Motion Customization of Text-to-Video Diffusion Models</h2> <p align="center"> <a href="https://ruizhaocv.github.io/"><strong>Rui Zhao</strong></a> · <a href="https://ycgu.site/"><strong>Yuchao Gu</strong></a> · <a href="https://zhangjiewu.github.io/"><strong>Jay Zhangjie Wu</strong></a> · <a href="https://junhaozhang98.github.io//"><strong>David Junhao Zhang</strong></a> · <a href="https://jia-wei-liu.github.io/"><strong>Jia-Wei Liu</strong></a> · <a href="https://weijiawu.github.io/"><strong>Weijia Wu</strong></a> · <a href="https://www.jussikeppo.com/"><strong>Jussi Keppo</strong></a> · <a href="https://sites.google.com/view/showlab"><strong>Mike Zheng Shou</strong></a> <br> <br> <a href="https://arxiv.org/abs/2310.08465"><img src='https://img.shields.io/badge/arXiv-2310.08465-b31b1b.svg'></a> <a href='https://showlab.github.io/MotionDirector'><img src='https://img.shields.io/badge/Project_Page-MotionDirector-blue'></a> <a href='https://huggingface.co/spaces/ruizhaocv/MotionDirector'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow'></a> <a href='https://www.youtube.com/watch?v=Wq93zi8bE3U'><img src='https://img.shields.io/badge/Demo_Video-MotionDirector-red'></a> <br> <b>Show Lab, National University of Singapore</b> </p> <p align="center"> <img src="https://github.com/showlab/MotionDirector/blob/page/assets/teaser.gif" width="1080px"/> <br> <em>MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.</em> </p>

Task Definition

Motion Customization of Text-to-Video Diffusion Models: </br> Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate diverse videos with this motion.

Demos

Demo Video: [MotionDirector demo on YouTube](https://www.youtube.com/watch?v=Wq93zi8bE3U)

Customize both Appearance and Motion: <a name="Customize_both_Appearance_and_Motion"></a>

<table class="center"> <tr> <td style="text-align:center;"><b>Reference images or videos</b></td> <td style="text-align:center;" colspan="3"><b>Videos generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/customized_appearance_results/reference_images.png></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_horse_through_an_ancient_battlefield_1455028.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_playing_golf_in_front_of_the_Great_Wall_5804477.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_walking_cross_the_ancient_army_captured_with_a_reverse_follow_cinematic_shot_653658.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference images for appearance customization: "A Terracotta Warrior on a pure color background."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a horse through an ancient battlefield."</br> seed: 1455028</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is playing golf in front of the Great Wall." </br> seed: 5804477</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is walking cross the ancient army captured with a reverse follow cinematic shot." 
</br> seed: 653658</td> </tr> <tr> <td><img src=assets/multi_videos_results/reference_videos.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_bicycle_past_an_ancient_Chinese_palace_166357.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_lifting_weights_in_front_of_the_Great_Wall_5635982.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_skateboarding_9033688.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference videos for motion customization: "A person is riding a bicycle."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a bicycle past an ancient Chinese palace."</br> seed: 166357.</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is lifting weights in front of the Great Wall." </br> seed: 5635982</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is skateboarding." </br> seed: 9033688</td> </tr> </table>

News

ToDo

  • [x] Gradio Demo
  • [ ] More trained weights of MotionDirector

Model List

| Type | Training Data | Descriptions | Link |
|:---:|:---:|:---:|:---:|
| MotionDirector for Sports | Multiple videos for each model. | Learn motion concepts of sports, e.g. lifting weights, riding a horse, playing golf, etc. | Link |
| MotionDirector for Cinematic Shots | A single video for each model. | Learn motion concepts of cinematic shots, e.g. dolly zoom, zoom in, zoom out, etc. | Link |
| MotionDirector for Image Animation | A single image for the spatial path, and a single video or multiple videos for the temporal path. | Animate the given image with learned motions. | Link |
| MotionDirector with Customized Appearance | A single image or multiple images for the spatial path, and a single video or multiple videos for the temporal path. | Customize both appearance and motion in video generation. | Link |

Setup

Requirements

# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt

Weights of Foundation Models

git lfs install
## You can choose ModelScopeT2V, ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/

Weights of trained MotionDirector <a name="download_weights"></a>

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs

# More and better-trained MotionDirector weights are released in a new repo:
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
# The usage is slightly different, which will be updated later.
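Before training, it can help to confirm the weights actually landed where the configs expect them. A minimal sketch — the paths below simply mirror the `git clone` destinations above; adjust them if you cloned elsewhere:

```python
from pathlib import Path

# These paths mirror the clone commands above; change them if you used other destinations.
EXPECTED_DIRS = [
    "./models/zeroscope_v2_576w",  # ZeroScope foundation model
    "./outputs",                   # trained MotionDirector weights
]

def missing_weight_dirs(dirs):
    """Return the subset of expected weight directories that do not exist yet."""
    return [d for d in dirs if not Path(d).is_dir()]

if __name__ == "__main__":
    missing = missing_weight_dirs(EXPECTED_DIRS)
    if missing:
        print("Missing (re-run the git clone commands above):", missing)
    else:
        print("All weight directories found.")
```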

Usage

Training

Train MotionDirector on multiple videos:

python MotionDirector_train.py --config ./configs/config_multi_videos.yaml

Train MotionDirector on a single video:

python MotionDirector_train.py --config ./configs/config_single_video.yaml

Note:

  • Before running the above commands, make sure you replace the paths to the foundation model weights and the training data with your own in the config files config_multi_videos.yaml or config_single_video.yaml.
  • Training on multiple 16-frame videos typically takes 300~500 steps, about 9~16 minutes on a single A5000 GPU. Training on a single video takes 50~150 steps, about 1.5~4.5 minutes on a single A5000 GPU. Training requires around 14GB of VRAM.
  • Reduce n_sample_frames if your GPU memory is limited.
  • Reduce the learning rate and increase the training steps for better performance.
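When trading off the step count against wall-clock time (e.g. after lowering the learning rate and raising the steps), a back-of-envelope estimate from the A5000 numbers above can help: 300~500 steps in 9~16 minutes works out to roughly 1.8~1.9 seconds per step.

```python
# Rough per-step cost derived from the A5000 figures quoted above
# (300~500 steps in 9~16 minutes ≈ 1.8~1.9 s/step); your GPU will differ.
SECONDS_PER_STEP = 1.9

def estimated_minutes(steps, seconds_per_step=SECONDS_PER_STEP):
    """Rough wall-clock estimate for a given number of training steps."""
    return steps * seconds_per_step / 60.0
```

For instance, doubling a 400-step run to 800 steps roughly doubles the estimate from about 13 to about 25 minutes on the same hardware.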

Inference

python MotionDirector_inference.py --model /path/to/the/foundation/model  --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.

Note:

  • Replace /path/to/the/foundation/model with your own path to the foundation model, like ZeroScope.
  • The value of `checkpoint_index` selects which saved training checkpoint to load — here, the one saved at step 300.
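When sweeping several prompts against one trained checkpoint, it can be convenient to assemble this command in Python. A small sketch — the flag names mirror the inference command above, while the example paths are hypothetical placeholders:

```python
import subprocess

def build_inference_cmd(model, prompt, ckpt_folder, ckpt_index=300, noise_prior=0.0):
    """Assemble the MotionDirector_inference.py invocation shown above.

    Flag names are copied verbatim from the command in this README;
    the model/checkpoint paths you pass in are your own.
    """
    return [
        "python", "MotionDirector_inference.py",
        "--model", model,
        "--prompt", prompt,
        "--checkpoint_folder", ckpt_folder,
        "--checkpoint_index", str(ckpt_index),
        "--noise_prior", str(noise_prior),
    ]

# Example sweep over prompts (hypothetical paths):
# for p in ["A bear is riding a bicycle.", "A lion is lifting weights."]:
#     subprocess.run(build_inference_cmd("./models/zeroscope_v2_576w", p, "./outputs/train/my_motion"))
```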