# MotionDirector

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
## Task Definition

Motion Customization of Text-to-Video Diffusion Models: </br> Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate diverse videos with this motion.
## Demos
Demo Video:
Customize both Appearance and Motion: <a name="Customize_both_Appearance_and_Motion"></a>
<table class="center"> <tr> <td style="text-align:center;"><b>Reference images or videos</b></td> <td style="text-align:center;" colspan="3"><b>Videos generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/customized_appearance_results/reference_images.png></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_horse_through_an_ancient_battlefield_1455028.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_playing_golf_in_front_of_the_Great_Wall_5804477.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_walking_cross_the_ancient_army_captured_with_a_reverse_follow_cinematic_shot_653658.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference images for appearance customization: "A Terracotta Warrior on a pure color background."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a horse through an ancient battlefield."</br> seed: 1455028</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is playing golf in front of the Great Wall." </br> seed: 5804477</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is walking cross the ancient army captured with a reverse follow cinematic shot." 
</br> seed: 653658</td> </tr> <tr> <td><img src=assets/multi_videos_results/reference_videos.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_bicycle_past_an_ancient_Chinese_palace_166357.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_lifting_weights_in_front_of_the_Great_Wall_5635982.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_skateboarding_9033688.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference videos for motion customization: "A person is riding a bicycle."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a bicycle past an ancient Chinese palace."</br> seed: 166357.</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is lifting weights in front of the Great Wall." </br> seed: 5635982</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is skateboarding." </br> seed: 9033688</td> </tr> </table>

## News
- [2024.02.03] MotionDirector for AnimateDiff is available. Thanks to ExponentialML.
- [2023.12.27] MotionDirector with Customized Appearance released. Now, you can customize both appearance and motion in video generation.
- [2023.12.27] MotionDirector for Image Animation released.
- [2023.12.23] MotionDirector has been featured in Hugging Face's 'Spaces of the Week 🔥' trending list!
- [2023.12.13] Online Gradio demo released on Hugging Face Spaces! Welcome to try it.
- [2023.12.06] MotionDirector for Sports released! Lifting weights, riding a horse, playing golf, etc.
- [2023.12.05] Colab demo is available. Thanks to Camenduru.
- [2023.12.04] MotionDirector for Cinematic Shots released. Now, you can make AI films with professional cinematic shots!
- [2023.12.02] Code and model weights released!
- [2023.10.12] Paper and project page released.
## ToDo
- [x] Gradio Demo
- [ ] More trained weights of MotionDirector
## Model List

| Type | Training Data | Description | Link |
| :---: | :---: | :---: | :---: |
| MotionDirector for Sports | Multiple videos for each model. | Learns motion concepts of sports, e.g. lifting weights, riding a horse, playing golf, etc. | Link |
| MotionDirector for Cinematic Shots | A single video for each model. | Learns motion concepts of cinematic shots, e.g. dolly zoom, zoom in, zoom out, etc. | Link |
| MotionDirector for Image Animation | A single image for the spatial path, and a single video or multiple videos for the temporal path. | Animates the given image with learned motions. | Link |
| MotionDirector with Customized Appearance | A single image or multiple images for the spatial path, and a single video or multiple videos for the temporal path. | Customizes both appearance and motion in video generation. | Link |
## Setup

### Requirements

```shell
# create a virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector

# install packages
pip install -r requirements.txt
```
### Weights of Foundation Models

```shell
git lfs install
## You can choose ModelScopeT2V or ZeroScope, etc., as the foundation model.

## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/

## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/
```
### Weights of trained MotionDirector <a name="download_weights"></a>

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs

# More and better-trained MotionDirector weights are released in a new repo:
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
# The usage is slightly different and will be documented later.
```
## Usage

### Training

Train MotionDirector on multiple videos:

```shell
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```

Train MotionDirector on a single video:

```shell
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```
Note:

- Before running the above commands, make sure you replace the paths to the foundation model weights and training data with your own in the config files `config_multi_videos.yaml` or `config_single_video.yaml`.
- Generally, training on multiple 16-frame videos usually takes `300~500` steps, about `9~16` minutes using one A5000 GPU. Training on a single video takes `50~150` steps, about `1.5~4.5` minutes using one A5000 GPU. The required VRAM for training is around `14GB`.
- Reduce `n_sample_frames` if your GPU memory is limited.
- Reduce the learning rate and increase the training steps for better performance.
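The paths and frame count mentioned above are set in the training config. A minimal sketch of what such a file might contain (only `n_sample_frames` is confirmed by this README; the other key names are assumptions, so check the shipped `config_multi_videos.yaml` for the exact schema):

```yaml
# Hypothetical excerpt of a training config -- verify key names
# against the configs shipped in ./configs/.
pretrained_model_path: "./models/zeroscope_v2_576w"  # foundation model weights
train_data:
  path: "./videos/riding_bicycle/"                   # your training videos
  n_sample_frames: 16                                # reduce if GPU memory is limited
learning_rate: 5.0e-4    # reduce for better performance
max_train_steps: 400     # 300~500 for multiple videos, 50~150 for a single video
```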
### Inference

```shell
python MotionDirector_inference.py --model /path/to/the/foundation/model --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.
```
Note:

- Replace `/path/to/the/foundation/model` with your own path to the foundation model, like ZeroScope.
- The value of `checkpoint_index` specifies which saved training step's checkpoint to load, e.g. `300`.
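To sweep several seeds or prompts, it can help to build the inference command programmatically. A minimal sketch (the `--seed` flag is an assumption based on the seeds listed in the demos; check the argparse options of `MotionDirector_inference.py` before relying on it):

```python
# Hypothetical helper: assemble the MotionDirector_inference.py command line.
# Flag names mirror the command shown above; paths are placeholders.
import shlex

def build_inference_cmd(model, prompt, ckpt_folder, ckpt_index,
                        noise_prior=0.0, seed=None):
    cmd = [
        "python", "MotionDirector_inference.py",
        "--model", model,
        "--prompt", prompt,
        "--checkpoint_folder", ckpt_folder,
        "--checkpoint_index", str(ckpt_index),
        "--noise_prior", str(noise_prior),
    ]
    if seed is not None:
        # assumed flag -- verify it exists in the inference script
        cmd += ["--seed", str(seed)]
    return cmd

cmd = build_inference_cmd(
    "./models/zeroscope_v2_576w",
    "A Terracotta Warrior is skateboarding.",
    "./outputs/train/riding_bicycle", 300, seed=9033688,
)
print(shlex.join(cmd))
```

Pass the resulting list to `subprocess.run(cmd)` inside a loop over seeds or prompts to batch-generate videos.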

