# Mojito
Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"
## Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He, Shuohang Wang, Jianwei Yang, Xiaoxia Wu, Yiping Wang, Kuan Wang, Zheng Zhan, Olatunji Ruwase, Yelong Shen, Xin Eric Wang
<a href='https://arxiv.org/abs/2412.08948'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://sites.google.com/view/mojito-video'><img src='https://img.shields.io/badge/Project-Page-green'></a>

## TODO
- [ ] Gradio Demo
- [ ] Huggingface setup
- [x] Release training-free DMC module
- [ ] Release training code
- [x] Release inference code
## Directional Control
### Environment Setup

```bash
bash environment.bash
```
### Inference

1. Download the checkpoint from Google Drive and place it under the root of this folder.
2. Write your text prompt in `prompts.txt`.
3. Run inference:

```bash
bash configs/inference/run_text2video.sh
```
To apply directional/trajectory guidance, specify the object name and its spatial locations (the object name must appear in your text prompt):

```
--object_phrase   # Name of the object whose moving direction/trajectory is controlled in the generated video.
--bboxes          # Input bounding boxes specifying the object's locations.
```
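To get a sense of how a handful of user-specified boxes can define a motion trajectory, the sketch below linearly interpolates `(x1, y1, x2, y2)` boxes across frames. The box format and the helper name here are illustrative assumptions, not the actual format expected by `--bboxes`.

```python
# Hypothetical sketch: expand a start and an end bounding box into a
# per-frame trajectory by linear interpolation. The normalized
# (x1, y1, x2, y2) box format is an assumption for illustration.

def interpolate_bboxes(start, end, num_frames):
    """Return num_frames boxes moving linearly from start to end."""
    boxes = []
    for t in range(num_frames):
        a = t / (num_frames - 1) if num_frames > 1 else 0.0
        boxes.append(tuple(s + a * (e - s) for s, e in zip(start, end)))
    return boxes

# Example: a box sliding left to right across a normalized frame.
traj = interpolate_bboxes((0.1, 0.4, 0.3, 0.6), (0.7, 0.4, 0.9, 0.6), 16)
```

A denser trajectory (one box per frame) gives the guidance a smoother target than two distant keyframe boxes.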
You can also specify the number of guidance steps:

```
--guidance_step   # Number of steps over which guidance is performed; more steps give stronger guidance.
```

and the guidance scale directly:

```
--loss_scale      # Higher values give stronger guidance.
```
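Putting the flags together, a full invocation might look like the sketch below. The flag names come from this README, but the concrete values (object phrase, boxes, step count, scale) are placeholders, and the exact `--bboxes` syntax may differ in the actual scripts; the snippet only assembles and prints the command rather than running it.

```shell
# Illustrative invocation (printed, not executed): the flag values and
# the bbox syntax are placeholder assumptions, not the repo's spec.
CMD="bash configs/inference/run_text2video.sh \
  --object_phrase 'a red car' \
  --bboxes '0.1,0.4,0.3,0.6;0.7,0.4,0.9,0.6' \
  --guidance_step 10 \
  --loss_scale 2.0"
echo "$CMD"
```

Increasing `--guidance_step` or `--loss_scale` should push the object more strongly along the specified boxes, at the cost of extra computation per denoising step.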
## News

- **2025.6.10** Release the inference code of DMC.
- **2025.2.10** 🚀 Release the project website!
## Citation

```bibtex
@article{he2024mojito,
  title   = {Mojito: Motion Trajectory and Intensity Control for Video Generation},
  author  = {Xuehai He and Shuohang Wang and Jianwei Yang and Xiaoxia Wu and Yiping Wang and Kuan Wang and Zheng Zhan and Olatunji Ruwase and Yelong Shen and Xin Eric Wang},
  year    = {2024},
  journal = {arXiv preprint arXiv:2412.08948}
}
```
