# CMD

[ICLR'24] Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
## 1. Environment setup

```shell
conda create -n cmd python=3.8 -y
conda activate cmd
pip install -r requirements.txt
```
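As a quick sanity check (not part of the repo), you can confirm the active interpreter is at least the Python 3.8 that the conda environment above installs:

```shell
# Hypothetical sanity check: verify the interpreter version before installing
# dependencies. The conda env above pins Python 3.8.
python - <<'EOF'
import sys
assert sys.version_info[:2] >= (3, 8), f"expected Python >= 3.8, got {sys.version}"
print("environment ok")
EOF
```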
## 2. Dataset

### Dataset download

Currently, we provide experiments for UCF-101. You can place the data wherever you like and specify its location via the `--data-path` argument in the training scripts.
### UCF-101

```
UCF-101
|-- class1
    |-- video1.avi
    |-- video2.avi
    |-- ...
|-- class2
    |-- video1.avi
    |-- video2.avi
    |-- ...
|-- ...
```
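For illustration, a toy version of this layout can be created and checked with a short shell snippet; the class names and files below are placeholders, not real UCF-101 data:

```shell
# Build a minimal placeholder tree matching the layout above.
# The real dataset root (e.g. /data/UCF-101) is what you pass via --data-path.
ROOT="$(mktemp -d)/UCF-101"
for cls in class1 class2; do
    mkdir -p "$ROOT/$cls"
    touch "$ROOT/$cls/video1.avi" "$ROOT/$cls/video2.avi"
done

# Each class directory should now contain its .avi files.
find "$ROOT" -mindepth 2 -name '*.avi' | wc -l
```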
## 3. Training

### Autoencoder

```shell
torchrun --nnodes=[NUM_NODES] --nproc_per_node=[NUM_GPU] train_ae.py \
    --dataset-name UCF101 \
    --data-path /data/UCF-101 \
    --global-batch-size [BATCH_SIZE] \
    --results-dir [LOG_DIRECTORY] \
    --mode pixel \
    --ckpt-every 20000
```
### Motion Diffusion Model

```shell
torchrun --nnodes=[NUM_NODES] --nproc_per_node=[NUM_GPU] train_motion_diffusion.py \
    --dataset-name UCF101 \
    --data-path /data/UCF-101 \
    --global-batch-size [BATCH_SIZE] \
    --results-dir [LOG_DIRECTORY] \
    --mode pixel \
    --ckpt-every 20000
```
### Content Diffusion Model

```shell
torchrun --nnodes=[NUM_NODES] --nproc_per_node=[NUM_GPU] train_content_diffusion.py \
    --dataset-name UCF101 \
    --data-path /data/UCF-101 \
    --global-batch-size [BATCH_SIZE] \
    --results-dir [LOG_DIRECTORY] \
    --mode pixel \
    --ckpt-every 20000 \
    --motion-model-config [MOTION_MODEL_CONFIG]
```
These scripts automatically create a folder under `[LOG_DIRECTORY]` to store logs and checkpoints.
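As a concrete illustration, here is the autoencoder command with the placeholders filled in; the values (single node, 8 GPUs, batch size 64) are examples only, not the settings used in the paper:

```shell
# Example invocation with illustrative values substituted for the placeholders.
torchrun --nnodes=1 --nproc_per_node=8 train_ae.py \
    --dataset-name UCF101 \
    --data-path /data/UCF-101 \
    --global-batch-size 64 \
    --results-dir ./results \
    --mode pixel \
    --ckpt-every 20000
```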
## Note

This code may not exactly reproduce the results reported in the paper, as human error may have been introduced while preparing and cleaning the code for release. If you have difficulty reproducing our findings, please let us know. We also plan to run sanity-check experiments in the near future.
## Citation

Please consider citing CMD if this repository is useful for your work.

```bibtex
@inproceedings{yu2024cmd,
  title={Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition},
  author={Sihyun Yu and Weili Nie and De-An Huang and Boyi Li and Jinwoo Shin and Anima Anandkumar},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```
## Licenses
Copyright © 2024, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.
## Acknowledgement

This code is mainly built upon the PVDM, DiT, and glide-text2im repositories.
We also used code from the following repositories: StyleGAN-V and TATS.
