ReCamMaster: Camera-Controlled Generative Rendering from A Single Video (ICCV'25 Oral, Best Paper Finalist)
<div align="center"> <div align="center" style="margin-top: 0px; margin-bottom: 0px;"> <img src=https://github.com/user-attachments/assets/81ccf80e-f4b6-4a3d-b47a-e9c2ce14e34f width="30%"/> </div>[<a href="https://arxiv.org/abs/2503.11647" target="_blank">arXiv</a>] [<a href="https://jianhongbai.github.io/ReCamMaster/" target="_blank">Project Page</a>] [<a href="https://huggingface.co/datasets/KwaiVGI/MultiCamVideo-Dataset" target="_blank">Dataset</a>]
Jianhong Bai<sup>1*</sup>, Menghan Xia<sup>2†</sup>, Xiao Fu<sup>3</sup>, Xintao Wang<sup>2</sup>, Lianrui Mu<sup>1</sup>, Jinwen Cao<sup>2</sup>, <br>Zuozhu Liu<sup>1</sup>, Haoji Hu<sup>1†</sup>, Xiang Bai<sup>4</sup>, Pengfei Wan<sup>2</sup>, Di Zhang<sup>2</sup> <br> (*Work done during an internship at KwaiVGI, Kuaishou Technology †corresponding authors)
<sup>1</sup>Zhejiang University, <sup>2</sup>Kuaishou Technology, <sup>3</sup>CUHK, <sup>4</sup>HUST.
</div>

Important Note: This open-source repository provides a reference implementation. Because the underlying T2V model differs, the open-source version may not match the performance of the model in our paper. If you'd like to use the best version of ReCamMaster, please upload your video to this link. Additionally, we are developing an online trial website; please stay tuned for updates on the Kling website.
🔥 Updates
- [2025.04.15]: Please feel free to explore our related work, SynCamMaster.
- [2025.04.09]: Released the training and inference code and the model checkpoint.
- [2025.03.31]: Released the MultiCamVideo Dataset.
- [2025.03.31]: Sent the inference results to the first 1,000 trial users.
- [2025.03.17]: Released the project page and the trial link.
📖 Introduction
TL;DR: We propose ReCamMaster to re-capture in-the-wild videos with novel camera trajectories, achieved through a simple yet effective video conditioning scheme. We also release a multi-camera synchronized video dataset rendered with Unreal Engine 5. <br>
https://github.com/user-attachments/assets/52455e86-1adb-458d-bc37-4540a65a60d4
🚀 Trial: Try ReCamMaster with Your Own Videos
Update: We are actively processing the videos uploaded by users. So far, we have sent the inference results to the email addresses of the first 1,500 testers. You should receive an email titled "Inference Results of ReCamMaster" from either jianhongbai@zju.edu.cn or cpurgicn@gmail.com. Please also check your spam folder, and let us know if the email still hasn't arrived after some time. If you enjoyed the videos we created, please consider giving us a star 🌟.
You can try out ReCamMaster by uploading your own video to this link; it will generate a video whose camera moves along a new trajectory. We will email you the mp4 file generated by ReCamMaster as soon as possible. We currently offer 10 basic camera trajectories:
| Index | Basic Trajectory |
|-------|------------------|
| 1 | Pan Right |
| 2 | Pan Left |
| 3 | Tilt Up |
| 4 | Tilt Down |
| 5 | Zoom In |
| 6 | Zoom Out |
| 7 | Translate Up (with rotation) |
| 8 | Translate Down (with rotation) |
| 9 | Arc Left (with rotation) |
| 10 | Arc Right (with rotation) |
If you would like to use ReCamMaster as a baseline and need qualitative or quantitative comparisons, please feel free to drop an email to jianhongbai@zju.edu.cn. We can assist you with batch inference of our model.
⚙️ Code: ReCamMaster + Wan2.1 (Inference & Training)
The model used in our paper is an internally developed T2V model, not Wan2.1. Due to company policy, we are unable to open-source that model, so we migrated ReCamMaster to Wan2.1 to validate the effectiveness of our method. Because the underlying T2V model differs, you may not achieve the same results as demonstrated in the demo.
Inference
Step 1: Set up the environment
DiffSynth-Studio requires Rust and Cargo to compile extensions. You can install them using the following command:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
Install DiffSynth-Studio:
git clone https://github.com/KwaiVGI/ReCamMaster.git
cd ReCamMaster
pip install -e .
Step 2: Download the pretrained checkpoints
- Download the pre-trained Wan2.1 models
cd ReCamMaster
python download_wan2.1.py
- Download the pre-trained ReCamMaster checkpoint
Please download it from Hugging Face and place it in models/ReCamMaster/checkpoints.
Step 3: Test the example videos
python inference_recammaster.py --cam_type 1
Step 4: Test your own videos
To test your own videos, prepare your data following the structure of the example_test_data folder: N mp4 videos, each with at least 81 frames, plus a metadata.csv file that stores their paths and corresponding captions. You can refer to the Prompt Extension section of Wan2.1 for guidance on preparing video captions.
python inference_recammaster.py --cam_type 1 --dataset_path path/to/your/data
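As a starting point for assembling the metadata.csv, here is a minimal sketch that lists every mp4 in a folder alongside a caption. The column names ("video_path", "caption") are assumptions for illustration; match them to the headers actually used in example_test_data/metadata.csv.

```python
import csv
from pathlib import Path

def build_metadata(video_dir: str, out_csv: str, captions: dict) -> int:
    """Write a metadata.csv listing each mp4 and its caption.

    The column names "video_path" and "caption" are assumptions --
    adapt them to the headers in example_test_data/metadata.csv.
    Returns the number of videos listed.
    """
    rows = []
    for mp4 in sorted(Path(video_dir).glob("*.mp4")):
        # Fall back to an empty caption if none was provided for this file.
        rows.append({"video_path": str(mp4), "caption": captions.get(mp4.name, "")})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["video_path", "caption"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Remember that each listed video must contain at least 81 frames, as noted above.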
We provide several preset camera types, as shown in the table below. Additionally, you can generate new trajectories for testing.
| cam_type | Trajectory |
|----------|------------|
| 1 | Pan Right |
| 2 | Pan Left |
| 3 | Tilt Up |
| 4 | Tilt Down |
| 5 | Zoom In |
| 6 | Zoom Out |
| 7 | Translate Up (with rotation) |
| 8 | Translate Down (with rotation) |
| 9 | Arc Left (with rotation) |
| 10 | Arc Right (with rotation) |
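To sketch what generating a new trajectory can look like, the snippet below builds a sequence of 4x4 camera-to-world matrices for a smooth pan to the right over 81 frames. The matrix convention and the on-disk format expected by inference_recammaster.py are assumptions here; adapt the serialization to the camera files shipped with example_test_data.

```python
import math

def pan_right_trajectory(num_frames: int = 81, total_degrees: float = 30.0):
    """Return a list of 4x4 camera-to-world matrices sweeping a pan to the right.

    Assumptions for illustration: y is the world up-axis, the camera stays at
    the origin, and a pan is a rotation about that up-axis. The file format
    consumed by inference_recammaster.py is not reproduced here.
    """
    poses = []
    for i in range(num_frames):
        # Interpolate the pan angle linearly from 0 to total_degrees.
        theta = math.radians(total_degrees * i / (num_frames - 1))
        c, s = math.cos(theta), math.sin(theta)
        poses.append([
            [  c, 0.0,   s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [ -s, 0.0,   c, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return poses
```

The same pattern extends to the other presets, e.g. varying the translation column for zooms or combining rotation and translation for the arc trajectories.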
Training
Step 1: Set up the environment
pip install lightning pandas websockets
Step 2: Prepare the training dataset
- Download the MultiCamVideo dataset.
- Extract VAE features:
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python train_recammaster.py \
  --task data_process \
  --dataset_path path/to/the/MultiCamVideo/Dataset \
  --output_path ./models \
  --text_encoder_path "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth" \
  --vae_path "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth" \
  --tiled --num_frames 81 --height 480 --width 832 \
  --dataloader_num_workers 2
- Generate Captions for Each Video
You can use video caption tools like LLaVA to generate captions for each video and store them in the metadata.csv file.
Step 3: Training
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python train_recammaster.py \
  --task train \
  --dataset_path recam_train_data \
  --output_path ./models/train \
  --dit_path "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors" \
  --steps_per_epoch 8000 \
  --max_epochs 100 \
  --learning_rate 1e-4 \
  --accumulate_grad_batches 1 \
  --use_gradient_checkpointing \
  --dataloader_num_workers 4
We did not search for the optimal set of hyper-parameters and trained with a batch size of 1 on each GPU. You may achieve better model performance by tuning hyper-parameters such as the learning rate and by increasing the batch size.
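If you do increase the batch size, one common heuristic for picking a matching learning rate is the linear scaling rule (a general rule of thumb, not something prescribed by the paper): scale the learning rate proportionally to the effective batch size.

```python
def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: grow the learning rate in proportion to the
    effective (global) batch size. A common heuristic, not a guarantee."""
    return base_lr * new_batch / base_batch

# The command above uses lr=1e-4 with batch size 1 on 8 GPUs (effective
# batch 8); doubling the per-GPU batch to 2 gives an effective batch of 16:
print(scale_learning_rate(1e-4, 8, 16))  # prints 0.0002
```

Treat the scaled value as a starting point for a sweep rather than a final setting.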
Step 4: Test the model
python inference_recammaster.py --cam_type 1 --ckpt_path path/to/the/checkpoint
📷 Dataset: MultiCamVideo Dataset
1. Dataset Introduction
TL;DR: The MultiCamVideo Dataset is a multi-camera synchronized video dataset rendered using Unreal Engine 5. It includes synchronized multi-camera videos and their corresponding camera trajectories. The MultiCamVideo Dataset can be valuable in fields such as camera-controlled video generation, synchronized video production, and 3D/4D reconstruction. If you are looking for synchronized videos captured with stationary cameras, please explore our SynCamVideo Dataset.
https://github.com/u
