<p align="center"> <h1 align="center">Unified Camera Positional Encoding for Controlled Video Generation</h1> <p align="center"> <p align="center"> <a href="https://chengzhag.github.io/">Cheng Zhang</a><sup>1,2</sup> · <a href="https://leeby68.github.io/">Boying Li</a><sup>1</sup> · <a href="https://www.linkedin.com/in/meng-wei-66687a105/?originalSubdomain=au">Meng Wei</a><sup>1</sup> · <a href="https://yanpei.me/">Yan-Pei Cao</a><sup>3</sup> · <a href="https://www.monash.edu/mada/architecture/people/camilo-cruz-gambardella/">Camilo Cruz Gambardella</a><sup>1,2</sup> · <a href="https://research.monash.edu/en/persons/dinh-phung/">Dinh Phung</a><sup>1</sup> · <a href="https://jianfei-cai.github.io/">Jianfei Cai</a><sup>1</sup><br> <sup>1</sup>Monash University <sup>2</sup>Building 4.0 CRC <sup>3</sup>VAST </p> <h2 align="center"><a href="https://arxiv.org/abs/2512.07237">Paper</a> | <a href="https://chengzhag.github.io/publication/ucpe/">Project Page</a> | <a href="https://youtu.be/rMX7gxH8jBM">Video</a> | <a href="https://huggingface.co/datasets/chengzhag/PanShot">Hugging Face</a></h2> </p>
*Our UCPE introduces a geometry-consistent alternative to Plücker rays as one of its core contributions, enabling better generalization in Transformers. We hope it inspires future research on camera-aware architectures.*
## 📢 Updates
- [2026.03.19] 🔧 Fixed a bug in Plücker encoding (thanks to @fengq1a0's issue #5).
- [2026.02.21] 🎉 UCPE accepted to CVPR 2026!
- [2026.02.04] 📊 PanShot dataset and curation code released (controllable camera data synthesized from PanFlow).
- [2026.02.04] 🎯 Full training, evaluation, and visualization code released.
- [2025.12.07] ⚡ Quick demo code released.
## 🚀 TLDR

🔥 Camera-controlled text-to-video generation, now with intrinsics, distortion, and orientation control!

<p align="center"> <img src="images/cameras.png" alt="Camera lenses" height="120px"> <img src="images/orientation.png" alt="Orientation control" height="140px"> </p>

📷 UCPE combines Relative Ray Encoding, which generalizes significantly better than Plücker rays across diverse camera motions, intrinsics, and lens distortions, with Absolute Orientation Encoding for controllable pitch and roll. Together they form a unified camera representation for Transformers and achieve state-of-the-art camera-controlled video generation with just 0.5% extra parameters (35.5M on top of the 7.3B-parameter base model).
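For context, Plücker rays, the baseline representation UCPE improves upon, encode each pixel's viewing ray by a unit direction d and a moment m = o × d, where o is the camera center. The sketch below shows the standard construction for an ideal pinhole camera; it is our own illustrative code under that assumption, not the repository's implementation.

```python
import numpy as np

def plucker_ray_map(K, R, t, H, W):
    """Per-pixel Plucker (direction, moment) map for a pinhole camera.

    Illustrative only. K: (3,3) intrinsics; R, t: world-to-camera extrinsics,
    i.e. x_cam = R @ x_world + t. Returns an (H, W, 6) array.
    """
    o = -R.T @ t                                              # camera center in world frame
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # homogeneous pixel coords
    d = pix @ np.linalg.inv(K).T @ R                          # world-space ray directions
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    m = np.cross(np.broadcast_to(o, d.shape), d)              # Plucker moment o x d
    return np.concatenate([d, m], axis=-1)
```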
<p align="center"> <img src="images/video-ucpe.gif" alt="UCPE" style="max-height:480px; width:auto;"> </p>

## 🛠️ Installation
```bash
conda create -n UCPE python=3.11 -y
conda activate UCPE
conda install -c conda-forge "ffmpeg<8" libiconv libgl -y
pip install -r requirements.txt
pip install --no-build-isolation --no-cache-dir flash-attn==2.8.0.post2
pip install -e .
cd thirdparty/equilib
pip install -e .
```
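Before moving on, it can be worth confirming that PyTorch and FlashAttention were built against a working CUDA setup; a quick check like the following (our suggestion, not a script in this repository) catches mismatched wheels early:

```python
import torch
import flash_attn  # import fails here if the wheel does not match your torch/CUDA build

print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```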
We use wandb to log and visualize the training process. Create an account, then log in to wandb by running:

```bash
wandb login
```
<details>
<summary>Below are installations for tools used in evaluation and dataset processing; skip these if you do not need them.</summary>

```bash
# GeoCalib
cd ../GeoCalib
pip install -e .
pip install -e siclib

# UniK3D
cd ../UniK3D
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu121

# Q-Align (uses its own conda environment)
cd ../Q-Align
conda create -n qalign python=3.9 -y
conda activate qalign
pip install -e .
pip install jsonlines "numpy<2" protobuf pydantic-settings

# ViPE (uses its own conda environment)
cd ../vipe
conda env create -f envs/base.yml
conda activate vipe
pip install -r envs/requirements.txt
pip install --no-build-isolation -e .
```

</details>
<br>
## ⚡ Quick Demo
Download our finetuned weights from OneDrive and put them in the `logs/` folder. Then run:

```bash
bash scripts/demo.sh
```

The generated videos will be saved in `logs/6wodf04s/demo`. Examples are shown below:
- `demo/lens.json`: Our Relative Ray Encoding not only generalizes to but also enables control over a wide range of camera intrinsics and lens distortions.
- `demo/pose.json`: The geometry-consistent design of Relative Ray Encoding also enables strong generalization and controllability across diverse camera motions.
- `demo/teaser.json`: Our Absolute Orientation Encoding eliminates the pitch and roll ambiguity of previous T2V methods, enabling precise control over the initial camera orientation (see the sketch below).
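To build intuition for what "pitch and roll control" pins down, the snippet below composes an initial camera orientation from the two angles. This is a generic construction for illustration only, with axis conventions we chose ourselves; it is not the repository's Absolute Orientation Encoding.

```python
import numpy as np

def orientation_from_pitch_roll(pitch: float, roll: float) -> np.ndarray:
    """Camera orientation from pitch and roll (radians).

    Assumed convention (ours, not the repo's): x right, y down, z forward;
    pitch rotates about x, roll rotates about the optical axis z.
    """
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    R_pitch = np.array([[1.0, 0.0, 0.0],
                        [0.0,  cp, -sp],
                        [0.0,  sp,  cp]])
    R_roll = np.array([[ cr, -sr, 0.0],
                       [ sr,  cr, 0.0],
                       [0.0, 0.0, 1.0]])
    return R_pitch @ R_roll  # roll applied first, then pitch
```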
## 📊 PanShot Dataset
Please download the PanShot dataset from Hugging Face to `data/UCPE/PanShot-7z`:

```bash
huggingface-cli download chengzhag/PanShot --repo-type dataset --local-dir data/UCPE/PanShot-7z
```
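If you prefer to script the download in Python, the `huggingface_hub` API offers an equivalent (our suggestion, not part of the repository):

```python
from huggingface_hub import snapshot_download

# Mirrors the huggingface-cli command above
snapshot_download(
    repo_id="chengzhag/PanShot",
    repo_type="dataset",
    local_dir="data/UCPE/PanShot-7z",
)
```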
Then extract the dataset:

```bash
cd data/UCPE/PanShot-7z
bash extract_panshot.sh
cd ../../..
```

The extracted dataset will be saved in `data/UCPE/PanShot`.
Then copy the remaining files into place to form the following folder structure:

```
├── captioned-test.jsonl
├── captioned-train.jsonl
├── max_rotation-test.json
├── meta-test
├── meta-train
├── pose-test
├── pose-train
├── videos-test
└── videos-train
```
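A quick way to confirm the layout is in place before training (a convenience snippet we wrote; it is not shipped with the repository):

```python
from pathlib import Path

root = Path("data/UCPE/PanShot")
expected = [
    "captioned-test.jsonl", "captioned-train.jsonl", "max_rotation-test.json",
    "meta-test", "meta-train", "pose-test", "pose-train",
    "videos-test", "videos-train",
]
missing = [name for name in expected if not (root / name).exists()]
print("missing:", missing or "none")
```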
<details>
<summary>If you want to go through the dataset curation process, please follow these three steps.</summary>
### CameraBench

Download the dataset from multiple sources:

```bash
cd data
huggingface-cli download --repo-type dataset syCen/CameraBench --local-dir CameraBench
cd CameraBench
huggingface-cli download --repo-type dataset syCen/Videos4CameraBnech --local-dir data/videos
wget https://huggingface.co/datasets/chancharikm/cambench_train_videos/resolve/main/videos.zip
unzip videos.zip -d videos
cd ../..
```
Process the dataset:

```bash
conda activate UCPE
python tools/process_camerabench.py  # run once with split = "train" and once with split = "test"
conda activate vipe
python thirdparty/vipe/run.py pipeline=default streams=raw_mp4_stream streams.base_path=data/UCPE/CameraBench/videos/ pipeline.output.path=data/UCPE/CameraBench/vipe/ pipeline.output.save_artifacts=true pipeline.post.depth_align_model=null
conda activate UCPE
python tools/geocalib_camerabench.py
python tools/filter_camerabench.py
```

The processed dataset will be saved in `data/UCPE/CameraBench`.
### PanFlow

Download the pretrained model `PanoFlow(RAFT)-wo-CFE.pth` of PanoFlow from Weiyun, then put it in the `models/PanoFlow` folder.

Our PanShot dataset is built upon the PanFlow dataset's videos and slam_poses. Please follow their instructions to download the full videos, and download their meta and slam_poses files following Full Dataset.
Then process the dataset with:

```bash
conda activate UCPE
python tools/filter_panflow.py
conda activate qalign
python tools/score_panflow.py
conda activate UCPE
python tools/align_panflow.py      # run once with split = "train" and once with split = "test"
python tools/match_panflow.py      # run once with split = "train" and once with split = "test"
python tools/normalize_panflow.py  # run once with split = "train" and once with split = "test"
```
### PanShot

Export your YouTube cookies to `~/.config/cookies.txt` in Netscape format to enable 4K downloads. Then download and process the dataset:

```bash
conda activate UCPE
python tools/process_panshot.py  # run once with split = "train" and once with split = "test"
python tools/caption_panshot.py  # run once with split = "train" and once with split = "test"
```
</details>
<br>
## 🏡 RealEstate10k Dataset

We use the RealEstate10k dataset for evaluation, so only poses and captions are needed. Please download the RealEstate10k poses from the official website (`RealEstate10K.tgz`) and unpack them into the `data/RealEstate10k` folder. Then download the captions from CameraCtrl (train and test).

The final folder structure should look like this:
```
├── captions
│   ├── test.json
│   └── train.json
├── pose_files
│   ├── test
│   └── train
└── traj_normalization.txt
```
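For reference, each RealEstate10K pose file is plain text: the first line is the source video URL, and each following line holds a frame timestamp (microseconds), four normalized intrinsics (fx, fy, cx, cy), two unused fields, and a flattened 3x4 world-to-camera matrix. A minimal parser we wrote for illustration (not part of this repository):

```python
import numpy as np

def load_re10k_poses(path):
    """Parse one RealEstate10K pose file into intrinsics and 3x4 poses."""
    with open(path) as f:
        url = f.readline().strip()              # first line: video URL
        rows = [line.split() for line in f if line.strip()]
    timestamps = [int(r[0]) for r in rows]      # microseconds
    intrinsics = np.array([[float(x) for x in r[1:5]] for r in rows])  # fx, fy, cx, cy
    poses = np.array([[float(x) for x in r[7:19]] for r in rows]).reshape(-1, 3, 4)
    return url, timestamps, intrinsics, poses
```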
## 🎯 Training and Evaluation

Prepare the latent cache and train the model with:

```bash
python src/cache.py
bash scripts/train.sh
```

We used 8 A800 GPUs for training, which takes about one day. You'll get a `WANDB_RUN_ID` (e.g., `6wodf04s`) after starting the training. The logs will be synced to your wandb account and the checkpoints will be saved in `logs/<WANDB_RUN_ID>/checkpoints/`. You can use the other, commented-out settings in `scripts/train.sh` for ablation studies and baselines.
For evaluation, first download the pretrained model `i3d_pretrained_400.pt` from common_metrics_on_video_quality and put it in the `models/FVD` folder. Then evaluate the results with:

```bash
bash scripts/evaluate.sh
```

Please change the `WANDB_RUN_ID` in `scripts/evaluate.sh` to your own run ID.