# UMGen
Code for the CVPR 2025 paper: *Generating Multimodal Driving Scenes via Next-Scene Prediction*
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu<sup>1,2</sup>, Haoyang Zhang<sup>2</sup>, Tianwei Lin<sup>2</sup>, Lichao Huang<sup>2</sup>,
Shujie Luo<sup>2</sup>, Rui Wu<sup>2</sup>, Congpei Qiu<sup>1</sup>, Wei Ke<sup>1</sup>, Tong Zhang<sup>3, 4</sup>,
<sup>1</sup> Xi'an Jiaotong University, <sup>2</sup> Horizon Robotics, <sup>3</sup> EPFL, <sup>4</sup> University of Chinese Academy of Sciences
Accepted to CVPR 2025
## 🌟 What is UMGen?
UMGen generates multimodal driving scenes, where each scene integrates:
Ego-vehicle actions, maps, traffic agents, and images.
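Conceptually, each generated scene bundles these four modalities into a single frame of the sequence. A minimal sketch of such a container follows; the field names here are hypothetical and do not reflect UMGen's actual data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Scene:
    """One frame of a multimodal driving scene (illustrative only)."""
    ego_action: List[float]   # e.g. ego trajectory / action tokens
    map_tokens: List[int]     # tokenized map representation
    agent_tokens: List[int]   # tokenized traffic agents
    image_tokens: List[int]   # tokenized camera image
```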
## 🎬 Autoregressive Scene Generation
**All visual elements** in the video are generated by UMGen.

https://github.com/user-attachments/assets/afe62434-1a9e-44dc-b1bd-b67d48e1b693
## 🤖 User-Specified Scenario Generation
UMGen also supports user-specified scenario generation.
In this video, we control the agent to simulate a cut-in maneuver scenario.
https://github.com/user-attachments/assets/a3224d85-08df-4e36-a47d-f3e88f2b7ad6
## 📎 More Information

For more videos and details, please refer to our project page and paper.
## 🚀 Quick Start

### Set up a new virtual environment

```shell
conda create -n UMGen python=3.8 -y
conda activate UMGen
```
### Install dependency packages

```shell
UMGen_path="path/to/UMGen"
cd ${UMGen_path}
pip3 install --upgrade pip
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirements.txt
```
### Prepare the data

Download the tokenized data and pretrained weights from [Google Drive](https://drive.google.com/drive/folders/1rJEVxWNk4MH_FPdqUMgdjV_PHwKJMS-3?usp=sharing).
The directory structure should be:

```
UMGen/
├── data/
│   ├── controlled_scenes/
│   │   └── XX
│   ├── tokenized_origin_scenes/
│   │   └── XX
│   └── weights/
│       ├── image_var.tar
│       ├── map_vae.ckpt
│       └── UMGen_Large.pt
└── projects/
```
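Before running inference, it can save time to verify that the downloaded files landed in the expected places. Below is a small sanity-check helper, assuming the layout above; it is a hypothetical convenience script, not part of the UMGen codebase:

```python
from pathlib import Path

# Expected entries from the directory layout above.
EXPECTED = [
    "data/controlled_scenes",
    "data/tokenized_origin_scenes",
    "data/weights/image_var.tar",
    "data/weights/map_vae.ckpt",
    "data/weights/UMGen_Large.pt",
    "projects",
]

def missing_entries(root):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

Running `missing_entries("path/to/UMGen")` and checking that it returns an empty list catches misplaced weights before a long inference run fails on a path error.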
## ⚙️ Inference Usage

### 🎛️ Infer Future Frames Freely

Generate future frames automatically, without any external control signals:

```shell
python projects/tools/evaluate.py --infer_task video --set_num_new_frames 30
```
### 🕹️ Infer Future Frames with Control

Generate future frames under specific control constraints, such as predefined trajectories or object-behavior control:

```shell
python projects/tools/evaluate.py --infer_task control --set_num_new_frames 30
```
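Both modes share the same next-scene prediction idea: each new scene is predicted from the sequence generated so far, and in control mode user-specified tokens override the model's prediction at chosen steps. A minimal sketch of such a loop, using a dummy stand-in model (all names here are hypothetical, not UMGen's actual API):

```python
def rollout(model, history, num_new_frames, control=None):
    """Autoregressively extend `history` by `num_new_frames` scenes.

    `model(scenes)` predicts the next scene from all scenes so far;
    `control` optionally maps a frame index to a user-imposed scene
    (e.g. a cut-in maneuver) that overrides the prediction.
    """
    scenes = list(history)
    for t in range(num_new_frames):
        next_scene = model(scenes)        # next-scene prediction
        if control and t in control:      # user-specified override
            next_scene = control[t]
        scenes.append(next_scene)
    return scenes

# Dummy model: "predicts" the last scene token plus one.
toy_model = lambda scenes: scenes[-1] + 1
```

With `toy_model`, a free rollout from `[0]` simply counts upward, while passing `control={1: 10}` forces frame 1 to the value 10 and the rollout continues from there.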
## 🧩 To-Do List
- [ ] Release more tokenized scene data
- [ ] Release the code for obtaining scene tokens using the VAE models
- [ ] Release the diffusion code to enhance the videos
## 📬 Contact

For any questions or collaborations, feel free to contact me :) 📧 wuyanhao@stu.xjtu.edu.cn
