GenieDrive

[CVPR 2026] "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"

Install / Use

/learn @Huster-YZY/GenieDrive
About this skill

Quality Score: 0/100
Supported Platforms: Universal
README

<div align="center">

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Zhenya Yang<sup>1</sup>, Zhe Liu<sup>1,†</sup>, Yuxiang Lu<sup>1</sup>, Liping Hou<sup>2</sup>, Chenxuan Miao<sup>1</sup>, Siyi Peng<sup>2</sup>, Bailan Feng<sup>2</sup>, Xiang Bai<sup>3</sup>, Hengshuang Zhao<sup>1,✉</sup>

<br> <sup>1</sup> The University of Hong Kong, <sup>2</sup> Huawei Noah's Ark Lab, <sup>3</sup> Huazhong University of Science and Technology <br> † Project leader, ✉ Corresponding author. <br>

📑 [arXiv], ⚙️ [project page], 🤗 [model weights]

<div align="center"> <img src="assets/teaser.jpg" width="100%"> <p><em>Overview of our GenieDrive</em></p> </div> </div>

📢 News

  • [2025/12/15] We released the GenieDrive paper on arXiv. 🔥
  • [2025/12/15] The DrivePI paper is released! A novel spatial-aware 4D MLLM that serves as a unified Vision-Language-Action (VLA) framework and is also compatible with vision-action (VA) models. 🔥
  • [2025/11/04] Our previous work UniLION has been released. Check out the codebase for a unified autonomous driving model built on Linear Group RNNs. 🚀
  • [2024/09/26] Our work LION was accepted by NeurIPS 2024. Visit the codebase for Linear Group RNNs for 3D object detection. 🚀

📋 TODO List

  • [ ] Release 4D occupancy forecasting code and model weights.
  • [ ] Release multi-view video generator code and weights.

📈 Results

Our method delivers a substantial improvement in 4D occupancy forecasting performance, with a 7.2% gain in mIoU and a 4% gain in IoU. Moreover, our tri-plane VAE compresses occupancy into a latent tri-plane only 58% the size of that used in previous methods, while still maintaining superior reconstruction performance. This compact latent representation also enables fast inference (41 FPS) with a minimal parameter count of only 3.47M (including the VAE and prediction module).
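The mIoU and IoU figures above are computed over voxel grids of class labels. As a point of reference, here is a minimal NumPy sketch of the standard per-class voxel IoU commonly used by occupancy benchmarks; the paper's exact evaluation protocol and class list are not detailed here, so treat this as an illustration only:

```python
import numpy as np

def occupancy_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean per-class IoU over voxel grids of class labels.
    Classes absent from both pred and gt are skipped so they
    do not drag the mean down."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# toy 2x2 "grid" with three classes
pred = np.array([[0, 1], [1, 2]])
gt   = np.array([[0, 1], [2, 2]])
score = occupancy_miou(pred, gt, num_classes=3)  # (1.0 + 0.5 + 0.5) / 3
```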

<div align="center"> <img src="assets/table_occ.png" width="85%"> <p><em>Performance of 4D Occupancy Forecasting</em></p> </div>
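The tri-plane VAE itself is not yet released (see the TODO list above). As a rough intuition for why a tri-plane latent is compact, here is a minimal NumPy sketch, entirely our own illustration rather than the paper's architecture, that projects a voxel grid onto three axis-aligned planes:

```python
import numpy as np

def occupancy_to_triplane(occ: np.ndarray):
    """Project a binary 3D occupancy grid (X, Y, Z) onto three
    axis-aligned planes by max-pooling along each axis.
    Illustrative stand-in only: the actual model learns this
    mapping with a trained VAE rather than hard pooling."""
    xy = occ.max(axis=2)  # collapse Z -> (X, Y) plane
    xz = occ.max(axis=1)  # collapse Y -> (X, Z) plane
    yz = occ.max(axis=0)  # collapse X -> (Y, Z) plane
    return xy, xz, yz

# toy 4x4x2 grid with a single occupied voxel
grid = np.zeros((4, 4, 2), dtype=np.uint8)
grid[1, 2, 0] = 1
xy, xz, yz = occupancy_to_triplane(grid)
```

For an X×Y×Z grid, the three planes store X·Y + X·Z + Y·Z cells instead of X·Y·Z voxels, which is where the compactness comes from; note that the 58% figure in the paper compares learned latent sizes against previous methods, not this naive cell count.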

We train three driving video generation models that differ only in video length: S (8 frames, ~0.7 s), M (37 frames, ~3 s), and L (81 frames, ~7 s). Through rollout, the L model can further generate long multi-view driving videos of up to 241 frames (~20 s). GenieDrive consistently outperforms previous occupancy-based methods across all metrics, while also enabling much longer video generation.

<div align="center"> <img src="assets/table_video.png" width="65%"> <p><em>Performance of Multi-View Video Generation</em></p> </div>
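The rollout step described above can be sketched as follows. This is our own minimal illustration with integer placeholder frames and an assumed overlap of one conditioning frame per pass (under which three passes of the 81-frame L model yield exactly 241 frames); the released generator's actual conditioning scheme may differ:

```python
from typing import Callable, List

CLIP_LEN = 81  # frames produced per generator call (the L model)

def toy_generate(cond: List[int]) -> List[int]:
    """Placeholder generator: reproduces the conditioning frames
    and continues the integer 'frame' sequence up to CLIP_LEN."""
    start = cond[0] if cond else 0
    return list(range(start, start + CLIP_LEN))

def rollout(generate: Callable[[List[int]], List[int]],
            overlap: int, target_len: int) -> List[int]:
    """Autoregressive rollout: condition each call on the last
    `overlap` frames already produced, keep only the new frames."""
    frames = generate([])              # first clip, unconditioned
    while len(frames) < target_len:
        clip = generate(frames[-overlap:])
        frames.extend(clip[overlap:])  # drop the regenerated overlap
    return frames[:target_len]

video = rollout(toy_generate, overlap=1, target_len=241)
# 81 + 80 + 80 = 241 frames from three generator passes
```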

📝 Citation

@article{yang2025geniedrive,
  author    = {Yang, Zhenya and Liu, Zhe and Lu, Yuxiang and Hou, Liping and Miao, Chenxuan and Peng, Siyi and Feng, Bailan and Bai, Xiang and Zhao, Hengshuang},
  title     = {GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation},
  journal   = {arXiv preprint arXiv:2512.12751},
  year      = {2025},
}

Acknowledgements

We thank these great works and open-source repositories: I2-World, UniScene, DynamicCity, MMDetection3D, and VideoX-Fun.

GitHub Stars: 70
Category: Content
Updated: 2d ago
Forks: 3

Security Score

95/100

Audited on Apr 1, 2026

No findings