LiDARCrafter
[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
| <img src="images/teaser.png" alt="Teaser" width="100%"> |
| :-: |
In this work, we introduce LiDARCrafter, a unified framework for 4D LiDAR generation and editing. We contribute:
- We present the first 4D generative world model dedicated to LiDAR data, with superior controllability and spatiotemporal consistency.
- We introduce a tri-branch, 4D-layout-conditioned pipeline that turns language into an editable 4D layout and uses it to guide temporally stable LiDAR synthesis.
- We propose a comprehensive evaluation suite for LiDAR sequence generation, encompassing scene-level, object-level, and sequence-level metrics.
- We demonstrate the best single-frame and sequence-level LiDAR point cloud generation performance on nuScenes, with improved foreground quality over existing methods.
:books: Citation
If you find this work helpful for your research, please kindly consider citing our paper:
@inproceedings{liang2026lidarcrafter,
title = {{LiDARCrafter}: Dynamic {4D} World Modeling from {LiDAR} Sequences},
author = {Ao Liang and Youquan Liu and Yu Yang and Dongyue Lu and Linfeng Li and Lingdong Kong and Huaici Zhao and Wei Tsang Ooi},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
volume = {40},
year = {2026},
}
Updates
- [11.2025] - LiDARCrafter has been accepted to AAAI 2026 for Oral Presentation. :tada:
- [10.2025] - We will soon start organizing the code. All pretrained weights for evaluation can be found at Hugging Face.
- [08.2025] - The technical report of LiDARCrafter is available on arXiv.
Outline
- Updates
- Outline
- :gear: Installation
- :hotsprings: Data Preparation
- :rocket: Getting Started
- :wrench: Generation Framework
- :snake: Model Zoo
- :memo: TODO List
- License
- Acknowledgements
:gear: Installation
Please configure your environment according to the version information in environment.yml.
:hotsprings: Data Preparation
- Create dataset: same as DrivingDiffusion
ln -s ${ROOT_DATA_PATH} ./data/nuscenes
Run bash scripts/create_data.sh to generate:
- Info files with track and state
- Updated pkl files with scene graphs
- CLIP features of the scene graphs
The data directory should then have the following file tree:
data
├── clips
│ └── nuscenes
│ ├── obj_text_feat.pkl
│ ├── train
│ └── val
├── infos
│ ├── needed_5_framed_token.pkl
│ ├── nuscenes_dbinfos_10sweeps_withvelo.pkl
│ ├── nuscenes_infos_10sweeps_train.pkl
│ ├── nuscenes_infos_10sweeps_val.pkl
│ ├── nuscenes_infos_lidargen_train.pkl
│ ├── nuscenes_infos_lidargen_val.pkl
│ ├── nuscenes_infos_train.pkl
│ ├── nuscenes_infos_val.pkl
│ ├── nuscenes_object_classification_train.pkl
│ └── nuscenes_object_classification_val.pkl
└── nuscenes
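Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_data_paths and the constant EXPECTED_PATHS are hypothetical, and the relative paths are simply copied from the file tree above. It assumes it is run from the repository root, where ./data lives.

```python
from pathlib import Path

# Relative paths copied from the file tree above.
# The constant and helper below are illustrative, not part of LiDARCrafter.
EXPECTED_PATHS = [
    "clips/nuscenes/obj_text_feat.pkl",
    "clips/nuscenes/train",
    "clips/nuscenes/val",
    "infos/needed_5_framed_token.pkl",
    "infos/nuscenes_dbinfos_10sweeps_withvelo.pkl",
    "infos/nuscenes_infos_10sweeps_train.pkl",
    "infos/nuscenes_infos_10sweeps_val.pkl",
    "infos/nuscenes_infos_lidargen_train.pkl",
    "infos/nuscenes_infos_lidargen_val.pkl",
    "infos/nuscenes_infos_train.pkl",
    "infos/nuscenes_infos_val.pkl",
    "infos/nuscenes_object_classification_train.pkl",
    "infos/nuscenes_object_classification_val.pkl",
    "nuscenes",
]


def missing_data_paths(data_root="data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    # Path.exists() follows symlinks, so a linked ./data/nuscenes counts as present.
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling missing_data_paths() from the repository root returns an empty list when every entry is present; any returned paths indicate a step of the data preparation that still needs to run.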
:rocket: Getting Started
Evaluation
- Train classification model
python train/train_classification_pointmlp.py
- Train uncertainty model
python train/train_uncertainty_glenet.py
For each model's 10,000 (1w) generated samples:
- Extract foreground samples
python evaluation/extract_foreground_samples.py --model ori
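The evaluation steps above can be chained in a small driver. This is a hedged sketch rather than a script shipped with the repository: the function names evaluation_commands and run_commands are hypothetical, it assumes the steps are run in the order listed and from the repository root, and ori is the only --model value shown in this README, so any other value is an assumption.

```python
import shlex
import subprocess


def evaluation_commands(model="ori"):
    """Build the three commands listed above, in the order they appear."""
    return [
        ["python", "train/train_classification_pointmlp.py"],
        ["python", "train/train_uncertainty_glenet.py"],
        ["python", "evaluation/extract_foreground_samples.py", "--model", model],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        line = shlex.join(argv)
        printed.append(line)
        print(line)
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)
    return printed
```

With the default dry_run=True, run_commands(evaluation_commands()) only prints the command lines, which makes it easy to review them before launching the training runs with dry_run=False.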
:wrench: Generation Framework
Overall Framework
| <img src="images/framework.png" alt="Framework" width="100%"> |
| :-: |
4D Layout Generation
| <img src="images/gen-4d-layout.png" alt="Example" width="100%"> |
| :-: |
Single-Frame Generation
| <img src="images/gen-single-frame.png" alt="Example" width="100%"> |
| :-: |
:snake: Model Zoo
To be updated.
:memo: TODO List
- [x] Initial release. 🚀
- [x] Release the training code.
- [x] Release the inference code.
- [x] Release the evaluation code.
- [ ] Refine the Readme.md
License
This work is under the <a rel="license" href="https://www.apache.org/licenses/LICENSE-2.0">Apache License Version 2.0</a>, while some specific implementations in this codebase might be under other licenses. Kindly refer to LICENSE.md for a more careful check, if you are using our code for commercial matters.
Acknowledgements
This work is developed based on the MMDetection3D codebase.
<img src="https://github.com/open-mmlab/mmdetection3d/blob/main/resources/mmdet3d-logo.png" width="31%"/><br> MMDetection3D is an open-source toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.
Some of the benchmarked models are from the OpenPCDet and 3DTrans projects.
Related Projects
| :sunglasses: Awesome | Projects |
|:-:|:-|
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/worldbench_survey.webp"> | 3D and 4D World Modeling: A Survey<br>[GitHub Repo] - [Project Page] - [Paper] |
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/worldlens.png"> | WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World<br>[GitHub Repo] - [Project Page] - [Paper] |
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/3eed.png"> | 3EED: Ground Everything Everywhere in 3D<br>[GitHub Repo] - [Project Page] - [Paper] |
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/drivebench.png"> | Are VLMs Ready for Autonomous Driving? A Study from Reliability, Data & Metric Perspectives<br>[GitHub Repo] - [Project Page] - [Paper] |
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/pi3det.png"> | Perspective-Invariant 3D Object Detection<br>[GitHub Repo] - [Project Page] - [Paper] |
| <img width="95px" src="https://github.com/ldkong1205/ldkong1205/blob/master/Images/dynamiccity.webp"> | DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes<br>[GitHub Repo] - [Project Page] - [Paper] |
