GaussianPretrain
GaussianPretrain is a visual pre-training method for autonomous driving that shows significant improvements across various 3D perception tasks, including 3D object detection, HD-map construction, and occupancy prediction.
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
Paper | Project Page[TODO]
Shaoqing Xu<sup>1,2</sup>, Fang Li<sup>1,2</sup>, Shengyin Jiang<sup>2,3</sup>, Ziying Song<sup>4</sup>, Zhi-xin Yang<sup>1*</sup>, <br>
<sup>1</sup>University of Macau, <sup>2</sup>Xiaomi EV, <sup>3</sup>BUPT, <sup>4</sup>BJTU
</div>
Introduction
💥 GaussianPretrain introduces 3D Gaussian Splatting into visual pre-training for the first time. It demonstrates remarkable effectiveness and robustness, achieving significant improvements across various 3D perception tasks, including 3D object detection, HD-map reconstruction, and occupancy prediction, while being efficient and consuming less memory. 💥
<p align="center"> <img src="asserts/top.png" alt="pipeline" width="1000"/> </p>
Qualitative Rendered Visualization
Image and Video DEMO
<b>Rendered Image Visualization.</b>
<div align="center"> <img src="asserts/render_image.jpg" alt="pipeline" width="1000"/> </div>
<br/>
<b>Framework Modules Analysis and Rendered Video Visualization.</b>
https://github.com/user-attachments/assets/3fc08dd1-40f1-4ad5-92c3-9525f3c34ec6
News
- [2025-03-05] 🚀 We combined our method with the LiDAR modality and set a new SOTA performance.
- [2025-02-05] 🚀 Added rendered visualization images and video to better show the reconstruction performance of our approach.
- [2025-01-01] 💥 The config and weights for the UVTR-CS experiment setting, which is not covered in the paper, have also been released.
- [2025-01-01] 🚀 The complete code and associated weights have been released. By the way, Happy New Year to everyone! 💥
- [2024-11-20] The codebase is initialized. We are diligently preparing a clean, optimized version. Stay tuned for the complete code release, which is coming soon.
- [2024-11-19] The paper is publicly available on arXiv.
Overview
💥 The architecture of the proposed GaussianPretrain. Given multi-view images, we first extract valid mask patches using the mask generator with the LiDAR Depth Guidance strategy. Subsequently, a set of learnable 3D Gaussian anchors is generated using ray-based guidance and conceptualized as volumetric LiDAR points. Finally, the reconstruction signals of RGB, Depth, and Occupancy are decoded based on the predicted Gaussian anchor parameters.
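The first step above, selecting mask patches under LiDAR depth guidance, can be illustrated with a small NumPy sketch. This is not the repository's mask generator: the function name `lidar_guided_patch_mask`, the `patch` and `mask_ratio` parameters, and the rule that a patch is "valid" when it contains at least one projected LiDAR point are all assumptions made for illustration.

```python
import numpy as np


def lidar_guided_patch_mask(depth, patch=16, mask_ratio=0.5, seed=0):
    """Return a boolean (H/patch, W/patch) grid; True marks a masked patch.

    Hypothetical helper, not taken from the repository: a patch counts as
    "valid" if at least one projected LiDAR point (depth > 0) falls inside
    it, and a random `mask_ratio` fraction of the valid patches is masked.
    """
    h, w = depth.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # Group pixels by patch, then test each patch for any LiDAR hit.
    valid = (depth.reshape(gh, patch, gw, patch) > 0).any(axis=(1, 3))
    valid_idx = np.flatnonzero(valid)
    n_mask = int(round(mask_ratio * valid_idx.size))
    rng = np.random.default_rng(seed)
    chosen = rng.choice(valid_idx, size=n_mask, replace=False)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[chosen] = True
    return mask.reshape(gh, gw)


# Toy sparse depth map: LiDAR returns only in the lower half of a 64x64 image.
depth = np.zeros((64, 64), dtype=np.float32)
depth[32:, ::4] = 10.0
mask = lidar_guided_patch_mask(depth, patch=16, mask_ratio=0.5)
print(int(mask.sum()), "of", mask.size, "patches masked")
```

In this toy input only the eight patches in the lower half contain depth values, so masking is restricted to them and the sky-like upper half is never selected.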
<p align="center"> <img src="asserts/overview.png" alt="pipeline" width="1000"/> </p>
Main Results
3D Object Detection

HD-Map Reconstruction

Occupancy Prediction

Getting Started
Installation
This project is based on MMDetection3D, which can be set up as follows.
- Install PyTorch v1.9.1 and MMDetection3D v0.17.3 following the instructions.
- Install the required environment:
conda create -n gaussianpretrain python=3.8
conda activate gaussianpretrain
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.3.11 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9/index.html
pip install mmdet==2.14.0 mmsegmentation==0.14.1 tifffile==2021.11.2 numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 pycocotools==2.0.0 nuscenes-devkit==1.0.5 spconv-cu111 gpustat numba scipy pandas matplotlib Cython shapely loguru tqdm future fire yacs jupyterlab pybind11 tensorboardX tensorboard easydict pyyaml open3d addict pyquaternion awscli timm typing-extensions==4.7.1
cd GaussianPretrain
python setup.py develop
cd projects/mmdet3d_plugin/ops/diff-gaussian-rasterization
python setup.py develop
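After installation, it can help to confirm that the pinned versions actually ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the `check_pins` helper and the choice of which pins to check are illustrative assumptions, and it assumes the conda PyTorch build registers itself under the distribution name `torch`.

```python
from importlib import metadata

# A subset of the pins from the install commands above.
PINNED = {
    "torch": "1.9.1",
    "mmcv-full": "1.3.11",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "numpy": "1.19.5",
}


def check_pins(pins):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Some builds append a local tag such as "+cu111"; ignore it.
        base = found.split("+")[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {found})"
    return report


for name, status in check_pins(PINNED).items():
    print(f"{name:15s} {status}")
```

A "mismatch" line is only a hint that the environment differs from the one described here, not proof that the code will fail.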
Data Preparation
Please follow the instructions of UVTR and PanoOCC to prepare the dataset.
Training & Testing
You can train and evaluate the model following the instructions. For example:
# run GaussianPretrain pre-training on 8 GPUs
bash tools/dist_train.sh projects/mmdet3d_plugin/configs/gaussianpretrain/gp_0.075_convnext.py 8
# run downstream fine-tuning on 8 GPUs
bash tools/dist_train.sh projects/mmdet3d_plugin/configs/gaussianpretrain/uvtr_dn_ft.py 8
# run evaluation
python tools/test.py $config $ckpt --eval bbox
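The three commands above form a pre-train, fine-tune, evaluate sequence. As a minimal sketch, the helper below assembles them as argument lists so they can be printed for a dry run. The `build_commands` function is hypothetical, the checkpoint path is a placeholder for `$ckpt`, and evaluating with the fine-tuning config is an assumption, since `$config` is left open above.

```python
import shlex

TRAIN_SCRIPT = "tools/dist_train.sh"
CFG_DIR = "projects/mmdet3d_plugin/configs/gaussianpretrain"


def build_commands(ckpt, gpus=8):
    """Return the pre-train, fine-tune, and eval commands as argv lists.

    `ckpt` stands in for the fine-tuned checkpoint path ($ckpt above).
    """
    pretrain_cfg = f"{CFG_DIR}/gp_0.075_convnext.py"
    finetune_cfg = f"{CFG_DIR}/uvtr_dn_ft.py"
    return [
        ["bash", TRAIN_SCRIPT, pretrain_cfg, str(gpus)],
        ["bash", TRAIN_SCRIPT, finetune_cfg, str(gpus)],
        # Assumption: evaluate with the same config used for fine-tuning.
        ["python", "tools/test.py", finetune_cfg, ckpt, "--eval", "bbox"],
    ]


# Dry run: print the commands instead of executing them.
for argv in build_commands("path/to/checkpoint.pth"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the repository root to execute the steps in order.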
Weights
1. Object Detection
| Method | Pretrained ckpt | Config | NDS | mAP | Model |
|-----------|------------|---------|------|------|--------|
| UVTR-C+GP | Pretrained | UVTR-C | 47.2 | 41.7 | Google |
| UVTR-C+GP | Pretrained | UVTR-CS | 50.0 | 42.3 | Google |
2. HD-Map Reconstruction
| Method | Pretrained ckpt | Config | mAP | Model |
|----------------|------------|------------|-------|--------|
| MapTR-tiny†+GP | Pretrained | MapTR-tiny | 42.42 | Google |
3. Occupancy Prediction
| Method | Pretrained ckpt | Config | mIoU | Model |
|-----------------|------------|--------------|-------|--------|
| BEVFormerOCC+GP | Pretrained | BEVFormerOCC | 24.21 | Google |
| PanoOCC+GP | Pretrained | PanoOCC | 42.62 | Google |
TODO
- A StreamPETR version will be published soon.
- Project Page.
Citation
@article{xu2024gaussianpretrain,
title={GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving},
author={Xu, Shaoqing and Li, Fang and Jiang, Shengyin and Song, Ziying and Liu, Li and Yang, Zhi-xin},
journal={arXiv preprint arXiv:2411.12452},
year={2024}
}
Acknowledgement
This project is mainly based on the following codebases. Thanks for their great work!
