COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
Demo Videos
Comparison of the ground truth, DOME generation with the official checkpoint, and COME. The task setting is to take a 4-frame 3D-Occ sequence as input and predict the next 6 frames (a 3 s horizon).
https://github.com/user-attachments/assets/f95890fb-ab5a-4f26-b9ec-3b5e44f45a99
Comparison of the ground truth, DOME generation with the reproduced checkpoint, and COME. The task setting is to take a 4-frame 3D-Occ sequence as input and predict the next 16 frames (an 8 s horizon).
https://github.com/user-attachments/assets/4d1ec897-578c-469a-a1dd-a9b74f7eb3cf
COME generation with BEV layouts. The task setting is to take a 2-frame 3D-Occ sequence and 8-frame BEV layouts as input and predict the next 6 frames (a 3 s horizon).
https://github.com/user-attachments/assets/e511e5df-71ad-42df-beab-c2725d0aad92
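The frame counts in the task settings above follow from the 2 Hz rate of the occupancy labels (the same rate as the 2 Hz BEV-layout labels noted later in this README). A minimal sketch of the seconds-to-frames conversion; the helper name and constant are ours, not part of the codebase:

```python
# Assumption: occupancy labels are annotated at 2 Hz (inferred from the 2 Hz
# BEV layouts mentioned in this README), so frames = seconds * rate.
ANNOTATION_HZ = 2  # assumed annotation rate

def horizon_to_frames(seconds, hz=ANNOTATION_HZ):
    """Convert a prediction horizon in seconds to a frame count."""
    return int(seconds * hz)
```

With this rate, a 3 s horizon is 6 frames and an 8 s horizon is 16 frames, matching the task settings above.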
Overview
COME = Forecasting Guided Generation

Results

🚀 Setup
environment setup
conda env create --file environment.yml
pip install einops tabulate
cd occforecasting
python setup.py develop
cd ..
data preparation
- Create a soft link from `data/nuscenes` to your nuScenes path.
- Prepare the gts semantic occupancy introduced in Occ3d.
- Download the generated train/val pickle files from OccWorld or DOME.
- Prepare the train/val pickle files for scene-centric forecasting:
python -m occforecasting.datasets.nusc_occ3d_dataset
The dataset should be organized as follows:
.
└── data/
    ├── nuscenes                                    # downloaded from www.nuscenes.org/
    │   ├── lidarseg
    │   ├── maps
    │   ├── samples
    │   ├── sweeps
    │   ├── v1.0-trainval
    │   └── gts                                     # downloaded from Occ3d
    ├── nuscenes_infos_train_temporal_v3_scene.pkl
    ├── nuscenes_infos_val_temporal_v3_scene.pkl
    ├── nuscenes_train_occ3d_infos.pkl
    └── nuscenes_val_occ3d_infos.pkl
The four pickle files can also be downloaded from infos.
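As a quick sanity check before training, the layout above can be verified with a short script. This helper is illustrative and not part of the repository; the expected entries are taken directly from the directory tree above:

```python
from pathlib import Path

# Expected entries under data/, copied from the directory tree above.
EXPECTED = [
    "nuscenes/lidarseg",
    "nuscenes/maps",
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/v1.0-trainval",
    "nuscenes/gts",
    "nuscenes_infos_train_temporal_v3_scene.pkl",
    "nuscenes_infos_val_temporal_v3_scene.pkl",
    "nuscenes_train_occ3d_infos.pkl",
    "nuscenes_val_occ3d_infos.pkl",
]

def missing_entries(data_root="data"):
    """Return the expected files/directories that are absent under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

An empty return value means the layout matches; otherwise the missing paths are listed.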
optional inputs
For testing under different conditions, the following additional inputs are needed:
- Motion-planning results with yaw angles from BEV-Planner. Put the json file in the project root directory. We simply add a yaw-regression branch to the BEV-Planner project; thanks for their great work.
- BEV layouts for the training and validation sets (2 Hz labels). Unzip the files and put them in './data/step2'. The pre-processing script is from UniScene; thanks for their great work.
- 3D occupancy prediction results from BEVDet and EFFOcc. Unzip the files and put them in './data/occpreds'. Thanks for their open-source checkpoints.
- The AE evaluation protocol from UniScene: request and download the AE checkpoint and put it in './ckpts/'.
Model Zoo
We recommend downloading the checkpoints together with their folders under './work_dir'.
| Task Setting | Inputs | Method | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage1-COME-World Model | Config | CKPT |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage2-COME-Scene-Centric-Forecasting | Config | CKPT |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage3-COME-ControlNet | Config | CKPT |
| Input-4frame-Output-6frame | 3DOcc + Pred Traj | Stage3-COME-ControlNet | Config | Same As Above |
| Input-4frame-Output-6frame | BEVDet + Pred Traj | Stage3-COME-ControlNet | Config | Same As Above |
| Input-4frame-Output-6frame | BEVDet + GT Traj | Stage3-COME-ControlNet | Config | Same As Above |
| Input-4frame-Output-6frame | EFFOcc + Pred Traj | Stage3-COME-ControlNet | Config | Same As Above |
| Input-4frame-Output-6frame | EFFOcc + GT Traj | Stage3-COME-ControlNet | Config | Same As Above |
| Input-4frame-Output-16frame | 3DOcc + GT Traj | Stage1-COME-World Model | Config | CKPT |
| Input-4frame-Output-16frame | 3DOcc + GT Traj | Stage2-COME-Scene-Centric-Forecasting | Config | CKPT |
| Input-4frame-Output-16frame | 3DOcc + GT Traj | Stage3-COME-ControlNet | Config | CKPT |
| Input-2frame-Output-6frame | 3DOcc + GT Traj + BEV Layouts | Stage1-COME-World Model | Config | CKPT |
| Input-2frame-Output-6frame | 3DOcc + GT Traj + BEV Layouts | Stage2-COME-Scene-Centric-Forecasting | Config | CKPT |
| Input-2frame-Output-6frame | 3DOcc + GT Traj + BEV Layouts | Stage3-COME-ControlNet | Config | CKPT |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage1-COME-Small-World Model | Config | CKPT |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage2-COME-Scene-Centric-Forecasting | Config | Same As Above |
| Input-4frame-Output-6frame | 3DOcc + GT Traj | Stage3-COME-Small-ControlNet | Config | CKPT |
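When scripting around the checkpoints above, the "Task Setting" strings are regular enough to parse programmatically. A hypothetical helper, not part of the repository:

```python
import re

def parse_task_setting(setting):
    """Decode e.g. 'Input-4frame-Output-6frame' into (input_frames, output_frames)."""
    m = re.fullmatch(r"Input-(\d+)frame-Output-(\d+)frame", setting)
    if m is None:
        raise ValueError(f"unrecognized task setting: {setting!r}")
    return int(m.group(1)), int(m.group(2))
```

This can be used, for example, to group the rows of the table by prediction horizon.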
🏃 Run the code
OCC-VAE
By default, we use the VAE checkpoint provided by DOME; thanks for their great work.
# train
python tools/train_vae.py --py-config ./configs/train_occvae.py --work-dir ./work_dir/occ_vae
# eval
python tools/train_vae.py --py-config ./configs/train_occvae.py --work-dir ./work_dir/occ_vae --resume-from ckpts/occvae_latest.pth
# visualize
python tools/visualize_demo_vae.py \
--py-config ./configs/train_occvae.py \
--work-dir ./work_dir/occ_vae \
--resume-from ckpts/occvae_latest.pth \
--export_pcd \
--skip_gt
Scene-Centric Forecasting
cd occforecasting
# train
bash train.sh occforecasting/configs/unet/unet_aligned_past2s_future_3s.py
# eval
bash test.sh occforecasting/configs/unet/unet_aligned_past2s_future_3s.py
COME World Model
# train
bash tools/train_diffusion.sh --py-config ./configs/train_dome_v2.py --work-dir ./work_dir/dome_v2
# eval
python tools/eval_metric.py --py-config ./configs/train_dome_v2.py --work-dir ./work_dir/dome_v2 --resume-from ./work_dir/dome_v2/best_miou.pth --vae-resume-from ckpts/occvae_latest.pth
# visualize
python tools/visualize_demo.py --py-config ./configs/train_dome_v2.py --work-dir ./work_dir/dome_v2 --resume-from ./work_dir/dome_v2/best_miou.pth --vae-resume-from ckpts/occvae_latest.pth
COME ControlNet
# train
python tools/train_diffusion_control_ddp.py --py-config configs/train_dome_controlnet_mask_invisible_v2.py --work-dir work_dir/train_dome_controlnet_mask_invisible_v2
# eval
python tools/test_diffusion_control.py --py-config configs/train_dome_controlnet_mask_invisible_v2.py --work-dir work_dir/train_dome_controlnet_mask_invisible_v2
# visualize
python tools/visualize_demo_control_mask_invisible.py --py-config configs/train_dome_controlnet_mask_invisible_v2.py --work-dir work_dir/train_dome_controlnet_mask_invisible_v2 --vae-resume-from ckpts/occvae_latest.pth --skip_gt
Acknowledgements
This project is built on top of DOME and OccWorld. Thanks for the excellent open-source works!
