R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [ICCV 2023]
Project Page | Paper | Data

Abstract
Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and nuScenes benchmarks.
Getting Started
- Clone the repo using the --recurse-submodules flag
git clone --recurse-submodules https://github.com/AronDiSc/r3d3.git
cd r3d3
- Create a new anaconda environment using the provided .yaml file
conda env create --file environment.yaml
conda activate r3d3
- Compile the extensions (takes about 10 minutes)
python setup.py install
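After the build finishes, the compiled extensions should be importable from the active environment. A minimal sanity check; the module names `lietorch` and `droid_backends` are assumptions based on the Droid-SLAM code-base this repository builds on, so substitute whatever `setup.py` actually installs:

```python
import importlib.util

# Hypothetical extension module names; replace with the ones setup.py installs.
for mod in ("lietorch", "droid_backends"):
    spec = importlib.util.find_spec(mod)
    status = "ok" if spec is not None else "MISSING (re-run setup.py?)"
    print(f"{mod}: {status}")
```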
Datasets
The datasets should be placed at data/datasets/<dataset>
DDAD
Download the DDAD dataset and place it at data/datasets/DDAD. We use the masks provided by
SurroundDepth. Place them at data/datasets/DDAD/<scene>/occl_mask/<cam>/mask.png. The DDAD data structure should look
as follows:
R3D3
├ data
    ├ datasets
        ├ DDAD
            ├ <scene>
                ├ calibration
                    └ ....json
                ├ point_cloud
                    └ <cam>
                        └ ....npz
                ├ occl_mask
                    └ <cam>
                        └ ....png
                ├ rgb
                    └ <cam>
                        └ ....png
                └ scene_....json
            └ ...
        └ ...
    └ ...
└ ...
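Before running anything, it can help to confirm the SurroundDepth masks ended up where the loader expects them. A small `pathlib` sketch (illustrative only; it infers camera names from each scene's `rgb` folder):

```python
from pathlib import Path

def missing_ddad_masks(root: Path) -> list[str]:
    """Return scene/camera pairs whose occl_mask/<cam>/mask.png is absent."""
    missing = []
    for scene in sorted(p for p in root.iterdir() if p.is_dir()):
        rgb = scene / "rgb"
        if not rgb.is_dir():
            continue  # not a scene directory
        for cam in sorted(p.name for p in rgb.iterdir() if p.is_dir()):
            if not (scene / "occl_mask" / cam / "mask.png").is_file():
                missing.append(f"{scene.name}/{cam}")
    return missing

root = Path("data/datasets/DDAD")
if root.is_dir():
    print(missing_ddad_masks(root))  # empty list = all masks in place
```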
nuScenes
Download the nuScenes dataset and place it at data/datasets/nuScenes. We use the provided self-occlusion
masks. Place them at data/datasets/nuScenes/mask/<cam>.png. The nuScenes data structure should look as follows:
R3D3
├ data
    ├ datasets
        ├ nuScenes
            ├ mask
                ├ CAM_....png
            ├ samples
                ├ CAM_...
                    └ ....jpg
                └ LIDAR_TOP
                    └ ....pcd.bin
            ├ sweeps
                ├ CAM_...
                    └ ....jpg
            ├ v1.0-trainval
                └ ...
            └ ...
        └ ...
    └ ...
└ ...
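The same kind of check works for the nuScenes self-occlusion masks, which live in a single flat directory. A sketch assuming the six standard nuScenes camera names:

```python
from pathlib import Path

# The six standard nuScenes cameras (assumed; adjust if your setup differs).
NUSC_CAMS = [
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
]

def missing_nusc_masks(mask_dir: Path) -> list[str]:
    """Return camera names without a <cam>.png mask in mask_dir."""
    return [cam for cam in NUSC_CAMS if not (mask_dir / f"{cam}.png").is_file()]

mask_dir = Path("data/datasets/nuScenes/mask")
if mask_dir.is_dir():
    print(missing_nusc_masks(mask_dir))  # empty list = all masks in place
```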
Models
VKITTI2 Finetuned Feature-Matching
Download the weights for the feature- and context-encoders as well as the GRU from here: r3d3_finetuned.ckpt. Place it at:
R3D3
├ data
    ├ models
        ├ r3d3
            └ r3d3_finetuned.ckpt
        └ ...
    └ ...
└ ...
Completion Network
We provide completion network weights for the DDAD and nuScenes datasets.
| Dataset  | Abs Rel | Sq Rel | RMSE   | delta < 1.25 | Download                  |
|:---------|:-------:|:------:|:------:|:------------:|:--------------------------|
| DDAD     | 0.162   | 3.019  | 11.408 | 0.811        | completion_ddad.ckpt      |
| nuScenes | 0.253   | 4.759  | 7.150  | 0.729        | completion_nuscenes.ckpt  |
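The table's metrics follow the standard depth-evaluation definitions (absolute/squared relative error, RMSE, and the delta < 1.25 inlier ratio). A self-contained sketch of the conventional per-image formulation; this is the usual definition, not code from this repository:

```python
import math

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over paired per-pixel depth lists."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]  # mask invalid GT pixels
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n          # Abs Rel
    sq_rel  = sum((p - g) ** 2 / g for p, g in pairs) / n        # Sq Rel
    rmse    = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n) # RMSE
    d125    = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta<1.25": d125}

print(depth_metrics([10.0, 20.0, 33.0], [10.0, 25.0, 30.0]))
```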
Place them at:
R3D3
├ data
    ├ models
        ├ completion
            ├ completion_ddad.ckpt
            └ completion_nuscenes.ckpt
        └ ...
    └ ...
└ ...
Training
Droid-SLAM Finetuning
We finetune the provided droid.pth checkpoint on VKITTI2 using the Droid-SLAM code-base.
Completion Network
1. Generate Training Data
# DDAD
python evaluate.py \
--config configs/evaluation/dataset_generation/dataset_generation_ddad.yaml \
--r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
--r3d3_image_size 384 640 \
--r3d3_n_warmup=5 \
--r3d3_optm_window=5 \
--r3d3_corr_impl=lowmem \
--r3d3_graph_type=droid_slam \
--training_data_path=./data/datasets/DDAD
# nuScenes
python evaluate.py \
--config configs/evaluation/dataset_generation/dataset_generation_nuscenes.yaml \
--r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
--r3d3_image_size 448 768 \
--r3d3_n_warmup=5 \
--r3d3_optm_window=5 \
--r3d3_corr_impl=lowmem \
--r3d3_graph_type=droid_slam \
--training_data_path=./data/datasets/nuScenes
2. Completion Network Training
# DDAD
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_ddad_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
# nuScenes
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_nuscenes_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
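The stage-2 and inference runs need the stage-1 checkpoint path threaded through --arch.model.checkpoint. A purely illustrative helper that assembles the three commands for a dataset; the config file names simply mirror the ones listed above:

```python
def completion_commands(dataset: str, stage1_ckpt: str) -> list[list[str]]:
    """Build the three completion-network commands (stage 1, inf depth, stage 2)."""
    train_cfg = f"configs/training/depth_completion/r3d3_completion_{dataset}"
    eval_cfg = f"configs/evaluation/depth_completion/r3d3_completion_{dataset}"
    ckpt = f"--arch.model.checkpoint={stage1_ckpt}"
    return [
        ["python", "train.py", f"{train_cfg}_stage_1.yaml"],
        ["python", "train.py", f"{eval_cfg}_inf_depth.yaml", ckpt],
        ["python", "train.py", f"{train_cfg}_stage_2.yaml", ckpt],
    ]

for cmd in completion_commands("ddad", "data/models/completion/stage1.ckpt"):
    print(" ".join(cmd))
```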
Evaluation
# DDAD
python evaluate.py \
--config configs/evaluation/r3d3/r3d3_evaluation_ddad.yaml \
--r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
--r3d3_image_size 384 640 \
--r3d3_init_motion_only \
--r3d3_n_edges_max=84
# nuScenes
python evaluate.py \
--config configs/evaluation/r3d3/r3d3_evaluation_nuscenes.yaml \
--r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
--r3d3_image_size 448 768 \
--r3d3_init_motion_only \
--r3d3_dt_inter=0 \
--r3d3_n_edges_max=72
Citation
If you find the code helpful in your research or work, please cite the following paper.
@inproceedings{r3d3,
title={R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras},
author={Schmied, Aron and Fischer, Tobias and Danelljan, Martin and Pollefeys, Marc and Yu, Fisher},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2023}
}
Acknowledgements
- This repository is based on Droid-SLAM.
- The implementation of the completion network is based on Monodepth2.
- The vidar framework is used for training, evaluation and logging results.