
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

News:

  • :cry: June, 2025. The original repo with 300+ :star: was deleted from SamsungLabs/imvoxelnet. Please follow mmdetection3d for the trained checkpoints. If you have a ScanNet checkpoint downloaded from a previous version, please let us know.
  • :fire: August, 2022. ImVoxelNet for SUN RGB-D is now supported in mmdetection3d.
  • :fire: October, 2021. Our paper is accepted at WACV 2022. We simplify the 3D neck to make indoor models much faster and more accurate. For example, this improves ScanNet mAP by more than 2%. Please find updated configs in configs/imvoxelnet/*_fast.py and models.
  • :fire: August, 2021. We adapt center sampling for indoor detection. For example, this improves ScanNet mAP by more than 5%. Please find updated configs in configs/imvoxelnet/*_top27.py and models.
  • :fire: July, 2021. We update ScanNet image preprocessing both here and in mmdetection3d.
  • :fire: June, 2021. ImVoxelNet for KITTI is now supported in mmdetection3d.

This repository contains an implementation of the monocular/multi-view 3D object detector ImVoxelNet, introduced in our paper:

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection<br> Danila Rukhovich, Anna Vorontsova, Anton Konushin <br> Samsung Research<br> https://arxiv.org/abs/2106.01178

<p align="center"><img src="./resources/scheme.png" alt="drawing" width="90%"/></p>

Installation

For convenience, we provide a Dockerfile. Alternatively, you can install all required packages manually.

This implementation is based on the mmdetection3d framework. Please refer to the original installation guide install.md, replacing open-mmlab/mmdetection3d with saic-vul/imvoxelnet. Also, rotated_iou should be installed with these 4 commands.

Most of the ImVoxelNet-related code is located in the following files: detectors/imvoxelnet.py, necks/imvoxelnet.py, dense_heads/imvoxel_head.py, pipelines/multi_view.py.

Datasets

We support three benchmarks based on the SUN RGB-D dataset.

  • For the VoteNet benchmark with 10 object categories, you should follow the instructions in sunrgbd.
  • For the PerspectiveNet benchmark with 30 object categories, the same instructions apply; you only need to set the dataset argument to sunrgbd_monocular when running create_data.py.
  • The Total3DUnderstanding benchmark implies detecting objects of 37 categories along with camera pose and room layout estimation. Download the preprocessed data as train.json and val.json, put them in ./data/sunrgbd, and then run:
    python tools/data_converter/sunrgbd_total.py
    

For ScanNet please follow instructions in scannet. For KITTI and nuScenes, please follow instructions in getting_started.md.

Getting Started

Please see getting_started.md for basic usage examples.

Training

To start training, run dist_train with ImVoxelNet configs:

```shell
bash tools/dist_train.sh configs/imvoxelnet/imvoxelnet_kitti.py 8
```

Testing

Test a pre-trained model using dist_test with ImVoxelNet configs:

```shell
bash tools/dist_test.sh configs/imvoxelnet/imvoxelnet_kitti.py \
    work_dirs/imvoxelnet_kitti/latest.pth 8 --eval mAP
```

Visualization

Visualizations can be created with the test script. For better visualizations, you may set score_thr in the configs to 0.15 or more:

```shell
python tools/test.py configs/imvoxelnet/imvoxelnet_kitti.py \
    work_dirs/imvoxelnet_kitti/latest.pth --show \
    --show-dir work_dirs/imvoxelnet_kitti
```
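
Rather than editing the base config in place, score_thr can be raised in a small derived config. The following is a hypothetical sketch using the usual mmdetection-style config inheritance; the exact field holding score_thr may differ in this repo's configs:

```python
# Hypothetical derived config, e.g. configs/imvoxelnet/imvoxelnet_kitti_vis.py.
# Field placement follows common mmdetection conventions and is an assumption.
_base_ = './imvoxelnet_kitti.py'

test_cfg = dict(score_thr=0.15)  # suppress low-confidence boxes in visualizations
```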

Models

v2 adds center sampling for the indoor scenario; v3 simplifies the 3D neck for the indoor scenario. The differences are discussed in the v2 and v3 preprints.
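
Center sampling, in the FCOS sense, treats a candidate location as positive only if it lies close to a ground-truth box center, rather than anywhere inside the box. A minimal sketch under assumed shapes and axis-aligned boxes (the repository's actual assignment logic lives in dense_heads/imvoxel_head.py):

```python
import torch

def center_sampling_mask(points, boxes, radius=1.5, voxel_size=0.16):
    """Mark candidate locations lying near a ground-truth box center.

    Illustrative shapes:
      points: (N, 3) candidate location coordinates
      boxes:  (M, 6) axis-aligned boxes as (cx, cy, cz, dx, dy, dz)
    Returns an (N, M) boolean mask of positive point/box pairs.
    """
    centers = boxes[:, :3]        # (M, 3) box centers
    half = boxes[:, 3:6] / 2      # (M, 3) half extents

    # Per-axis distance from every point to every box center: (N, M, 3).
    delta = (points[:, None, :] - centers[None, :, :]).abs()

    inside_box = (delta <= half[None]).all(dim=-1)            # inside the box
    near_center = (delta <= radius * voxel_size).all(dim=-1)  # near its center
    return inside_box & near_center
```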

| Dataset | Object Classes | Version | Download |
|:---------:|:--------------:|:-------:|:--------:|
| SUN RGB-D | 37 from <br> Total3dUnderstanding | v1 \| mAP@0.15: 41.5 <br> v2 \| mAP@0.15: 42.7 <br> v3 \| mAP@0.15: 43.7 | model \| log \| config <br> model \| log \| config <br> model \| log \| config |
| SUN RGB-D | 30 from <br> PerspectiveNet | v1 \| mAP@0.15: 44.9 <br> v2 \| mAP@0.15: 47.2 <br> v3 \| mAP@0.15: 48.7 | model \| log \| config <br> model \| log \| config <br> model \| log \| config |
| SUN RGB-D | 10 from VoteNet | v1 \| mAP@0.25: 38.8 <br> v2 \| mAP@0.25: 39.4 <br> v3 \| mAP@0.25: 40.7 | model \| log \| config <br> model \| log \| config <br> model \| log \| config |
| ScanNet | 18 from VoteNet | v1 \| mAP@0.25: 40.6 <br> v2 \| mAP@0.25: 45.7 <br> v3 \| mAP@0.25: 48.1 | model \| log \| config <br> model \| log \| config <br> model \| log \| config |
