Learning Virtual View Selection for 3D Semantic Segmentation

A pytorch implement of The paper “Learning Virtual View Selection for 3D Scene Semantic Segmentation”

Framework Fig

Introduction

we propose a general learning framework for joint 2D-3D scene understanding by selecting informative virtual 2D views of the underlying 3D scene. We feed both the 3D geometry and the generated virtual 2D views into any joint 2D-3D-input or pure 3D-input based deep neural models for improving 3D scene understanding.

We have validated our proposed framework for various joint 2D-3D-input or pure 3D-input based deep neural models on ScanNet v2 and S3DIS, and the results demonstrate that our method obtains a consistent gain over baseline models and achieves new top accuracy for scene semantic segmentation.

The repo contains 3 subprojects to show the application of our work on different existing segmentation methods.

Environment

Requirements
- GPU RTX3090 (also compatible with other version of NVIDIA GTX/RTX GPU)
- Python 3.7
- Pytorch 1.4.0
- torchvision 0.5.0
- CUDA 10.0
- cudatoolkit 10.0.130
- cuDNN 7.6.5
Installation

# Torch
$ pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
# MinkowskiEngine 0.4.1
$ conda install numpy openblas
$ cd bpnet
$ git clone https://github.com/StanfordVL/MinkowskiEngine.git
$ cd MinkowskiEngine
$ git checkout f1a419cc5792562a06df9e1da686b7ce8f3bb5ad
$ python setup.py install
# Others
$ pip install imageio==2.8.0 opencv-python==4.2.0.32 pillow==7.0.0 pyyaml==5.3 scipy==1.4.1 sharedarray==3.2.0 tensorboardx==2.0 tqdm==4.42.1
$ cd VMnet
$ git clone https://github.com/huaaaliu/RGBX_Semantic_Segmentation.git

install open3d following instructions in Open3D – A Modern Library for 3D Data Processing
For each subproject, refer to their requirements.txt or environment.yml for more details

Prepare data

Download the dataset Scannet and S3DIS on their official website. http://www.scan-net.org/ & http://buildingparser.stanford.edu/dataset.html
use 2D data processing code /bpnet/prepare_2d_data.py for preprocessing 2D images of scannet dataset
use 3D data processing code /bpnet/dataset/preprocess_3d_scannet.py to prepare 3d point clouds
To train the base model with virtual views, use bpnet/render/generate_virtualviews.py to generate virtual views with the same camera pose with original images.

Training

BPNet

download the initial model and put it into /bpnet/initmodel from 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth'
train the BPNet baseline with generated virtual views, use tools/train.sh and change data_root in /config/scannet/bpnet_5cm.yaml to your dataset root generated in bpnet/render/generate_virtualviews.py
Train the virtual view selection framework:

bpnet/tool/train.py --config=config/scannet/VVS_bpnet.yaml

MVPNet

pretrain the 2D network for MVPNet
```
python mvpnet/train_2d.py --cfg configs/scannet/unet_resnet34.yaml
```
you need also to change data_root of the images to your generated virtual images

train MVPNet with virtual views

python mvpnet/train_mvpnet_3d.py --cfg configs/scannet/mvpnet_3d_unet_resnet34_pn2ssg.yaml

VMNet

pre-train the 2D semantic segmentation network with generated virtual views. The virtual views need to be orgnized as

<datasets>
|-- Scannet_virtual
    |-- <RGBFolder>
        |-- <name1>.jpg
    |-- <DepthFolder>
        |-- <name1>.jpg
        ...
    |-- <LabelFolder>
        |-- <name1>.<LabelFormat>
        ...
    |-- train.txt
    |-- test.txt

Pretrain weights:

Download the pretrained segformer here pretrained segformer.
Config

Edit config file in configs.py, including dataset and network settings.

Run the training

$ CUDA_VISIBLE_DEVICES="GPU IDs" python -m torch.distributed.launch --nproc_per_node="GPU numbers you want to use" train.py

pre-train the baseline with Scannet Dataset

CUDA_VISIBLE_DEVICES=0 python run.py --train --exp_name name_you_want --data_path path/to/processed_data

Inference

BPNet

bpnet/tool/test.py --config=config/scannet/VVS_bpnet.yaml

MVPNet

python mvpnet/test.py --cfg configs/scannet/VVS_mvpnet.yaml

VMNet

CUDA_VISIBLE_DEVICES=0 python /VMnet/run.py --val --exp_name val --data_path dataset/preprocessed

Visualization

you can use the following to see the visualized certainty map and segmentation results

<subproject>/scripts/vis_certainty.py
<subproject>/scripts/vis_results.py

checkpoints

you can download the pre-trained checkpoints and evaluate models on the validation set.

VirtualViewSelection

Install / Use

README

Learning Virtual View Selection for 3D Semantic Segmentation

Introduction

Environment

Prepare data

Training

BPNet

MVPNet

VMNet

Inference

BPNet

MVPNet

VMNet

Visualization

checkpoints