# Implicit3DUnderstanding (Im3D) [Project Page (with interactive results)] [Paper] [Video]

### Holistic 3D Scene Understanding from a Single Image with Implicit Representation

Cheng Zhang\*, Zhaopeng Cui\*, Yinda Zhang\*, Shuaicheng Liu, Bing Zeng, Marc Pollefeys
<img src="demo/inputs/1/img.jpg" alt="img.jpg" width="20%" /> <img src="demo/outputs/1/3dbbox.png" alt="3dbbox.png" width="20%" /> <img src="demo/outputs/1/recon.png" alt="recon.png" width="20%" /> <br> <img src="demo/inputs/2/img.jpg" alt="img.jpg" width="20%" /> <img src="demo/outputs/2/3dbbox.png" alt="3dbbox.png" width="20%" /> <img src="demo/outputs/2/recon.png" alt="recon.png" width="20%" /> <br> <img src="demo/inputs/3/img.jpg" alt="img.jpg" width="20%" /> <img src="demo/outputs/3/3dbbox.png" alt="3dbbox.png" width="20%" /> <img src="demo/outputs/3/recon.png" alt="recon.png" width="20%" />

## Introduction

This repo contains the training, testing, evaluation, and visualization code of our CVPR 2021 paper. In particular, the repo contains our PyTorch implementation of the decoder of LDIF, which can be extracted and used in other projects.
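For intuition about what an LDIF-style decoder computes, here is a minimal sketch of its analytic part: a shape is represented as a sum of scaled local Gaussian elements, and the surface is a level set of the summed field. This is an illustrative stand-in, not the repo's API; all names below are hypothetical, and the real decoder additionally adds a learned residual per element.

```python
import numpy as np

def gaussian_element(x, center, radius, constant):
    # One analytic LDIF-style shape element: a scaled isotropic 3D Gaussian.
    # (The full decoder also adds a learned residual per element.)
    d2 = np.sum((x - center) ** 2, axis=-1)
    return constant * np.exp(-d2 / (2.0 * radius ** 2))

def ldif_value(x, centers, radii, constants):
    # Summed influence of all elements at query points x of shape (N, 3);
    # the reconstructed surface is a level set of this function.
    return sum(gaussian_element(x, c, r, k)
               for c, r, k in zip(centers, radii, constants))
```

Evaluating `ldif_value` on a grid of query points and extracting a level set (e.g. with marching cubes) yields a mesh, which is conceptually what the decoder is used for here.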
## Install

Please make sure CUDA NVCC is installed on your system first, then run the following:

```shell
sudo apt install xvfb ninja-build freeglut3-dev libglew-dev meshlab
conda env create -f environment.yml
conda activate Im3D
python project.py build
```
When running `python project.py build`, the script runs `external/build_gaps.sh`, which asks for your password to gain sudo privileges for `apt-get install`.
Please make sure you are running as a user with sudo privileges.
If not, please ask your administrator to install these libraries, comment out the corresponding lines, then run `python project.py build`.
## Demo

1. Download the pretrained checkpoint and unzip it into `out/total3d/20110611514267/`.

2. Change the current directory to `Implicit3DUnderstanding/` and run the demo, which will generate the 3D detection result and rendered scene mesh to `demo/output/1/`:
    ```shell
    CUDA_VISIBLE_DEVICES=0 python main.py out/total3d/20110611514267/out_config.yaml --mode demo --demo_path demo/inputs/1
    ```

3. In case you want to run it off screen (for example, over SSH):
    ```shell
    CUDA_VISIBLE_DEVICES=0 xvfb-run -a -s "-screen 0 800x600x24" python main.py out/total3d/20110611514267/out_config.yaml --mode demo --demo_path demo/inputs/1
    ```

4. If you want to run it interactively, change the last line of `demo.py` from
    ```python
    scene_box.draw3D(if_save=True, save_path = '%s/recon.png' % (save_path))
    ```
    to
    ```python
    scene_box.draw3D(if_save=False, save_path = '%s/recon.png' % (save_path))
    ```
## Data preparation

We follow Total3DUnderstanding in using SUN-RGBD to train our Scene Graph Convolutional Network (SGCN), and Pix3D to train our Local Implicit Embedding Network (LIEN) with the Local Deep Implicit Functions (LDIF) decoder.

### Preprocess SUN-RGBD data

Please follow Total3DUnderstanding to directly download the processed train/test data.

In case you prefer to process the data yourself, or want to evaluate 3D detection with our code (to utilize the evaluation code of Coop, we modified the data processing code of Total3DUnderstanding to save parameters for transforming the coordinate system from Total3D back to Coop), please follow these steps:
1. Follow Total3DUnderstanding to download the raw data.

2. According to issue #6 of Total3DUnderstanding, there are a few typos in the json files of the SUNRGBD dataset, which are mostly handled by the json loader. However, one typo still needs to be fixed by hand. Please find `{"name":""propulsion"tool"}` in `data/sunrgbd/Dataset/SUNRGBD/kv2/kinect2data/002922_2014-06-26_15-43-16_094959634447_rgbf000089-resize/annotation2Dfinal/index.json` and remove `""propulsion`.

3. Process the data with:
    ```shell
    python -m utils.generate_data
    ```
### Preprocess Pix3D data

We use a different data processing pipeline from Total3DUnderstanding. Please follow these steps to generate the train/test data:

1. Download the Pix3D dataset to `data/pix3d/metadata`.

2. Run the following to generate the train/test data into `data/pix3d/ldif`:
    ```shell
    python utils/preprocess_pix3d4ldif.py
    ```
## Training and Testing

We use wandb for logging and visualization.
You can register a wandb account and log in before training with `wandb login`.
In case you don't need to visualize the training process, you can put `WANDB_MODE=dryrun` before the commands below.

Thanks to the well-structured code of Total3DUnderstanding, we use the same method to manage the parameters of each experiment with configuration files (`configs/****.yaml`).
We first follow Total3DUnderstanding to pretrain each individual module, then jointly finetune the full model with an additional physical violation loss.
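For intuition, the idea behind a physical violation penalty (detected objects should not interpenetrate) can be sketched as a pairwise overlap term. The sketch below uses axis-aligned boxes purely for illustration; it is not the paper's actual loss, which is defined on the implicit representations, and all names here are hypothetical.

```python
import numpy as np

def box_overlap_penalty(centers, sizes):
    # Hedged sketch of a physical violation penalty: the sum of pairwise
    # overlap volumes of axis-aligned 3D boxes. It is zero when no two
    # boxes intersect and grows with the amount of interpenetration.
    n = len(centers)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # per-axis overlap length, clamped at zero
            overlap = np.maximum(
                (sizes[i] + sizes[j]) / 2.0 - np.abs(centers[i] - centers[j]),
                0.0)
            penalty += float(np.prod(overlap))
    return penalty
```

Minimizing such a term during joint finetuning pushes overlapping predictions apart while leaving non-overlapping ones untouched.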
### Pretraining

We use the pretrained checkpoint of Total3DUnderstanding to load weights for the ODN.
Please download and rename the checkpoint to `out/pretrained_models/total3d/model_best.pth`.
The other modules can be trained and then tested with the following steps:

1. Train the LEN by:
    ```shell
    python main.py configs/layout_estimation.yaml
    ```
    The pretrained checkpoint can be found at `out/layout_estimation/[start_time]/model_best.pth`.

2. Train LIEN + LDIF by:
    ```shell
    python main.py configs/ldif.yaml
    ```
    The pretrained checkpoint can be found at `out/ldif/[start_time]/model_best.pth` (alternatively, you can download the pretrained model here and unzip it into `out/ldif/20101613380518/`).
    The training process is followed by a quick test without ICP and without the Chamfer distance evaluated. In case you want to align the meshes and evaluate the Chamfer distance during testing:
    ```shell
    python main.py configs/ldif.yaml --mode train
    ```
    The generated object meshes can be found at `out/ldif/[start_time]/visualization`.

3. Replace the checkpoint directories of LEN and LIEN in `configs/total3d_ldif_gcnn.yaml` with the checkpoints trained above, then train the SGCN by:
    ```shell
    python main.py configs/total3d_ldif_gcnn.yaml
    ```
    The pretrained checkpoint can be found at `out/total3d/[start_time]/model_best.pth`.
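The Chamfer distance used in the LIEN + LDIF evaluation above can be sketched in a few lines (a minimal NumPy version for intuition; the repo's evaluation code and its squaring/averaging conventions may differ):

```python
import numpy as np

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    # for each point, the squared distance to its nearest neighbor in the
    # other set, averaged separately over both directions and summed.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

Because the metric compares nearest neighbors in both directions, it penalizes both missing geometry and spurious geometry, which is why the meshes are first aligned (e.g. with ICP) before it is computed.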
### Joint finetune

1. Replace the checkpoint directory in `configs/total3d_ldif_gcnn_joint.yaml` with the one trained in the last step above, then train the full model by:
    ```shell
    python main.py configs/total3d_ldif_gcnn_joint.yaml
    ```
    The trained model can be found at `out/total3d/[start_time]/model_best.pth`.

2. The training process is followed by a quick test without the scene mesh generated. In case you want to generate the scene mesh during testing (which will cost a day on a 1080 Ti due to the unoptimized interface of the LDIF CUDA kernel):
    ```shell
    python main.py configs/total3d_ldif_gcnn_joint.yaml --mode train
    ```
    The testing results can be found at `out/total3d/[start_time]/visualization`.
### Testing

1. The training process above already includes a testing pass. In case you want to test LIEN+LDIF or the full model yourself:
    ```shell
    python main.py out/[ldif/total3d]/[start_time]/out_config.yaml --mode test
    ```
    The results will be saved to `out/total3d/[start_time]/visualization`, and the evaluation metrics will be logged to wandb as the run summary.

2. Evaluate 3D object detection with our modified matlab script from Coop, `external/cooperative_scene_parsing/evaluation/detections/script_eval_detection.m`. Before running the script, please specify the following parameters:
    ```matlab
    SUNRGBD_path = 'path/to/SUNRGBD';
    result_path = 'path/to/experiment/results/visualization';
    ```

3. Visualize the i-th 3D scene interactively by
    ```shell
    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i]
    ```
    or save the 3D detection result and rendered scene mesh by
    ```shell
    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i] --save_path []
    ```
    In case you do not have a screen:
    ```shell
    python utils/visualize.py --result_path out/total3d/[start_time]/visualization --sequence_id [i] --save_path [] --offscreen
    ```
    If nothing goes wrong, you should get results like:

    <img src="figures/724_bbox.png" alt="camera view 3D bbox" width="20%" /> <img src="figures/724_recon.png" alt="scene reconstruction" width="20%" />
4. Visualize the detection resu
