# MFuseNet

This is the official implementation code for MFuseNet. For technical details, please refer to:

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion <br /> Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen <br /> ICRA 2020, RA-L <br /> [Paper] [Project Page] <br />

<div align="center"> <img src="http://weihao-yuan.com/wp-content/uploads/2019/05/camera.jpg" width="240px" /> <img src="http://weihao-yuan.com/wp-content/uploads/mfusenet.jpg" width="400px" /> </div>

## Bibtex

If you find this code useful, please consider citing:

```
@article{yuan2020mfusenet,
  title={MFuseNet: Robust Depth Estimation With Learned Multiscopic Fusion},
  author={Yuan, Weihao and Fan, Rui and Wang, Michael Yu and Chen, Qifeng},
  journal={IEEE Robotics and Automation Letters},
  volume={5},
  number={2},
  pages={3113--3120},
  year={2020},
  publisher={IEEE}
}
```

## Contents

  1. Environment Setup
  2. Data Preparation
  3. Train

## Environment Setup

This code has been tested on Ubuntu 16.04 with CUDA 9.0 and two GTX 1080 Ti GPUs.

Dependencies:

- Python 2.7
- PyTorch (0.4.0+)
- torchvision (0.2.0+)
- os, time, numpy, argparse, cv2, matplotlib, PIL
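
As a quick sanity check of the setup, a short script along these lines (the file name and checks are our own, not part of the repository) confirms the dependency versions and that both GPUs are visible:

```python
# check_env.py -- hypothetical helper, not part of this repository.
from __future__ import print_function

import cv2
import torch
import torchvision

print("PyTorch:", torch.__version__)             # expect 0.4.0+
print("torchvision:", torchvision.__version__)   # expect 0.2.0+
print("OpenCV:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())   # 2 in the tested setup
```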

## Data Preparation

The inputs to the network are the cost volumes obtained from the cost-computation step of stereo matching algorithms. They can be calculated by block matching, semi-global matching, graph cuts, deep-network-based methods, etc. The default costs are obtained by MC-CNN; please refer to MC-CNN for computing the cost volumes.
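
For orientation, a cost volume is an H × W × D array that stores, for every pixel, a matching cost at each candidate disparity. The sketch below shows the simplest hand-crafted variant, sum-of-absolute-differences block matching; it only illustrates the data format, not the learned MC-CNN costs the repository uses by default, and the function name is hypothetical:

```python
import cv2
import numpy as np

def sad_cost_volume(left, right, max_disp, block=5):
    """Naive SAD block-matching cost volume (illustrative sketch).

    left, right: grayscale float32 images of shape (H, W).
    Returns a float32 volume of shape (H, W, max_disp);
    lower cost means a better match.
    """
    H, W = left.shape
    cost = np.full((H, W, max_disp), 1e9, dtype=np.float32)
    for d in range(max_disp):
        # Absolute difference between the left image and the right
        # image shifted by candidate disparity d.
        diff = np.abs(left[:, d:] - right[:, :W - d])
        # Aggregate per-pixel differences over a block x block window.
        cost[:, d:, d] = cv2.blur(diff, (block, block))
    return cost
```

A winner-take-all disparity map would then be `cost.argmin(axis=2)`; MFuseNet instead learns to fuse such volumes from several view pairs.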

The training data for three-view fusion are organized as follows:

```
dataset/
    TRAIN/
        scene1/
            view0.png
            view1.png
            view2.png
            disp1.png
            left.bin
            right.bin
    TEST/
    EVAL/
```
view0.png, view1.png, and view2.png are the color images of the left, center, and right views. disp1.png is the ground-truth disparity map for view1. left.bin and right.bin are the cost volumes obtained by MC-CNN for matching the left and right views against the center view.

For five-view fusion, there are additionally view3.png for the bottom view and view4.png for the top view, together with their corresponding cost volumes bottom.bin and top.bin.
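
A minimal sketch of loading one three-view scene might look like the following; `load_scene` is hypothetical, and the raw-float32, disparity-major layout assumed for the `.bin` files must be verified against however you export the MC-CNN cost volumes:

```python
import os
import numpy as np
from PIL import Image

def load_scene(scene_dir, max_disp):
    """Load one three-view scene (sketch; layout assumptions noted below)."""
    # view0 = left, view1 = center (reference), view2 = right.
    views = [np.array(Image.open(os.path.join(scene_dir, "view%d.png" % i)))
             for i in range(3)]
    # Ground-truth disparity for the center view.
    disp = np.array(Image.open(os.path.join(scene_dir, "disp1.png")))
    H, W = disp.shape[:2]
    costs = {}
    for name in ("left", "right"):
        raw = np.fromfile(os.path.join(scene_dir, name + ".bin"),
                          dtype=np.float32)
        # ASSUMPTION: raw float32 values in disparity-major (D, H, W)
        # order -- verify against your MC-CNN export.
        costs[name] = raw.reshape(max_disp, H, W)
    return views, disp, costs
```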

Example data are available here.

## Train

```bash
. train.sh
```

## Pretrained Models

- Five-view, four-cost fusion: Model_5view
- Three-view, two-cost fusion: Model_3view

Results on Middlebury 2006:

| <sub>Model</sub> | <sub>AvgErr</sub> | <sub>RMS</sub> | <sub>Bad 0.5</sub> | <sub>Bad 1</sub> | <sub>Bad 2</sub> |
|:-----------:|:----------:|:----------:|:------------:|:-------------:|:-------------:|
| <sub>Model_3view</sub> | <sub>0.250</sub> | <sub>1.036</sub> | <sub>4.08%</sub> | <sub>1.83%</sub> | <sub>1.15%</sub> |
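
For reference, these are the standard Middlebury metrics: AvgErr is the mean absolute disparity error in pixels, RMS the root-mean-square error, and Bad t the percentage of pixels whose error exceeds t pixels. A minimal sketch (the valid-pixel convention is an assumption):

```python
import numpy as np

def disparity_metrics(pred, gt):
    """Middlebury-style disparity error metrics (sketch)."""
    valid = gt > 0                     # ASSUMPTION: 0 marks missing ground truth
    err = np.abs(pred[valid].astype(np.float64) - gt[valid])
    return {
        "AvgErr": err.mean(),                   # mean absolute error (px)
        "RMS": np.sqrt((err ** 2).mean()),      # root-mean-square error (px)
        "Bad 0.5": 100.0 * (err > 0.5).mean(),  # % pixels with error > 0.5 px
        "Bad 1": 100.0 * (err > 1.0).mean(),
        "Bad 2": 100.0 * (err > 2.0).mean(),
    }
```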

## License

Licensed under the MIT License.
