MonoCD: Monocular 3D Object Detection with Complementary Depths

<h5 align="center">

Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan

arXiv | License: MIT

</h5>

This repository contains the official implementation of the paper MonoCD: Monocular 3D Object Detection with Complementary Depths, built on the excellent work MonoFlex. In this work, we first point out a coupling phenomenon: existing multi-depth predictions tend to consistently overestimate or underestimate the true depth, which limits the accuracy of the combined depth. We propose increasing the complementarity of the depth predictions to alleviate this problem.
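To see why complementarity matters, here is a toy numerical sketch. It is purely our own illustration with made-up bias and noise values, not the paper's model: two depth estimators that share a systematic bias cannot be fixed by averaging, while two estimators with opposite biases largely cancel each other out.

```python
import numpy as np

rng = np.random.default_rng(0)
true_depth = 30.0  # metres; a typical object distance in KITTI scenes

# Coupled estimators: both share the same positive bias (+1.5 m),
# so averaging them cannot remove the systematic error.
coupled = true_depth + 1.5 + 0.3 * rng.standard_normal((2, 10_000))

# Complementary estimators: biases of opposite sign (+1.5 m and -1.5 m),
# so a simple average largely cancels the systematic error.
complementary = np.stack([
    true_depth + 1.5 + 0.3 * rng.standard_normal(10_000),
    true_depth - 1.5 + 0.3 * rng.standard_normal(10_000),
])

# Mean absolute error of the averaged (combined) depth in each regime.
err_coupled = np.abs(coupled.mean(axis=0) - true_depth).mean()
err_comp = np.abs(complementary.mean(axis=0) - true_depth).mean()
print(f"coupled MAE: {err_coupled:.2f} m, complementary MAE: {err_comp:.2f} m")
```

With these illustrative numbers, the coupled pair keeps an error near its shared bias, while the complementary pair's combined error drops to roughly the residual noise level.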

News

  • [2026.03] 🔥 The upgraded version MonoCD-E is coming soon! We enhance the proposed complementary depth branch to provide more accurate and stable depth estimates, and make it compatible with the latest DETR detection architectures!
  • [2024.04] 🎉 MonoCD has been accepted by CVPR 2024! The code has been released.

Installation

git clone https://github.com/dragonfly606/MonoCD.git
cd MonoCD

conda create -n monocd python=3.7
conda activate monocd

# Install a PyTorch build that matches your local CUDA version.
# We succeeded with both:
#   torch 1.4.0 + cu101 (recommended for best reproduction accuracy)
#   torch 1.13.0 + cu117 (inference with a checkpoint trained under the older version may lose some accuracy)
conda install pytorch torchvision cudatoolkit
pip install -r requirements.txt

cd model/backbone/DCNv2
sh make.sh
# If the DCNv2 compilation fails, you can replace it with the version from
# https://github.com/lucasjinreal/DCNv2_latest or
# https://github.com/lbin/DCNv2

cd ../../..
python setup.py develop
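Before training, a quick dependency probe can save a failed run. This is our own convenience snippet, not part of the repository, and the module list below is only a guess at the core dependencies (the authoritative list is requirements.txt):

```python
import importlib.util

# Assumed core modules; adjust to match requirements.txt.
required = ["torch", "torchvision", "numpy", "cv2"]

# find_spec checks availability without actually importing the module.
status = {m: importlib.util.find_spec(m) is not None for m in required}
for mod, ok in status.items():
    print(f"{mod:12s} {'OK' if ok else 'MISSING'}")
```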

Data Preparation

Please download the KITTI dataset and organize the data as follows:

#ROOT		
  |training/
    |calib/
    |image_2/
    |label/
    |planes/
    |ImageSets/
  |testing/
    |calib/
    |image_2/
    |ImageSets/

The road planes used for Horizon Heatmap training can be downloaded from HERE. Then remember to set DATA_DIR = "/path/to/your/kitti/" in config/paths_catalog.py according to your data path.
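To catch path mistakes early, a small helper like the following can verify the layout above before training. `check_kitti_layout` is a hypothetical helper of ours, not part of the repository; point it at the same path you set as DATA_DIR:

```python
from pathlib import Path

# Subfolders expected under the KITTI root, per the layout above.
EXPECTED = [
    "training/calib", "training/image_2", "training/label",
    "training/planes", "training/ImageSets",
    "testing/calib", "testing/image_2", "testing/ImageSets",
]

def check_kitti_layout(data_dir):
    """Return the expected subfolders that are missing under data_dir."""
    root = Path(data_dir)
    return [d for d in EXPECTED if not (root / d).is_dir()]

# Example usage; replace with the DATA_DIR from config/paths_catalog.py.
missing = check_kitti_layout("/path/to/your/kitti")
print("All folders present." if not missing else f"Missing: {missing}")
```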

Get Started

Train

To train with a single GPU:

CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --batch_size 8 --config runs/monocd.yaml --output output/exp

Test

The model is evaluated periodically during training. You can also evaluate an already trained checkpoint with

CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --config runs/monocd.yaml --ckpt YOUR_CKPT  --eval

Model and log

We provide the model trained on KITTI together with the corresponding logs.

| Models | AP40@Easy | AP40@Mod. | AP40@Hard | Logs/Ckpts |
| ---------------------------- | :-------: | :-------: | :-------: | :--------: |
| MonoFlex | 23.64 | 17.51 | 14.83 | - |
| MonoFlex + Ours (paper) | 24.22 | 18.27 | 15.42 | - |
| MonoFlex + Ours (reproduced) | 25.99 | 19.12 | 16.03 | log/ckpt |

Citation

If you find our work useful in your research, please consider giving us a star and citing:

@inproceedings{yan2024monocd,
  title={MonoCD: Monocular 3D Object Detection with Complementary Depths},
  author={Yan, Longfei and Yan, Pei and Xiong, Shengzhou and Xiang, Xuanyu and Tan, Yihua},
  booktitle={CVPR},
  pages={10248--10257},
  year={2024}
}

Acknowledgement

This project benefits from the awesome works MonoFlex and MonoGround. Please also consider citing them.

Contact

If you have any questions about this project, please feel free to contact longfeiyan@hust.edu.cn.
