🚀MonoSKD

💥 Introduction

This is the PyTorch implementation of the paper "MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient" by Sen Wang and Jin Zheng, published at ECAI'23. [📕Paper]

🔥 Abstract

Monocular 3D object detection is an inherently ill-posed problem, as it is challenging to predict accurate 3D localization from a single image. Existing monocular 3D detection knowledge distillation methods usually project the LiDAR onto the image plane and train the teacher network accordingly. Transferring LiDAR-based model knowledge to RGB-based models is more complex, so a general distillation strategy is needed. To alleviate the cross-modal problem, we propose MonoSKD, a novel Knowledge Distillation framework for Monocular 3D detection based on the Spearman correlation coefficient, to learn the relative correlation between cross-modal features. Considering the large gap between these features, strict alignment of features may mislead the training, so we propose a looser Spearman loss. Furthermore, by selecting appropriate distillation locations and removing redundant modules, our scheme saves more GPU resources and trains faster than existing methods. Extensive experiments are performed to verify the effectiveness of our framework on the challenging KITTI 3D object detection benchmark. Our method achieves state-of-the-art performance at the time of submission with no additional inference computational cost. Our code will be made public once accepted.
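
To make the idea of a "looser" loss concrete, here is a minimal pure-Python sketch (not the repository's loss code) that compares strict alignment (MSE) with a rank-based Spearman objective on toy feature values. The helper names `rank`, `pearson`, and `spearman` are hypothetical, and the toy "teacher" and "student" vectors are made up for illustration. A training loss also needs differentiable ranks, which this sketch does not provide.

```python
def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


# Toy features (assumed values): the student differs in scale from the
# teacher but preserves the relative ordering of the responses.
teacher = [0.1, 0.4, 0.2, 0.9, 0.7]
student = [10 * t ** 3 for t in teacher]

mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)
spearman_loss = 1.0 - spearman(student, teacher)

print(f"MSE (strict alignment): {mse:.3f}")
print(f"1 - Spearman (relative ordering): {spearman_loss:.3f}")
```

The strict loss penalizes the scale mismatch heavily, while the rank-based loss is close to zero because the ordering agrees. This is the sense in which a Spearman objective constrains only the relative correlation between features.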

🍇 Updates

2023/12/08 Release the teacher and student checkpoints.
2023/10/21 Release the checkpoint of our distilled DID-M3D.
2023/07/22 Release the code of our MonoSKD framework.

🍰 Installation

Installation Steps

a. Clone this repository.

b. Install the required Python libraries:

pip install torch==1.12.0 torchvision==0.13.0 pyyaml scikit-image opencv-python numba tqdm torchsort
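
After installing, a quick check like the following can confirm that the packages are importable. This is a small sketch rather than part of the repository; the helper name `missing_modules` is hypothetical. Note that some pip package names differ from their import names (pyyaml is imported as `yaml`, scikit-image as `skimage`, opencv-python as `cv2`).

```python
import importlib.util

# Import names for the pip packages listed in the install command.
REQUIRED_MODULES = [
    "torch", "torchvision", "yaml", "skimage",
    "cv2", "numba", "tqdm", "torchsort",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```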

We tested this repository on NVIDIA 3090 GPUs with Ubuntu 18.04. You can also follow the installation instructions in GUPNet (this repository is based on it) to run experiments with older PyTorch/GPU versions.

📍 Getting Started

Dataset Preparation

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows:

this repo
├── data
│   ├── KITTI3D
│   │   ├── training
│   │   │   ├── calib & label_2 & image_2 & depth_dense
│   │   ├── testing
│   │   │   ├── calib & image_2
├── config
├── ...

Alternatively, you can create a symbolic link to an existing KITTI dataset directory:

KITTI_DATA_PATH=~/data/kitti_object
ln -s $KITTI_DATA_PATH ./data/KITTI3D

For convenience, the pre-generated dense depth files are available at: Google Drive
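
Before training, it can help to verify that the dataset directory matches the layout above. The following is a minimal sketch, not a script shipped with the repository; the function name `missing_kitti_dirs` is hypothetical, and it only checks that the folders exist, not their contents.

```python
from pathlib import Path

# Sub-directories expected under data/KITTI3D, taken from the layout above.
EXPECTED_DIRS = {
    "training": ["calib", "label_2", "image_2", "depth_dense"],
    "testing": ["calib", "image_2"],
}


def missing_kitti_dirs(root):
    """Return the relative paths of expected directories missing under root."""
    root = Path(root)
    missing = []
    for split, subdirs in EXPECTED_DIRS.items():
        for sub in subdirs:
            # is_dir() follows symbolic links, so a linked dataset also passes.
            if not (root / split / sub).is_dir():
                missing.append(f"{split}/{sub}")
    return missing


if __name__ == "__main__":
    problems = missing_kitti_dirs("data/KITTI3D")
    if problems:
        for rel in problems:
            print("missing:", rel)
    else:
        print("Dataset layout looks complete.")
```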

Training & Testing

Test and evaluate the pretrained models

CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config config/monoskd.yaml -e

Train a model

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/train_val.py --config config/monoskd.yaml

⛄️ Pretrained Model

For convenience, we provide the pre-trained model at Google Drive.

We also provide the pre-trained Teacher model at Google Drive and the pre-trained Student model at Google Drive.

Because a checkpoint saved during distillation usually also contains the weights of the teacher network, we use the script tools/pth_transfer.py to remove the teacher weights.
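
The general idea of removing teacher weights can be sketched as filtering a state dict by key prefix. This is an illustration only and does not reproduce tools/pth_transfer.py: the function name `strip_teacher_weights`, the `teacher.` prefix, and the example keys are all assumptions, so check the actual key names in your checkpoint first.

```python
def strip_teacher_weights(state_dict, teacher_prefix="teacher."):
    """Return a copy of state_dict without keys starting with teacher_prefix.

    The default prefix is a hypothetical naming convention, not one
    confirmed by the repository.
    """
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(teacher_prefix)
    }


# Toy checkpoint with made-up keys; lists stand in for weight tensors.
checkpoint = {
    "teacher.backbone.weight": [1.0, 2.0],
    "student.backbone.weight": [0.5, 0.7],
    "student.head.bias": [0.1],
}

student_only = strip_teacher_weights(checkpoint)
print(sorted(student_only))
```

With a real checkpoint, the dictionary would typically be obtained with `torch.load(path, map_location="cpu")` and written back with `torch.save`; where the state dict sits inside the loaded object depends on how the checkpoint was saved.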

We provide the model reported in the paper, together with its training logs, so that anyone can verify the result (Car 3D AP at IoU=0.7, Moderate: 20.21).

Note that drop_last = True is set during training, so the final inference result shows a negligible accuracy difference from the reported numbers, which is expected.
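
For reference, drop_last = True in a PyTorch DataLoader discards the final incomplete batch. The sketch below only illustrates that bookkeeping with made-up numbers; the function name `samples_used` is hypothetical, and the dataset size and batch size are not the repository's settings.

```python
def samples_used(num_samples, batch_size, drop_last):
    """Number of samples actually consumed in one pass over the data."""
    if drop_last:
        # Only full batches are kept; the remainder is skipped.
        return (num_samples // batch_size) * batch_size
    return num_samples


# Hypothetical sizes chosen for illustration.
num_samples, batch_size = 1000, 16
used = samples_used(num_samples, batch_size, drop_last=True)
print(f"used {used} of {num_samples} samples, skipped {num_samples - used}")
```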

Here we give the comparison.

<table align="center">
  <tr>
    <td rowspan="2" align="center">Models</td>
    <td colspan="3" align="center">Car@BEV IoU=0.7</td>
    <td colspan="3" align="center">Car@3D IoU=0.7</td>
  </tr>
  <tr>
    <td align="center">Easy</td>
    <td align="center">Mod</td>
    <td align="center">Hard</td>
    <td align="center">Easy</td>
    <td align="center">Mod</td>
    <td align="center">Hard</td>
  </tr>
  <tr>
    <td align="center">original paper</td>
    <td align="center">37.66</td>
    <td align="center">26.41</td>
    <td align="center">23.39</td>
    <td align="center">28.91</td>
    <td align="center">20.21</td>
    <td align="center">16.99</td>
  </tr>
  <tr>
    <td align="center">this repo</td>
    <td align="center">37.66</td>
    <td align="center">26.41</td>
    <td align="center">23.39</td>
    <td align="center">28.89</td>
    <td align="center">20.19</td>
    <td align="center">16.98</td>
  </tr>
</table>

💚 Citation & Contact

If you have any questions, please contact me: buaa_wangsen@buaa.edu.cn

If you find our work helpful, please cite our paper:

@misc{wang2023monoskd,
      title={MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient}, 
      author={Sen Wang and Jin Zheng},
      year={2023},
      eprint={2310.11316},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🎵 Acknowledgements

This repository is mainly based on DID-M3D, and it also benefits from mmRazor. Thanks for their great work!
