
MonoSKD

[ECAI 2023] MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient


🚀MonoSKD

<p align="center"> <img src='img/MonoSKD.png' align="center" height="350px"> </p>

💥 Introduction

This is the PyTorch implementation of the paper "MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient" by Sen Wang and Jin Zheng, published at ECAI'23. [📕Paper]

🔥 Abstract

Monocular 3D object detection is an inherently ill-posed problem, as it is challenging to predict accurate 3D localization from a single image. Existing knowledge distillation methods for monocular 3D detection usually project the LiDAR onto the image plane and train the teacher network accordingly. Transferring LiDAR-based model knowledge to RGB-based models is more complex, so a general distillation strategy is needed. To alleviate the cross-modal problem, we propose MonoSKD, a novel knowledge distillation framework for monocular 3D detection based on the Spearman correlation coefficient, which learns the relative correlation between cross-modal features. Considering the large gap between these features, strict feature alignment may mislead training, so we propose a looser Spearman loss. Furthermore, by selecting appropriate distillation locations and removing redundant modules, our scheme uses fewer GPU resources and trains faster than existing methods. Extensive experiments on the challenging KITTI 3D object detection benchmark verify the effectiveness of our framework. Our method achieves state-of-the-art performance as of submission with no additional computational cost at inference. Our code will be made public once accepted.
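To make the idea of a "looser" Spearman loss concrete, here is a minimal pure-Python sketch of the Spearman correlation coefficient, computed as the Pearson correlation of ranks. It is an illustration only and not the loss implemented in this repository: the function names (`ranks`, `pearson`, `spearman`, `spearman_loss`) and the toy feature values are assumptions, and the hard ranking used here is not differentiable, so a training loss would need a differentiable ranking instead (presumably why `torchsort` appears in the dependency list).

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_loss(student, teacher):
    """Hypothetical loss: zero when both inputs have the same ordering."""
    return 1.0 - spearman(student, teacher)


# Toy features on very different scales but with identical ordering.
teacher = [0.1, 0.4, 0.9, 2.5]
student = [1.0, 8.0, 27.0, 1000.0]
print(spearman_loss(student, teacher))  # close to 0: only the ordering matters
```

An element-wise loss such as mean squared error would penalize the student heavily in this toy case, whereas the rank-based loss only asks that the relative ordering of the responses agree.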

🍇 Updates

  • 2023/12/08 Released the teacher and student checkpoints.
  • 2023/10/21 Released the checkpoint of our distilled DID-M3D.
  • 2023/07/22 Released the code of our MonoSKD framework.

📙 Overview

🍰 Installation

Installation Steps

a. Clone this repository.

b. Install the dependent libraries as follows:

  • Install the dependent python libraries:

    pip install torch==1.12.0 torchvision==0.13.0 pyyaml scikit-image opencv-python numba tqdm torchsort
    
  • We tested this repository on Nvidia 3090 GPUs and Ubuntu 18.04. You can also follow the installation instructions in GUPNet (on which this repository is based) to run experiments with lower PyTorch/GPU versions.

📍 Getting Started

Dataset Preparation

this repo
├── data
│   └── KITTI3D
│       ├── training
│       │   └── calib & label_2 & image_2 & depth_dense
│       └── testing
│           └── calib & image_2
├── config
└── ...

  • You can also choose to link your KITTI dataset path by

    KITTI_DATA_PATH=~/data/kitti_object
    ln -s $KITTI_DATA_PATH ./data/KITTI3D
    
  • For convenience, the pre-generated dense depth files are available at: Google Drive
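Before training, it can help to confirm that the dataset (or the symlink) matches the layout above. The following is a small sketch and not part of this repository: the helper name `missing_kitti_dirs` and the `EXPECTED` mapping are hypothetical, and the folder names are taken directly from the directory tree shown in this section.

```python
from pathlib import Path

# Sub-folders expected under data/KITTI3D, as listed in the tree above.
EXPECTED = {
    "training": ["calib", "label_2", "image_2", "depth_dense"],
    "testing": ["calib", "image_2"],
}


def missing_kitti_dirs(root):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [
        f"{split}/{name}"
        for split, names in EXPECTED.items()
        for name in names
        if not (root / split / name).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_kitti_dirs("data/KITTI3D")
    print("layout OK" if not missing else f"missing: {missing}")
```

Because `Path.is_dir()` follows symbolic links, the check also works when `./data/KITTI3D` is a link created with `ln -s`.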

Training & Testing

Test and evaluate the pretrained models

CUDA_VISIBLE_DEVICES=0 python tools/train_val.py --config config/monoskd.yaml -e   

Train a model

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/train_val.py --config config/monoskd.yaml

⛄️ Pretrained Model

For convenience, we provide the pre-trained model at Google Drive

We also provide the pre-trained Teacher model at Google Drive and pre-trained Student model at Google Drive

Because a trained checkpoint usually also contains the weights of the teacher network, we use the tools/pth_transfer.py script to remove the teacher weights from it.
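The idea of removing teacher weights can be sketched as a simple filter over a state dict. This is not the content of tools/pth_transfer.py: the function name `strip_teacher_weights`, the `"teacher."` key prefix, and the example keys are all assumptions made for illustration, and the real checkpoint may name its parameters differently.

```python
def strip_teacher_weights(state_dict, teacher_prefix="teacher."):
    """Return a copy of `state_dict` without keys under `teacher_prefix`.

    The prefix is a hypothetical naming convention; inspect the keys of
    your own checkpoint before relying on it.
    """
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(teacher_prefix)
    }


# Toy stand-in for a checkpoint's state dict (values would be tensors).
checkpoint = {
    "teacher.backbone.weight": [0.1, 0.2],
    "backbone.weight": [0.3, 0.4],
    "head.bias": [0.0],
}
student_only = strip_teacher_weights(checkpoint)
print(sorted(student_only))  # ['backbone.weight', 'head.bias']
```

With a real checkpoint, the dictionary would typically be obtained with `torch.load` and written back with `torch.save` after filtering.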

We provide the model reported in the paper and training logs for everyone to verify (mAP=20.21).

Note that drop_last = True is set during training, so the final inference results may differ negligibly from the reported accuracy, which is expected.

Here we give the comparison.

<table align="center">
  <tr>
    <td rowspan="2" align="center">Models</td>
    <td colspan="3" align="center">Car@BEV IoU=0.7</td>
    <td colspan="3" align="center">Car@3D IoU=0.7</td>
  </tr>
  <tr>
    <td align="center">Easy</td>
    <td align="center">Mod</td>
    <td align="center">Hard</td>
    <td align="center">Easy</td>
    <td align="center">Mod</td>
    <td align="center">Hard</td>
  </tr>
  <tr>
    <td align="center">original paper</td>
    <td align="center">37.66</td>
    <td align="center">26.41</td>
    <td align="center">23.39</td>
    <td align="center">28.91</td>
    <td align="center">20.21</td>
    <td align="center">16.99</td>
  </tr>
  <tr>
    <td align="center">this repo</td>
    <td align="center">37.66</td>
    <td align="center">26.41</td>
    <td align="center">23.39</td>
    <td align="center">28.89</td>
    <td align="center">20.19</td>
    <td align="center">16.98</td>
  </tr>
</table>

💚 Citation & Contact

If you have any questions, please contact me: buaa_wangsen@buaa.edu.cn

If you find our work helpful, you can cite our paper

@misc{wang2023monoskd,
      title={MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient}, 
      author={Sen Wang and Jin Zheng},
      year={2023},
      eprint={2310.11316},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🎵 Acknowledgements

This repository is mainly based on DID-M3D, and it also benefits from mmRazor. Thanks for their great work!
