LargeKernel3D
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs (CVPR 2023)
This is the implementation of LargeKernel3D (CVPR 2023). Large kernels are important but expensive in 3D CNNs. We propose spatial-wise partition convolution to enable 3D large kernels. The method achieves high performance on 3D semantic segmentation and object detection. For more details, please refer to:
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs [Paper] <br /> Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia<br />
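As a rough illustration of the weight-sharing idea behind a spatially partitioned large kernel, the following NumPy sketch expands a 3x3x3 set of weights into a 7x7x7 kernel by letting spatially adjacent positions share one weight. The function name `expand_shared_kernel` and the group sizes `(2, 3, 2)` are assumptions made for this example and are not taken from this repository; refer to the paper and the code for the actual formulation.

```python
import numpy as np


def expand_shared_kernel(small, group_sizes=(2, 3, 2)):
    """Expand a small 3D kernel into a larger one by weight sharing.

    Hypothetical helper for illustration only. Each axis of the large
    kernel is split into consecutive groups (sizes given by group_sizes),
    and every position in a group reuses the same small-kernel weight.
    """
    # For group_sizes (2, 3, 2) this is [0, 0, 1, 1, 1, 2, 2]: the
    # small-kernel index that each large-kernel position maps to.
    index = np.repeat(np.arange(len(group_sizes)), group_sizes)
    # Apply the same mapping along all three spatial axes.
    return small[np.ix_(index, index, index)]


# 27 distinct weights stand in for learned parameters.
small = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
large = expand_shared_kernel(small)

print(large.shape)            # (7, 7, 7)
print(np.unique(large).size)  # 27
```

Under these assumptions the large kernel covers 7x7x7 = 343 positions while storing only 27 distinct weights, which is the kind of parameter saving that makes large kernels affordable.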
<p align="center"> <img src="imgs/Large-small-kernels.png" width="100%"> </p>
Experimental results
| nuScenes Object Detection | Set | mAP | NDS | Download |
|---------------------------|:----:|:----:|:----:|:----------------------:|
| LargeKernel3D | val | 63.3 | 69.1 | Pre-trained |
| LargeKernel3D | test | 65.4 | 70.6 | Pre-trained Submission |
| +test aug | test | 68.7 | 72.8 | Submission |
| LargeKernel3D-F | test | - | - | Pre-trained |
| +test aug | test | 71.1 | 74.2 | Submission |

| ScanNetv2 Semantic Segmentation | Set | mIoU | Download |
|---------------------------------|:----:|:----:|:-------------:|
| LargeKernel3D | val | 73.5 | [Pre-trained] |
| LargeKernel3D | test | 73.9 | [Submission] |
<p align="center"> <img src="imgs/ReceptiveFields.png" width="100%"> </p>
Citation
If you find this project useful in your research, please consider citing:
@inproceedings{chen2023largekernel3d,
title={LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs},
author={Yukang Chen and Jianhui Liu and Xiangyu Zhang and Xiaojuan Qi and Jiaya Jia},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}
Acknowledgement
- This work is built upon FocalsConv for object detection.
- This work is built upon Stratified-Transformer for semantic segmentation.
Our Works in LiDAR-based 3D Computer Vision
- VoxelNeXt (CVPR 2023) [Paper] [Code] Fully Sparse VoxelNet for 3D Object Detection and Tracking.
- Focal Sparse Conv (CVPR 2022 Oral) [Paper] [Code] Dynamic sparse convolution for high performance.
- Spatial Pruned Conv (NeurIPS 2022) [Paper] [Code] 50% FLOPs saving for efficient 3D object detection.
- LargeKernel3D (CVPR 2023) [Paper] [Code] Large-kernel 3D sparse CNN backbone.
- SphereFormer (CVPR 2023) [Paper] [Code] Spherical window 3D transformer backbone.
- spconv-plus A library where we combine our works into spconv.
- SparseTransformer A library that includes high-efficiency transformer implementations for sparse point cloud or voxel data.
License
This project is released under the Apache 2.0 license.
