DCNv4
[CVPR 2024] Deformable Convolution v4
News
- Jan 15, 2024: 🚀 Compared with InternImage, the new FlashInternImage, powered by DCNv4, offers faster inference, faster convergence, and better performance.
- Jan 15, 2024: 🚀 "DCNv4" is released!
Introduction
We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements: (1) removing softmax normalization in spatial aggregation to enhance its dynamic property and expressive power, and (2) optimizing memory access to minimize redundant operations for speedup. These improvements result in significantly faster convergence than DCNv3 and a substantial increase in processing speed, with DCNv4 achieving more than three times the forward speed of DCNv3. DCNv4 demonstrates exceptional performance across various tasks, including image classification, instance and semantic segmentation, and, notably, image generation. When integrated into generative models such as the U-Net in a latent diffusion model, DCNv4 outperforms its baseline, underscoring its potential to enhance generative models. In practical applications, replacing DCNv3 with DCNv4 in the InternImage model to create FlashInternImage yields a speed increase of up to 80% and a further performance improvement without any other modifications. The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.
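The first enhancement above (dropping softmax normalization in spatial aggregation) can be illustrated with a small pure-Python sketch. It is a simplified illustration under stated assumptions, not the repository's implementation: it handles a single channel and a single output location, treats each offset as the total displacement from the center, and uses hypothetical function names (`bilinear_sample`, `softmax`, `deformable_aggregate`) that are not part of the DCNv4 API.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_aggregate(feat, center, offsets, scores, normalize):
    """Weighted sum of K values sampled around `center`.

    normalize=True  -> softmax-normalized weights (the behaviour the text
                       attributes to DCNv3): weights lie in [0, 1], sum to 1.
    normalize=False -> raw scores used directly (the change the text
                       describes for DCNv4): weights are unbounded.
    """
    weights = softmax(scores) if normalize else list(scores)
    cy, cx = center
    out = 0.0
    for (dy, dx), w in zip(offsets, weights):
        out += w * bilinear_sample(feat, cy + dy, cx + dx)
    return out


# Toy 3x3 feature map and three sampling points (illustrative values only).
feat = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
offsets = [(-1.0, -1.0), (0.0, 0.5), (1.0, 1.0)]  # sampled values: 0, 4.5, 8
scores = [2.0, -1.0, 0.5]

normalized = deformable_aggregate(feat, (1, 1), offsets, scores, normalize=True)
unbounded = deformable_aggregate(feat, (1, 1), offsets, scores, normalize=False)
print(normalized)  # a convex combination, so it stays within [0, 8]
print(unbounded)   # -0.5, outside the range of the sampled values
```

With softmax, the output is always a convex combination of the sampled values. Without it, the weights may be negative or larger than one, which is one way to read the "dynamic property and expressive power" mentioned in the text.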
Released Models
<details>
<summary> ImageNet Image Classification </summary>
<br>
<div>

| name | pretrain | resolution | acc@1 | #param | download |
| :----------------: | :----------: | :--------: | :---: | :----: | :---------: |
| FlashInternImage-T | ImageNet-1K | 224x224 | 83.6 | 30M | ckpt \| cfg |
| FlashInternImage-S | ImageNet-1K | 224x224 | 84.4 | 50M | ckpt \| cfg |
| FlashInternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | ckpt \| cfg |
| FlashInternImage-L | ImageNet-22K | 384x384 | 88.1 | 223M | ckpt \| cfg |

</div>

</details>

<details>
<summary> COCO Object Detection and Instance Segmentation </summary>
<br>
<div>

| backbone | method | schd | box mAP | mask mAP | Config | Download |
| :----------------: | :-------: | :--: | :-----: | :------: | :----: | :---------: |
| FlashInternImage-T | Mask-RCNN | 1x | 48.0 | 43.1 | config | ckpt \| log |
| FlashInternImage-T | Mask-RCNN | 3x | 49.5 | 44.0 | config | ckpt \| log |
| FlashInternImage-S | Mask-RCNN | 1x | 49.2 | 44.0 | config | ckpt \| log |
| FlashInternImage-S | Mask-RCNN | 3x | 50.5 | 44.9 | config | ckpt \| log |
| FlashInternImage-B | Mask-RCNN | 1x | 50.1 | 44.5 | config | ckpt \| log |
| FlashInternImage-B | Mask-RCNN | 3x | 50.6 | 45.4 | config | ckpt \| log |

| backbone | method | schd | box mAP | mask mAP | Config | Download |
| :----------------: | :----------------: | :--: | :-----: | :------: | :----: | :---------: |
| FlashInternImage-L | Cascade Mask R-CNN | 1x | 55.6 | 48.2 | config | ckpt \| log |
| FlashInternImage-L | Cascade Mask R-CNN | 3x | 56.7 | 48.9 | config | ckpt |

| backbone | method | lr type | pretrain | schd | box mAP | Config | Download |
| :----------------: | :----: | :--------------: | :----------: | :--: | :-----: | :----: | :---------: |
| FlashInternImage-T | DINO | layer-wise lr | ImageNet-1K | 1x | 54.7 | config | ckpt \| log |
| FlashInternImage-S | DINO | layer-wise lr | ImageNet-1K | 1x | 55.3 | config | ckpt \| log |
| FlashInternImage-B | DINO | layer-wise lr | ImageNet-1K | 1x | 56.0 | config | ckpt \| log |
| FlashInternImage-L | DINO | 0.1x backbone lr | ImageNet-22K | 1x | 58.8 | config | ckpt \| log |

</div>

</details>

<details>
<summary> ADE20K Semantic Segmentation </summary>
<br>
<div>

| backbone | method | resolution | mIoU (ss/ms) | Config | Download |
| :----------------: | :-----: | :--------: | :----------: | :----: | :---------: |
| FlashInternImage-T | UperNet | 512x512 | 49.3 / 50.3 | config | ckpt \| log |
| FlashInternImage-S | UperNet | 512x512 | 50.6 / 51.6 | config | ckpt \| log |
| FlashInternImage-B | UperNet | 512x512 | 52.0 / 52.6 | config | ckpt \| log |
| FlashInternImage-L | UperNet | 640x640 | 55.6 / 56.0 | config | ckpt \| log |

| backbone |method | resolution | mIoU (ss) | Config | Downloa
