DINO <img src="figs/dinosaur.png" width="30">

This is the official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection". (DINO is pronounced "daɪnoʊ", as in dinosaur.)

Authors: Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

News

[2023/7/10]: We release Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Code and checkpoint are available! </br>
[2023/4/28]: We release OpenSeeD, a strong open-set object detection and segmentation model that achieves the best results on open-set object segmentation tasks. Code and checkpoints are available here. </br>
[2023/4/26]: DINO is shining again! We release Stable-DINO, which is built upon DINO and a FocalNet-Huge backbone and achieves 64.8 AP on COCO test-dev. </br>
[2023/4/22]: With better hyper-parameters, our DINO-4scale model achieves 49.8 AP under the 12-epoch setting; please check detrex: DINO for more details. </br>
[2023/3/13]: We release Grounding DINO, a strong open-set object detection model that achieves the best results on open-set object detection tasks. It achieves 52.5 zero-shot AP on COCO detection without any COCO training data, and 63.0 AP on COCO after fine-tuning. Code and checkpoints will be available here. </br>
[2023/1/23]: DINO has been accepted to ICLR 2023! </br>
[2022/12/02]: Code for Mask DINO is released (also in detrex)! Mask DINO further achieves 51.7 and 59.0 box AP on COCO with ResNet-50 and Swin-L backbones, respectively, without extra detection data, outperforming DINO under the same setting! </br>
[2022/9/22]: We release detrex, a toolbox that provides state-of-the-art Transformer-based detection algorithms. It includes DINO with better performance. You are welcome to use it! </br>

Currently supported: DETR, Deformable DETR, Conditional DETR, DAB-DETR, DN-DETR, DINO.

[2022/9/18]: We organize the ECCV Workshop Computer Vision in the Wild (CVinW), which hosts two challenges to evaluate the zero-shot, few-shot, and full-shot performance of pre-trained vision models on downstream tasks:

The "Image Classification in the Wild (ICinW)" Challenge evaluates on 20 image classification tasks.
The "Object Detection in the Wild (ODinW)" Challenge evaluates on 35 object detection tasks.

<img src="https://computer-vision-in-the-wild.github.io/eccv-2022/static/eccv2022/img/ECCV-logo3.png" width=10%/> [Workshop] <img src="https://evalai.s3.amazonaws.com/media/logos/4e939412-a9c0-46bd-9797-5ba0bd0a9095.jpg" width=10%/> [IC Challenge] <img src="https://evalai.s3.amazonaws.com/media/logos/3a31ae6e-a990-48fb-b2c3-1e7da9d17a20.jpg" width=10%/> [OD Challenge] </br>
[2022/8/6]: We update the Swin-L model results obtained without techniques such as O365 pre-training, large image size, and multi-scale testing. We also upload the corresponding checkpoints to Google Drive. Our 5-scale model without any tricks obtains 58.5 AP on COCO val. </br>
[2022/7/14]: We release the code with Swin-L and ConvNeXt backbones. </br>
[2022/7/10]: We release the code and checkpoints with the ResNet-50 backbone. </br>
[2022/6/7]: We release Mask DINO, a unified detection and segmentation model that achieves the best results on all three segmentation tasks (54.7 AP on the COCO instance leaderboard, 59.5 PQ on the COCO panoptic leaderboard, and 60.8 mIoU on the ADE20K semantic leaderboard)! Code will be available here. </br>
[2022/5/28]: Code for DN-DETR is available here. </br>
[2022/4/10]: Code for DAB-DETR is available here. </br>
[2022/3/8]: We reach the SOTA on the MS-COCO leaderboard with 63.3 AP! </br>
[2022/3/9]: We build a repo, Awesome Detection Transformer, to present papers about Transformers for detection and segmentation. Your attention is welcome!

SOTA results

Introduction

We present DINO (DETR with Improved deNoising anchOr boxes) with:

State-of-the-art & end-to-end: DINO achieves 63.2 AP on COCO val and 63.3 AP on COCO test-dev, with a model size and data size more than ten times smaller than those of previous best models.
Fast-converging: With the ResNet-50 backbone, DINO with 5 scales achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs. Our 4-scale model achieves similar performance and runs at 23 FPS.

Methods

method

Model Zoo

We have put our model checkpoints here: [model zoo in Google Drive] [model zoo in Baidu Netdisk] (extraction code: "DINO"), where checkpoint{x}_{y}scale.pth denotes the checkpoint of the y-scale model trained for x epochs. Our training logs are in [Google Drive].
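The naming convention above can be handled programmatically when sorting downloaded files. The following is a minimal sketch, not part of the DINO codebase: the helper names `parse_checkpoint_name` and `checkpoint_name` and the example file name are hypothetical, and accepting zero-padded numbers is an assumption.

```python
import re
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    """Fields encoded in a checkpoint{x}_{y}scale.pth file name."""
    epochs: int  # x: value in the epoch position of the name
    scales: int  # y: number of feature scales


# Matches the checkpoint{x}_{y}scale.pth convention described above.
# Assumption: x and y are plain decimal integers, possibly zero-padded.
_CKPT_RE = re.compile(r"^checkpoint(\d+)_(\d+)scale\.pth$")


def parse_checkpoint_name(filename: str) -> CheckpointInfo:
    """Extract (epochs, scales) from a checkpoint file name (hypothetical helper)."""
    match = _CKPT_RE.match(filename)
    if match is None:
        raise ValueError(f"not a checkpoint{{x}}_{{y}}scale.pth name: {filename!r}")
    return CheckpointInfo(epochs=int(match.group(1)), scales=int(match.group(2)))


def checkpoint_name(epochs: int, scales: int) -> str:
    """Build a file name following the same convention (hypothetical helper)."""
    return f"checkpoint{epochs}_{scales}scale.pth"


if __name__ == "__main__":
    # Hypothetical file name used only for illustration.
    info = parse_checkpoint_name("checkpoint12_4scale.pth")
    print(info.epochs, info.scales)
```

Check the actual file names in the model zoo before relying on this pattern, since they may differ in padding or indexing.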

12 epoch setting

<table>
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>name</th>
      <th>backbone</th>
      <th>box AP</th>
      <th>Checkpoint</th>
      <th>Where in <a href="https://arxiv.org/abs/2203.03605">Our Paper</a></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>DINO-4scale</td>
      <td>R50</td>
      <td>49.0</td>
      <td><a href="https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing">Google Drive</a>&nbsp;/&nbsp;<a href="https://pan.baidu.com/s/1St5rvfgfPwpnPuf_Oe6DpQ">Baidu</a></td>
      <td>Table 1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>DINO-5scale</td>
      <td>R50</td>
      <td>49.4</td>
      <td><a href="https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing">Google Drive</a>&nbsp;/&nbsp;<a href="https://pan.baidu.com/s/1St5rvfgfPwpnPuf_Oe6DpQ">Baidu</a></td>
      <td>Table 1</td>
    </tr>
    <tr>
      <th>3</th>
      <td>DINO-4scale</td>
      <td>Swin-L</td>
      <td>56.8</td>
      <td><a href="https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing">Google Drive</a></td>
      <td></td>
    </tr>
    <tr>
      <th>4</th>
      <td>DINO-5scale</td>
      <td>Swin-L</td>
      <td>57.3</td>
      <td><a href="https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing">Google Drive</a></td>
      <td></td>
    </tr>
  </tbody>
</table>

DINO

Install / Use
