# Mask DINO <img src="figures/dinosaur.png" width="30">
Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, and Heung-Yeung Shum.
This repository is the official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation" (DINO is pronounced "daɪnoʊ", as in dinosaur). Our code is based on detectron2; a detrex version is open-sourced simultaneously.
:fire: We release OpenSeeD, a strong open-set object detection and segmentation model based on MaskDINO that achieves the best results on open-set object segmentation tasks. Code and checkpoints are available here.
<details open>
<summary><font size=8><strong>News</strong></font></summary>

[2023/7] We release Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Code and checkpoints are available!
[2023/2] Mask DINO has been accepted to CVPR 2023!
[2022/9] We release a toolbox, detrex, that provides state-of-the-art Transformer-based detection algorithms. It includes DINO with better performance, and Mask DINO will also be released with a detrex implementation. You are welcome to use it! <br>
- Supported now: DETR, Deformable DETR, Conditional DETR, Group-DETR, DAB-DETR, DN-DETR, DINO.
[2022/7] Code for DINO is available here!
[2022/3] We build a repo, Awesome Detection Transformer, to collect papers on Transformers for detection and segmentation. We welcome your attention!
</details>

## Features
- A unified architecture for object detection and panoptic, instance, and semantic segmentation.
- Achieves task and data cooperation between detection and segmentation.
- State-of-the-art performance under the same setting.
- Supports major detection and segmentation datasets: COCO, ADE20K, Cityscapes.
## Code Updates

- [2022/12/02] Our code and checkpoints are available! Mask DINO further achieves <strong>51.7</strong> and <strong>59.0</strong> box AP on COCO with ResNet-50 and SwinL backbones without extra detection data, outperforming DINO under the same setting!
- [2022/6] We propose a unified detection and segmentation model, Mask DINO, that achieves the best results on all three segmentation tasks (54.7 AP on the COCO instance leaderboard, 59.5 PQ on the COCO panoptic leaderboard, and 60.8 mIoU on the ADE20K semantic leaderboard)!

- [x] Release code and checkpoints
- [ ] Release a model-conversion checkpointer from DINO to MaskDINO
- [ ] Release GPU cluster submission scripts based on submitit for multi-node training
- [ ] Release EMA training for large models
- [ ] Release more large models
## Installation
See installation instructions.
## Getting Started

See Inference Demo with Pre-trained Model.
See Results.
See Preparing Datasets for MaskDINO.
See Getting Started.
See More Usage.

## Results
In this section, we present the clean models, which use no extra detection data or tricks.

### COCO Instance Segmentation and Object Detection

Following DINO, we use a hidden dimension of 2048 in the feedforward network of the encoder by default. We also use the mask-enhanced box initialization proposed in our paper for instance segmentation and detection. For a fuller comparison, the table also lists models trained with hidden dimension 1024 (hid 1024) and without mask-enhanced box initialization (no mask enhance).
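The core idea of mask-enhanced box initialization is to refine a box prediction using the bounding rectangle of the corresponding predicted mask. As a minimal sketch of that idea only (not the repository's implementation; the function name and the 0.5 threshold are our assumptions), deriving a tight box from a mask probability map might look like:

```python
import numpy as np

def box_from_mask(mask, threshold=0.5):
    """Derive a bounding box (x0, y0, x1, y1) from a predicted mask.

    `mask` is an (H, W) array of mask probabilities; the returned box
    tightly encloses every pixel whose probability exceeds `threshold`.
    Returns None when no pixel is above threshold.
    """
    ys, xs = np.nonzero(mask > threshold)
    if len(xs) == 0:
        return None
    # +1 makes the max side exclusive, covering the full pixel extent
    return (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)

# Toy example: a 6x6 "mask" with one confident 2x3 blob
mask = np.zeros((6, 6))
mask[2:4, 1:4] = 0.9
print(box_from_mask(mask))  # (1, 2, 4, 4)
```

In the actual model this mask-derived box would replace (or anchor) the query's initial box before the decoder refines it further; the sketch above only shows the geometric step.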
