MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"


Mask DINO <img src="figures/dinosaur.png" width="30">

Feng Li*, Hao Zhang*, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, and Heung-Yeung Shum.

This repository is the official implementation of Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation (DINO is pronounced `daɪnoʊ', as in dinosaur). Our code is based on detectron2. A detrex version is open-sourced simultaneously.

:fire: We release a strong open-set object detection and segmentation model OpenSeeD based on MaskDINO that achieves the best results on open-set object segmentation tasks. Code and checkpoints are available here.

<details open> <summary> <font size=8><strong>News</strong></font> </summary>

[2023/7] We release Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Code and checkpoints are available!

[2023/2] Mask DINO has been accepted to CVPR 2023!

[2022/9] We release detrex, a toolbox that provides state-of-the-art Transformer-based detection algorithms. It includes DINO with better performance, and Mask DINO will also be released with a detrex implementation. You are welcome to use it!

[2022/7] Code for DINO is available here!

[2022/3] We built a repo, Awesome Detection Transformer, that collects papers about transformers for detection and segmentation. You are welcome to take a look!

</details>

Features

  • A unified architecture for object detection, panoptic, instance and semantic segmentation.
  • Achieves task and data cooperation between detection and segmentation.
  • State-of-the-art performance under the same setting.
  • Supports major detection and segmentation datasets: COCO, ADE20K, Cityscapes.

Code Updates

  • [2022/12/02] Our code and checkpoints are available! Mask DINO further achieves <strong>51.7</strong> and <strong>59.0</strong> box AP on COCO with ResNet-50 and Swin-L backbones, respectively, without extra detection data, outperforming DINO under the same setting!

  • [2022/6] We propose Mask DINO, a unified detection and segmentation model that achieves the best results on all three segmentation tasks (54.7 AP on the COCO instance leaderboard, 59.5 PQ on the COCO panoptic leaderboard, and 60.8 mIoU on the ADE20K semantic leaderboard)!

<details open> <summary> <font size=8><strong>Todo list</strong></font> </summary>
  • [x] Release code and checkpoints

  • [ ] Release model conversion checkpointer from DINO to MaskDINO

  • [ ] Release GPU cluster submit scripts based on submitit for multi-node training

  • [ ] Release EMA training for large models

  • [ ] Release more large models

</details>

Installation

See installation instructions.

Getting Started

See Inference Demo with Pre-trained Model.

See Results.

See Preparing Datasets for MaskDINO.

See Getting Started.

See More Usage.


Results

In this part, we present the clean models that do not use extra detection data or tricks.

COCO Instance Segmentation and Object Detection

We follow DINO and use a hidden dimension of 2048 in the feedforward network of the encoder by default. We also use the mask-enhanced box initialization proposed in our paper for instance segmentation and detection. To better present our model, this table also lists models trained with hidden dimension 1024 (hid 1024) and models trained without mask-enhanced initialization (no mask enhance).
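To give a feel for what the two hidden-dimension settings mean for model size, the sketch below counts the parameters of a single two-layer feedforward block. The model width of 256 is an assumption chosen for illustration (it is not stated in this README), and the function name `ffn_params` is hypothetical. The numbers are per block and are not meant to reproduce the Params column of the table.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights and biases of a feedforward block: d_model -> d_ff -> d_model."""
    first_linear = d_model * d_ff + d_ff      # weight matrix + bias
    second_linear = d_ff * d_model + d_model  # weight matrix + bias
    return first_linear + second_linear


D_MODEL = 256  # assumed embedding width; not taken from this README

for d_ff in (1024, 2048):
    count = ffn_params(D_MODEL, d_ff)
    print(f"hid {d_ff}: {count:,} parameters per feedforward block")
# hid 1024: 525,568 parameters per feedforward block
# hid 2048: 1,050,880 parameters per feedforward block
```

Under this assumption, doubling the hidden dimension roughly doubles the feedforward parameters in every layer that uses it, which is in line with the larger Params and GFLOPs values reported for the 2048 setting.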

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<tr><th valign="bottom">Name</th> <th valign="bottom">Backbone</th> <th valign="bottom">Epochs</th> <th valign="bottom">Mask AP</th> <th valign="bottom">Box AP</th> <th valign="bottom">Params</th> <th valign="bottom">GFLOPs</th> <th valign="bottom">Download</th></tr>
<tr><td align="left">MaskDINO (hid 1024) | <a href="configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml">config</a></td> <td align="center">R50</td> <td align="center">50</td> <td align="center">46.1</td> <td align="center">51.5</td> <td align="center">47M</td> <td align="center">226</td> <td align="center"><a href="https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid1024_3sd1_instance_maskenhanced_mask46.1ap_box51.5ap.pth">model</a></td></tr>
<tr><td align="left">MaskDINO | <a href="configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml">config</a></td> <td align="center">R50</td> <td align="center">50</td> <td align="center">46.3</td> <td align="center">51.7</td> <td align="center">52M</td> <td align="center">286</td> <td align="center"><a href="https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask46.3ap_box51.7ap.pth">model</a></td></tr>
<tr><td align="left">MaskDINO (no mask enhance) | <a href="configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml">config</a></td> <td align="center">Swin-L (IN21k)</td> <td align="center">50</td> <td align="center">52.1</td> <td align="center">58.3</td> <td align="center">223M</td> <td align="center">1326</td> <td align="center"><a href="https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_mask52.1ap_box58.3ap.pth">model</a></td></tr>
<tr><td align="left">MaskDINO | <a href="configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml">config</a></td> <td align="center">Swin-L (IN21k)</td> <td align="center">50</td> <td align="center">52.3</td> <td align="center">59.0</td> <td align="center">223M</td> <td align="center">1326</td> <td align="center"><a href="https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask52.3ap_box59.0ap.pth">model</a></td></tr>
<tr><td align="left">MaskDINO + O365 data + 1.2x larger image</td> <td align="center">Swin-L (IN21k)</td> <td align="center">20</td> <td align="center">54.5</td> <td align="center">---</td> <td align="center">223M</td> <td align="center">1326</td> <td align="center">To Release</td></tr>
</tbody></table>
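The instance-segmentation checkpoint filenames linked in the table above appear to encode the backbone, schedule, query count, hidden dimension, and reported APs. The sketch below recovers those fields with a regular expression. The pattern is inferred only from the four filenames shown here and is an assumption, not a naming scheme documented by the authors; `parse_checkpoint_name` is a hypothetical helper, not part of the MaskDINO codebase.

```python
import re
from typing import Optional

# Pattern inferred from the filenames in the table above (an assumption).
_CKPT_RE = re.compile(
    r"maskdino_(?P<backbone>[a-z0-9]+)_(?P<epochs>\d+)ep_(?P<queries>\d+)q_"
    r"hid(?P<hidden_dim>\d+)_.*?"
    r"mask(?P<mask_ap>\d+\.\d+)ap_box(?P<box_ap>\d+\.\d+)ap\.pth$"
)


def parse_checkpoint_name(name: str) -> Optional[dict]:
    """Return the fields encoded in a checkpoint filename, or None if it does not match."""
    match = _CKPT_RE.search(name)
    if match is None:
        return None
    return {
        "backbone": match["backbone"],
        "epochs": int(match["epochs"]),
        "queries": int(match["queries"]),
        "hidden_dim": int(match["hidden_dim"]),
        # The mask-enhanced checkpoints carry a "_maskenhanced_" token in their names.
        "mask_enhanced": "_maskenhanced_" in name,
        "mask_ap": float(match["mask_ap"]),
        "box_ap": float(match["box_ap"]),
    }


info = parse_checkpoint_name(
    "maskdino_r50_50ep_300q_hid1024_3sd1_instance_maskenhanced_mask46.1ap_box51.5ap.pth"
)
print(info)
```

A full download URL works as input too, since the pattern is searched for rather than anchored at the start. Filenames that follow a different layout, such as the panoptic checkpoint that reports PQ instead of mask and box AP, return `None`.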

COCO Panoptic Segmentation

<table><tbody> <!-- START TABLE --> <!-- TABLE HEADER --> <th valign="bottom">Name</th> <th valign="bottom">Backbone</th> <th valign="bottom">epochs</th> <th valign="bottom">PQ</th> <th valign="bottom">Mask AP</th> <th valign="bottom">Box AP</th> <th valign="bottom">mIoU</th> <th valign="bottom">download</th> <tr><td align="left">MaskDINO | <a href="configs/coco/panoptic-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml">config</a></td> <td align="center">R50</td> <td align="center">50</td> <td align="center">53.0</td> <td align="center">48.8</td> <td align="center">44.3</td> <td align="center">60.6</td> <td align="center"><a href="https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid2048_3sd1_panoptic_pq53.0.pth">model</a></td> <tr><td align="left">MaskD