DEIMv2: Real-Time Object Detection Meets DINOv3
🎉 We’re excited to introduce <a href="https://intellindust-ai-lab.github.io/projects/EdgeCrafter/">EdgeCrafter</a>, with SOTA performance on object detection, pose estimation, and instance segmentation. 🎉
<p align="center">
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/blob/master/LICENSE"> <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue"> </a>
  <a href="https://arxiv.org/abs/2509.20787"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2509.20787-red"> </a>
  <a href="https://intellindust-ai-lab.github.io/projects/DEIMv2/"> <img alt="project webpage" src="https://img.shields.io/badge/Webpage-DEIMv2-purple"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/pulls"> <img alt="prs" src="https://img.shields.io/github/issues-pr/Intellindust-AI-Lab/DEIMv2"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/issues"> <img alt="issues" src="https://img.shields.io/github/issues/Intellindust-AI-Lab/DEIMv2?color=olive"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2"> <img alt="stars" src="https://img.shields.io/github/stars/Intellindust-AI-Lab/DEIMv2"> </a>
  <a href="mailto:shenxi@intellindust.com"> <img alt="Contact Us" src="https://img.shields.io/badge/Contact-Email-yellow"> </a>
</p>

<p align="center">
  DEIMv2 is an evolution of the DEIM framework that leverages the rich features of DINOv3. The method comes in a range of model sizes, from an ultra-light version up to S, M, L, and X, to suit a wide range of scenarios. Across these variants, DEIMv2 achieves state-of-the-art performance, with the S-sized model notably surpassing 50 AP on the challenging COCO benchmark.
</p>

<div align="center">
  <a href="http://www.shihuahuang.cn">Shihua Huang</a><sup>1*</sup>, Yongjie Hou<sup>1,2*</sup>, Longfei Liu<sup>1*</sup>, <a href="https://xuanlong-yu.github.io/">Xuanlong Yu</a><sup>1</sup>, <a href="https://xishen0220.github.io">Xi Shen</a><sup>1†</sup>
</div>

<p align="center">
  <i> 1. <a href="https://intellindust-ai-lab.github.io">Intellindust AI Lab</a> 2. Xiamen University <br> * Equal Contribution † Corresponding Author </i>
</p>

<p align="center"> <strong>If you like our work, please give us a ⭐!</strong> </p>

<p align="center">
  <img src="./figures/deimv2_coco_AP_vs_Params.png" alt="AP vs. Params" width="49%">
  <img src="./figures/deimv2_coco_AP_vs_GFLOPs.png" alt="AP vs. GFLOPs" width="49%">
</p>
🚀 Updates
- [x] [2026.3.20] 🔥🔥🔥Hi everyone! We’re excited to introduce EdgeCrafter, our latest work that achieves new state-of-the-art performance—faster, more accurate, and easier to use than ever. It also supports multiple vision tasks, including object detection, instance segmentation, and human pose estimation!
- [x] [2026.1.7] STA, introduced in DEIMv2, has been integrated into the SOTA distillation library LightlyTrain, demonstrating its practical value and impact in real-world training pipelines.
- [x] [2026.1.7] FP16 Inference Fix: Use TensorRT ≥ 10.6 to ensure stable execution and correct detection results. For detailed deployment instructions, please refer to Deployment.
- [x] [2025.11.3] We have uploaded our models to Hugging Face! Thanks to NielsRogge!
- [x] [2025.10.28] Optimized the attention module in ViT-Tiny, reducing memory usage by half for the S and M models.
- [x] [2025.10.2] DEIMv2 has been integrated into X-AnyLabeling! Many thanks to the X-AnyLabeling maintainers for making this possible.
- [x] [2025.9.26] Released the DEIMv2 series.
🧭 Table of Contents
- 1. 🤖 Model Zoo
- 2. ⚡ Quick Start
- 3. 🛠️ Usage
- 4. 🧰 Tools
- 5. 📜 Citation
- 6. 🙏 Acknowledgement
- 7. ⭐ Star History
1. Model Zoo
| Model | Dataset | AP | #Params | GFLOPs | Latency (ms) | Config | Hugging Face | Checkpoint | Log |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Atto | COCO | 23.8 | 0.5M | 0.8 | 1.10 | yml | huggingface | Google / Quark | Google / Quark |
| Femto | COCO | 31.0 | 1.0M | 1.7 | 1.45 | yml | huggingface | Google / Quark | Google / Quark |
| Pico | COCO | 38.5 | 1.5M | 5.2 | 2.13 | yml | huggingface | Google / Quark | Google / Quark |
| N | COCO | 43.0 | 3.6M | 6.8 | 2.32 | yml | huggingface | Google / Quark | Google / Quark |
| S | COCO | 50.9 | 9.7M | 25.6 | 5.78 | yml | huggingface | Google / Quark | Google / Quark |
| M | COCO | 53.0 | 18.1M | 52.2 | 8.80 | yml | huggingface | Google / Quark | Google / Quark |
| L | COCO | 56.0 | 32.2M | 96.7 | 10.47 | yml | huggingface | Google / Quark | Google / Quark |
| X | COCO | 57.8 | 50.3M | 151.6 | 13.75 | yml | huggingface | Google / Quark | Google / Quark |
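The table above can guide variant selection. As a small illustration (not part of the repo; the numbers below are copied from the COCO table), here is one way to pick the lightest variant that meets an accuracy target:

```python
# (name, COCO AP, #Params in millions) copied from the model zoo table above
MODEL_ZOO = [
    ("Atto", 23.8, 0.5), ("Femto", 31.0, 1.0), ("Pico", 38.5, 1.5),
    ("N", 43.0, 3.6), ("S", 50.9, 9.7), ("M", 53.0, 18.1),
    ("L", 56.0, 32.2), ("X", 57.8, 50.3),
]

def smallest_model_with_ap(min_ap):
    """Return the name of the lightest variant reaching at least min_ap, or None."""
    candidates = [m for m in MODEL_ZOO if m[1] >= min_ap]
    return min(candidates, key=lambda m: m[2])[0] if candidates else None

print(smallest_model_with_ap(50))  # → S (9.7M params, 50.9 AP)
```

For latency-bound deployments, the same selection can be done against the GFLOPs or latency columns instead.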
2. Quick Start
2.0 Using Models from Hugging Face
Our models are now available on Hugging Face! Here's a simple example; you can find detailed configs and more examples in hf_models.ipynb.
<details>
<summary> Simple example </summary>

Create a `.py` file in the DEIMv2 directory so that all components can be imported successfully.

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

from engine.backbone import HGNetv2, DINOv3STAs
from engine.deim import HybridEncoder, LiteEncoder
from engine.deim import DFINETransformer, DEIMTransformer
from engine.deim.postprocessor import PostProcessor


class DEIMv2(nn.Module, PyTorchModelHubMixin):
    def __init__(self, config):
        super().__init__()
        self.backbone = DINOv3STAs(**config["DINOv3STAs"])
        self.encoder = HybridEncoder(**config["HybridEncoder"])
        self.decoder = DEIMTransformer(**config["DEIMTransformer"])
        self.postprocessor = PostProcessor(**config["PostProcessor"])

    def forward(self, x, orig_target_sizes):
        x = self.backbone(x)
        x = self.encoder(x)
        x = self.decoder(x)
        x = self.postprocessor(x, orig_target_sizes)
        return x


deimv2_s_config = {
    "DINOv3STAs": {
        ...
    },
    ...
}

deimv2_s_hf = DEIMv2.from_pretrained("Intellindust/DEIMv2_DINOv3_S_COCO")
```
</details>
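Conceptually, the postprocessor in the forward pass above rescales the decoder's normalized boxes to the original image size passed in as `orig_target_sizes`. A minimal pure-Python sketch of that rescaling step, assuming normalized `(cx, cy, w, h)` boxes (the actual `PostProcessor` in `engine.deim` additionally handles score thresholding and label selection, and may differ in detail):

```python
def rescale_boxes(boxes, orig_w, orig_h):
    """Convert normalized (cx, cy, w, h) boxes to absolute (x1, y1, x2, y2)
    pixel coordinates for an image of size (orig_w, orig_h)."""
    out = []
    for cx, cy, w, h in boxes:
        out.append((
            (cx - w / 2) * orig_w,  # left
            (cy - h / 2) * orig_h,  # top
            (cx + w / 2) * orig_w,  # right
            (cy + h / 2) * orig_h,  # bottom
        ))
    return out

# A box covering the center half of a 640x480 image:
print(rescale_boxes([(0.5, 0.5, 0.5, 0.5)], 640, 480))
# → [(160.0, 120.0, 480.0, 360.0)]
```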
2.1 Environment Setup
# You can use PyTorch 2.5.1 or 2.4.1; we have not tried other versions.
