DEIMv2: Real-Time Object Detection Meets DINOv3
🎉 We’re excited to introduce <a href="https://intellindust-ai-lab.github.io/projects/EdgeCrafter/">EdgeCrafter</a>, with SOTA performance on object detection, pose estimation, and instance segmentation. 🎉
<p align="center">
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/blob/master/LICENSE"> <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue"> </a>
  <a href="https://arxiv.org/abs/2509.20787"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2509.20787-red"> </a>
  <a href="https://intellindust-ai-lab.github.io/projects/DEIMv2/"> <img alt="project webpage" src="https://img.shields.io/badge/Webpage-DEIMv2-purple"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/pulls"> <img alt="prs" src="https://img.shields.io/github/issues-pr/Intellindust-AI-Lab/DEIMv2"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2/issues"> <img alt="issues" src="https://img.shields.io/github/issues/Intellindust-AI-Lab/DEIMv2?color=olive"> </a>
  <a href="https://github.com/Intellindust-AI-Lab/DEIMv2"> <img alt="stars" src="https://img.shields.io/github/stars/Intellindust-AI-Lab/DEIMv2"> </a>
  <a href="mailto:shenxi@intellindust.com"> <img alt="Contact Us" src="https://img.shields.io/badge/Contact-Email-yellow"> </a>
</p>

<p align="center">
  DEIMv2 is an evolution of the DEIM framework that leverages the rich features of DINOv3. The method comes in a range of model sizes, from an ultra-light version up to S, M, L, and X, to suit a wide range of scenarios. Across these variants, DEIMv2 achieves state-of-the-art performance, with the S-sized model notably surpassing 50 AP on the challenging COCO benchmark.
</p>

<div align="center">
  <a href="http://www.shihuahuang.cn">Shihua Huang</a><sup>1*</sup>, Yongjie Hou<sup>1,2*</sup>, Longfei Liu<sup>1*</sup>, <a href="https://xuanlong-yu.github.io/">Xuanlong Yu</a><sup>1</sup>, <a href="https://xishen0220.github.io">Xi Shen</a><sup>1†</sup>
</div>

<p align="center">
  <i> 1. <a href="https://intellindust-ai-lab.github.io">Intellindust AI Lab</a> 2. Xiamen University <br> * Equal Contribution † Corresponding Author </i>
</p>

<p align="center"> <strong>If you like our work, please give us a ⭐!</strong> </p>

<p align="center">
  <img src="./figures/deimv2_coco_AP_vs_Params.png" alt="AP vs. Params" width="49%">
  <img src="./figures/deimv2_coco_AP_vs_GFLOPs.png" alt="AP vs. GFLOPs" width="49%">
</p>
🚀 Updates
- [x] [2026.3.20] 🔥🔥🔥Hi everyone! We’re excited to introduce EdgeCrafter, our latest work that achieves new state-of-the-art performance—faster, more accurate, and easier to use than ever. It also supports multiple vision tasks, including object detection, instance segmentation, and human pose estimation!
- [x] [2026.1.7] STA, introduced in DEIMv2, has been integrated into the SOTA distillation library LightlyTrain, demonstrating its practical value and impact in real-world training pipelines.
- [x] [2026.1.7] FP16 Inference Fix: Use TensorRT ≥ 10.6 to ensure stable execution and correct detection results. For detailed deployment instructions, please refer to Deployment.
- [x] [2025.11.3] We have uploaded our models to Hugging Face! Thanks to NielsRogge!
- [x] [2025.10.28] Optimized the attention module in ViT-Tiny, reducing memory usage by half for the S and M models.
- [x] [2025.10.2] DEIMv2 has been integrated into X-AnyLabeling! Many thanks to the X-AnyLabeling maintainers for making this possible.
- [x] [2025.9.26] Released the DEIMv2 series.
🧭 Table of Contents
- 1. 🤖 Model Zoo
- 2. ⚡ Quick Start
- 3. 🛠️ Usage
- 4. 🧰 Tools
- 5. 📜 Citation
- 6. 🙏 Acknowledgement
- 7. ⭐ Star History
1. Model Zoo
| Model | Dataset | AP | #Params | GFLOPs | Latency (ms) | Config | Hugging Face | Checkpoint | Log |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Atto | COCO | 23.8 | 0.5M | 0.8 | 1.10 | yml | huggingface | Google / Quark | Google / Quark |
| Femto | COCO | 31.0 | 1.0M | 1.7 | 1.45 | yml | huggingface | Google / Quark | Google / Quark |
| Pico | COCO | 38.5 | 1.5M | 5.2 | 2.13 | yml | huggingface | Google / Quark | Google / Quark |
| N | COCO | 43.0 | 3.6M | 6.8 | 2.32 | yml | huggingface | Google / Quark | Google / Quark |
| S | COCO | 50.9 | 9.7M | 25.6 | 5.78 | yml | huggingface | Google / Quark | Google / Quark |
| M | COCO | 53.0 | 18.1M | 52.2 | 8.80 | yml | huggingface | Google / Quark | Google / Quark |
| L | COCO | 56.0 | 32.2M | 96.7 | 10.47 | yml | huggingface | Google / Quark | Google / Quark |
| X | COCO | 57.8 | 50.3M | 151.6 | 13.75 | yml | huggingface | Google / Quark | Google / Quark |
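The table above can guide variant selection. As a small illustration (not part of the repo; the numbers below are copied from the COCO table), here is one way to pick the lightest variant that meets an accuracy target:

```python
# (name, COCO AP, #Params in millions) copied from the model zoo table above
MODEL_ZOO = [
    ("Atto", 23.8, 0.5), ("Femto", 31.0, 1.0), ("Pico", 38.5, 1.5),
    ("N", 43.0, 3.6), ("S", 50.9, 9.7), ("M", 53.0, 18.1),
    ("L", 56.0, 32.2), ("X", 57.8, 50.3),
]

def smallest_model_with_ap(min_ap):
    """Return the name of the lightest variant reaching at least min_ap, or None."""
    candidates = [m for m in MODEL_ZOO if m[1] >= min_ap]
    return min(candidates, key=lambda m: m[2])[0] if candidates else None

print(smallest_model_with_ap(50))  # → S (9.7M params, 50.9 AP)
```

For latency-bound deployments, the same selection can be done against the GFLOPs or latency columns instead.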
2. Quick Start
2.0 Using Models from Hugging Face
Our models are now available on Hugging Face! Here's a simple example; you can find detailed configs and more examples in hf_models.ipynb.
<details>
<summary> Simple example </summary>

Create a `.py` file in the DEIMv2 directory so that all components can be imported successfully.

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

from engine.backbone import HGNetv2, DINOv3STAs
from engine.deim import HybridEncoder, LiteEncoder
from engine.deim import DFINETransformer, DEIMTransformer
from engine.deim.postprocessor import PostProcessor


class DEIMv2(nn.Module, PyTorchModelHubMixin):
    def __init__(self, config):
        super().__init__()
        self.backbone = DINOv3STAs(**config["DINOv3STAs"])
        self.encoder = HybridEncoder(**config["HybridEncoder"])
        self.decoder = DEIMTransformer(**config["DEIMTransformer"])
        self.postprocessor = PostProcessor(**config["PostProcessor"])

    def forward(self, x, orig_target_sizes):
        x = self.backbone(x)
        x = self.encoder(x)
        x = self.decoder(x)
        x = self.postprocessor(x, orig_target_sizes)
        return x


deimv2_s_config = {
    "DINOv3STAs": {
        ...
    },
    ...
}

deimv2_s_hf = DEIMv2.from_pretrained("Intellindust/DEIMv2_DINOv3_S_COCO")
```
</details>
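Conceptually, the postprocessor in the forward pass above rescales the decoder's normalized boxes to the original image size passed in as `orig_target_sizes`. A minimal pure-Python sketch of that rescaling step, assuming normalized `(cx, cy, w, h)` boxes (the actual `PostProcessor` in `engine.deim` additionally handles score thresholding and label selection, and may differ in detail):

```python
def rescale_boxes(boxes, orig_w, orig_h):
    """Convert normalized (cx, cy, w, h) boxes to absolute (x1, y1, x2, y2)
    pixel coordinates for an image of size (orig_w, orig_h)."""
    out = []
    for cx, cy, w, h in boxes:
        out.append((
            (cx - w / 2) * orig_w,  # left
            (cy - h / 2) * orig_h,  # top
            (cx + w / 2) * orig_w,  # right
            (cy + h / 2) * orig_h,  # bottom
        ))
    return out

# A box covering the center half of a 640x480 image:
print(rescale_boxes([(0.5, 0.5, 0.5, 0.5)], 640, 480))
# → [(160.0, 120.0, 480.0, 360.0)]
```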
2.1 Environment Setup
# You can use PyTorch 2.5.1 or 2.4.1; we have not tried other versions.
