UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
Universal Instance Perception as Object Discovery and Retrieval
This is the official implementation of the paper Universal Instance Perception as Object Discovery and Retrieval.
News
- :trophy: We are the runner-up in the Segmentation in the Wild challenge.
- :trophy: We are the winner of the BDD100K MOT Challenge and the runner-up of the BDD MOTS Challenge at the CVPR2023 workshop.
Highlight
- UNINEXT is accepted to CVPR2023.
- UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts.
- UNINEXT achieves superior performance on 20 challenging benchmarks using a single model with the same model parameters.
Introduction

Object-centric understanding is one of the most essential and challenging problems in computer vision. In this work, we mainly discuss 10 sub-tasks, distributed on the vertices of the cube shown in the above figure. Since all these tasks aim to perceive instances of certain properties, UNINEXT reorganizes them into three types according to the different input prompts:
- Category Names
- Object Detection
- Instance Segmentation
- Multiple Object Tracking (MOT)
- Multi-Object Tracking and Segmentation (MOTS)
- Video Instance Segmentation (VIS)
- Language Expressions
- Referring Expression Comprehension (REC)
- Referring Expression Segmentation (RES)
- Referring Video Object Segmentation (R-VOS)
- Target Annotations
- Single Object Tracking (SOT)
- Video Object Segmentation (VOS)
Then we propose a unified prompt-guided object discovery and retrieval formulation to solve all the above tasks. Extensive experiments demonstrate that UNINEXT achieves superior performance on 20 challenging benchmarks.
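The idea above can be illustrated with a toy sketch. All names below are hypothetical and greatly simplified, not UNINEXT's actual API; the point is only that three prompt types (category names, language expressions, target annotations) feed one shared discovery-and-retrieval routine.

```python
# Hypothetical illustration of prompt-guided object discovery and retrieval.
# This is NOT UNINEXT's real interface; it only sketches the unified paradigm.
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Instance:
    label: str
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2


@dataclass
class CategoryPrompt:    # object detection, instance segmentation, MOT, MOTS, VIS
    names: List[str]


@dataclass
class LanguagePrompt:    # REC, RES, R-VOS
    expression: str


@dataclass
class AnnotationPrompt:  # SOT, VOS: the target is given as a first-frame box
    box: Tuple[float, float, float, float]


Prompt = Union[CategoryPrompt, LanguagePrompt, AnnotationPrompt]


def iou(a, b) -> float:
    """Intersection-over-union of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def discover_and_retrieve(proposals: List[Instance], prompt: Prompt) -> List[Instance]:
    """One routine for every task: 'discover' candidate instances
    (here they are simply given) and 'retrieve' those matching the prompt."""
    if isinstance(prompt, CategoryPrompt):
        return [p for p in proposals if p.label in prompt.names]
    if isinstance(prompt, LanguagePrompt):
        # Toy grounding: the real model matches in a learned embedding space.
        return [p for p in proposals if p.label in prompt.expression]
    return [p for p in proposals if iou(p.box, prompt.box) > 0.5]


proposals = [
    Instance("person", (0, 0, 10, 20)),
    Instance("car", (30, 30, 60, 50)),
]
# The same perception routine serves all three prompt types:
print(discover_and_retrieve(proposals, CategoryPrompt(["person"]))[0].label)        # person
print(discover_and_retrieve(proposals, LanguagePrompt("the red car"))[0].label)     # car
print(discover_and_retrieve(proposals, AnnotationPrompt((0, 0, 10, 20)))[0].label)  # person
```

Changing the task then amounts to swapping the prompt object, which mirrors how a single set of model weights can serve all ten sub-tasks.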
Demo
https://user-images.githubusercontent.com/40926230/224527028-f31e8de0-b8aa-4cfb-a83b-63a70ff5bd52.mp4
UNINEXT can flexibly perceive various types of objects by simply changing the input prompts, such as category names, language expressions, and target annotations. We also provide a simple demo script, which supports 4 image-level tasks (object detection, instance segmentation, REC, RES).
Results
Retrieval by Category Names

Retrieval by Language Expressions

Retrieval by Target Annotations

Getting started
- Installation: Please refer to INSTALL.md for more details.
- Data preparation: Please refer to DATA.md for more details.
- Training: Please refer to TRAIN.md for more details.
- Testing: Please refer to TEST.md for more details.
- Model zoo: Please refer to MODEL_ZOO.md for more details.
Citing UNINEXT
If you find UNINEXT useful in your research, please consider citing:
```
@inproceedings{UNINEXT,
  title={Universal Instance Perception as Object Discovery and Retrieval},
  author={Yan, Bin and Jiang, Yi and Wu, Jiannan and Wang, Dong and Yuan, Zehuan and Luo, Ping and Lu, Huchuan},
  booktitle={CVPR},
  year={2023}
}
```
Acknowledgments
- Thanks to Unicorn for its experience unifying four object tracking tasks (SOT, MOT, VOS, MOTS).
- Thanks to VNext for its experience with Video Instance Segmentation (VIS).
- Thanks to ReferFormer for its experience with REC, RES, and R-VOS.
- Thanks to GLIP for the idea of unifying object detection and phrase grounding.
- Thanks to Detic for the implementation of multi-dataset training.
- Thanks to detrex for the implementation of the denoising mechanism.