Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification

News🍎

Our paper has been accepted by AAAI-2026🌹! Paper is here.

Environment🍊

Our environment: Python 3.10.13, CUDA 11.8.

Set up the environment with the following steps:

conda create -n myenv python=3.10.13
conda activate myenv
cd {your path}
pip install -r requirements.txt
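
After installing, a quick sanity check of the interpreter and GPU setup can save a failed training run. The sketch below is not part of the repository. It assumes that `requirements.txt` installs PyTorch, and the `check_environment` helper is a hypothetical name used only for illustration.

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 10)  # the README states python=3.10.13


def check_environment():
    """Return a small report on the interpreter and, if installed, PyTorch/CUDA."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_matches": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when torch is not installed
    }
    if report["torch_installed"]:
        import torch  # assumption: PyTorch comes from requirements.txt

        report["cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_build"] = torch.version.cuda  # e.g. "11.8" on a CUDA build
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `python_matches` is false or `cuda_available` is not true, the environment differs from the one described above.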

Datasets🍋

RGBNT201 GET
RGBNT100 GET
MSVR310 GET

Pretrained Model🍉

ViT-B-16 GET (code:52fu)

Training🍒

python train.py --config_file configs/RGBNT201/Signal.yml

Our Model🍇

Our model's pth files are here:

| dataset | mAP | R-1 | pth |
|-------|-------|-------|-------|
| RGBNT201 | 80.3 | 85.2 | Signal_model.pth |
| RGBNT100 | 86.3 | 97.6 | Signal_model.pth |
| MSVR310 | 53.2 | 72.4 | Signal_model.pth |
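
For readers new to ReID metrics, the mAP and R-1 (Rank-1) columns above can be illustrated with a minimal, dependency-free sketch. This is not the evaluation code in this repository: the `evaluate` function is hypothetical, and it omits the same-camera filtering that standard ReID protocols usually apply.

```python
def evaluate(dist, query_ids, gallery_ids):
    """Return (mAP, rank1) in percent for a query-by-gallery distance matrix."""
    average_precisions = []
    rank1_hits = 0
    for q, row in enumerate(dist):
        # Gallery indices sorted from the closest to the farthest item.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query without any correct gallery item is skipped
        rank1_hits += int(matches[0])
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)  # precision at each correct hit
        average_precisions.append(sum(precisions) / len(precisions))
    valid = len(average_precisions)
    if valid == 0:
        raise ValueError("no query has a matching gallery item")
    return 100.0 * sum(average_precisions) / valid, 100.0 * rank1_hits / valid


if __name__ == "__main__":
    # Toy example: 2 queries, 3 gallery items (identity labels are made up).
    toy_dist = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]
    print(evaluate(toy_dist, query_ids=[1, 2], gallery_ids=[1, 2, 1]))
```

In the toy example the first query ranks both of its matches first (AP = 1), while the second query finds its only match at rank 3 (AP = 1/3), so Rank-1 is 50 and mAP is about 66.7.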

Test🥝

python test.py --config_file configs/RGBNT201/Signal.yml

Introduction🧅️

To address multi-modal object ReID challenges, we propose Signal, a selective interaction and global-local alignment framework with three components:

Selective Interaction Module (SIM): Selects important patch tokens from multi-modal features via intra-modal and inter-modal token selection.
Global Alignment Module (GAM): Simultaneously aligns multi-modal features by minimizing the volume of 3D polyhedra in the Gramian space.
Local Alignment Module (LAM): Refines fine-grained alignment via deformable sampling, handling pixel-level misalignment.
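
To make the GAM idea more concrete, here is a minimal sketch of the quantity it refers to: the volume spanned by three modality feature vectors, computed from the determinant of their Gram matrix. It assumes one L2-normalized global feature per modality and uses plain Python for clarity. The function names are hypothetical, and how the paper turns this volume into a training loss is not shown here.

```python
import math


def gram_matrix(vectors):
    """Gram matrix of L2-normalized vectors: G[i][j] = <v_i, v_j>."""
    normed = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("zero vector cannot be normalized")
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def gram_volume(feat_a, feat_b, feat_c):
    """Volume of the parallelotope spanned by three normalized feature vectors."""
    gram = gram_matrix([feat_a, feat_b, feat_c])
    return math.sqrt(max(det3(gram), 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    print(gram_volume([1, 0, 0], [0, 1, 0], [0, 0, 1]))  # mutually orthogonal
    print(gram_volume([1, 2, 3], [2, 4, 6], [3, 6, 9]))  # perfectly aligned
```

The volume is 1 when the three features are mutually orthogonal and shrinks toward 0 as they point in the same direction, which is why minimizing it pulls all modalities together at once rather than pair by pair.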

Contributions🥬

We propose a novel selective interaction and global-local alignment framework named Signal for multi-modal object ReID, which effectively addresses the challenges of background interference and multi-modal misalignment.
We propose the Selective Interaction Module (SIM) to leverage inter-modal and intra-modal information for selecting important patch tokens, thereby mitigating background interference in multi-modal fusion.
We propose the Global Alignment Module (GAM) to simultaneously align multi-modal features by minimizing the volume of 3D polyhedra in the Gramian space.
We propose the Local Alignment Module (LAM) to align local features in a shift-aware manner, effectively addressing pixel-level misalignment across modalities.
Extensive experiments on three multi-modal object ReID datasets validate the effectiveness of our method.
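
The token selection described for SIM can be sketched as a top-k filter over patch tokens. The sketch below is only an illustration under stated assumptions: it takes per-token importance scores as given (one intra-modal, one inter-modal) and keeps the union of the top-k tokens under each score. How Signal actually computes and combines these scores is not specified in this README, and `select_tokens` is a hypothetical helper.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, returned in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def select_tokens(tokens, intra_scores, inter_scores, k):
    """Keep tokens that rank in the top-k by intra-modal OR inter-modal score.

    Assumption: the union rule is chosen for illustration only.
    """
    if not (len(tokens) == len(intra_scores) == len(inter_scores)):
        raise ValueError("tokens and both score lists must have the same length")
    keep = sorted(set(top_k_indices(intra_scores, k)) | set(top_k_indices(inter_scores, k)))
    return keep, [tokens[i] for i in keep]


if __name__ == "__main__":
    patch_tokens = ["p0", "p1", "p2", "p3"]  # stand-ins for patch embeddings
    intra = [0.9, 0.1, 0.5, 0.2]             # made-up scores
    inter = [0.1, 0.2, 0.8, 0.7]             # made-up scores
    print(select_tokens(patch_tokens, intra, inter, k=2))
```

Tokens that score low under both criteria (here `p1`) are dropped, which is the mechanism by which background patches would be excluded from fusion.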

Overall Framework🍠

GAM

LAM
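
LAM is described above as refining alignment via deformable sampling. A minimal sketch of that general technique follows: each position of a feature map is re-sampled at a shifted location using bilinear interpolation. The offsets are passed in directly here, whereas a real module would predict them. Function names and the single-channel 2D layout are assumptions for illustration, not the repository's implementation.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x) position."""
    height, width = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), height - 1)  # clamp to the map border
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom


def deformable_sample(feature_map, offsets):
    """Re-sample every position (r, c) at (r + dy, c + dx), offsets[r][c] = (dy, dx)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            bilinear_sample(feature_map, r + offsets[r][c][0], c + offsets[r][c][1])
            for c in range(width)
        ]
        for r in range(height)
    ]


if __name__ == "__main__":
    fmap = [[0.0, 1.0], [2.0, 3.0]]
    half_right = [[(0.0, 0.5)] * 2 for _ in range(2)]  # shift sampling by half a pixel
    print(deformable_sample(fmap, half_right))
```

Because the sampling location is fractional and differentiable with respect to the offsets, a network can learn small per-position shifts that compensate for pixel-level misalignment between modalities.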

Results🥂

Performance on RGBNT201

Performance on RGBNT100&MSVR310

Token Visual

Offsets Visual

Notes 🍩

Thank you for your attention and interest!