CMSSM
CM-SSM: Cross-modal State Space Model for Real-time RGB-Thermal Wild Scene Semantic Segmentation
Cross-modal State Space Modeling and Terrain-specific Knowledge Distillation for RGB-Thermal Semantic Segmentation
Introduction
This repository contains the code for the paper "Cross-modal State Space Modeling for Real-time RGB-Thermal Wild Scene Semantic Segmentation," which has been accepted by IROS 2025.
✨2025-10-9✨: An extended version of our conference paper, "Cross-modal State Space Modeling and Terrain-specific Knowledge Distillation for RGB-Thermal Semantic Segmentation," has been submitted to TASE. To support the review process, additional details and code are provided here.
Method
The CM-SSM consists of two image encoders that extract features from the RGB and thermal images, four CM-SSA modules that perform RGB-T feature fusion at four stages, and an MLP decoder that predicts the semantic segmentation maps.
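The data flow described above can be sketched as follows. This is a minimal, framework-free illustration, not the repository's implementation: `cm_ssm_forward` and the toy stand-ins are hypothetical names, plain Python lists take the place of feature tensors, and the toy fusion (element-wise addition) is only a placeholder for a CM-SSA module.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # stand-in for a feature tensor (assumption)


def cm_ssm_forward(
    rgb: Feature,
    thermal: Feature,
    rgb_encoder: Callable[[Feature], List[Feature]],
    thermal_encoder: Callable[[Feature], List[Feature]],
    fusion_modules: Sequence[Callable[[Feature, Feature], Feature]],
    decoder: Callable[[List[Feature]], Feature],
) -> Feature:
    """Wire two encoders, four per-stage fusion modules, and one decoder."""
    rgb_feats = rgb_encoder(rgb)              # one feature per stage
    thermal_feats = thermal_encoder(thermal)  # one feature per stage
    if not (len(rgb_feats) == len(thermal_feats) == len(fusion_modules) == 4):
        raise ValueError("expected four stages from each encoder and four fusion modules")
    # Fuse RGB and thermal features stage by stage.
    fused = [fuse(r, t) for fuse, r, t in zip(fusion_modules, rgb_feats, thermal_feats)]
    # The decoder consumes all four fused stages.
    return decoder(fused)


# Toy stand-ins (hypothetical) so the wiring can be executed end to end.
def toy_encoder(x: Feature) -> List[Feature]:
    return [[v * scale for v in x] for scale in (1, 2, 3, 4)]


def toy_fusion(r: Feature, t: Feature) -> Feature:
    return [a + b for a, b in zip(r, t)]


def toy_decoder(stages: List[Feature]) -> Feature:
    return [sum(vals) for vals in zip(*stages)]


prediction = cm_ssm_forward(
    [1.0, 2.0], [0.5, 0.5],
    toy_encoder, toy_encoder, [toy_fusion] * 4, toy_decoder,
)
```

In a real model each callable would be a network module; the sketch only shows which component feeds which.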
The CM-SS2D consists of three steps: 1) cross-modal selective scanning, 2) cross-modal state space modeling and 3) scan merging.
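The three steps can be illustrated with a toy, pure-Python sketch. Everything in it is an assumption made for illustration and is not taken from the repository: the four scan directions (row-major, column-major, and their reverses), the scalar state with a fixed `decay`, and the choice to let the other modality supply the read-out weight. The function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def scan_orders(h: int, w: int) -> List[List[int]]:
    """Step 1 (assumed layout): four scan paths over the flattened H x W grid."""
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def cross_modal_recurrence(x: List[float], guide: List[float], decay: float) -> List[float]:
    """Step 2 (toy): the state accumulates x, the other modality weights the read-out."""
    state, out = 0.0, []
    for x_t, g_t in zip(x, guide):
        state = decay * state + x_t   # state update driven by this modality
        out.append(g_t * state)       # read-out weighted by the other modality
    return out


def cm_ss2d_sketch(rgb: Grid, thermal: Grid, decay: float = 0.5) -> Tuple[Grid, Grid]:
    h, w = len(rgb), len(rgb[0])
    flat_rgb = [v for row in rgb for v in row]
    flat_th = [v for row in thermal for v in row]
    out_rgb = [0.0] * (h * w)
    out_th = [0.0] * (h * w)
    for order in scan_orders(h, w):
        seq_rgb = [flat_rgb[i] for i in order]
        seq_th = [flat_th[i] for i in order]
        y_rgb = cross_modal_recurrence(seq_rgb, seq_th, decay)
        y_th = cross_modal_recurrence(seq_th, seq_rgb, decay)
        # Step 3: scan merging, here a sum of each path's output at its original position.
        for pos, yr, yt in zip(order, y_rgb, y_th):
            out_rgb[pos] += yr
            out_th[pos] += yt

    def unflatten(flat: List[float]) -> Grid:
        return [flat[r * w:(r + 1) * w] for r in range(h)]

    return unflatten(out_rgb), unflatten(out_th)


merged_rgb, merged_thermal = cm_ss2d_sketch([[1.0, 2.0]], [[1.0, 0.0]])
```

The actual CM-SS2D uses learned, input-dependent state space parameters; the sketch only mirrors the scan, recur, and merge structure.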
Requirements
Python==3.9
PyTorch==2.0.1
CUDA==11.8
mamba-ssm==1.0.1
causal-conv1d==1.0.0
mmcv==2.2.0
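As a convenience, the pinned Python packages above can be compared against the active environment with a short standard-library script. This is a sketch, not part of the repository: `PINNED`, `installed_version`, and `check_pins` are hypothetical names, the distribution names are assumed to match the usual PyPI names, and CUDA itself is not checked.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins copied from the requirements list; "torch" is the assumed
# distribution name for PyTorch.
PINNED = {
    "torch": "2.0.1",
    "mamba-ssm": "1.0.1",
    "causal-conv1d": "1.0.0",
    "mmcv": "2.2.0",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Map each mismatching package to a short description of the problem."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems[name] = f"not installed (want {wanted})"
        # Ignore local build tags such as "+cu118" when comparing.
        elif found.split("+")[0] != wanted:
            problems[name] = f"found {found} (want {wanted})"
    return problems


if __name__ == "__main__":
    for package, problem in check_pins(PINNED).items():
        print(f"{package}: {problem}")
```

An empty result means every pinned package matches.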
| Models | Backbone | Dataset | mIoU (%) | Weights |
|------|------|------------|------|--------------|
| CM-SSM | EfficientVit-B1 | CART | 75.1 | pth |
| CM-SSM | EfficientVit-B1 | PST900 | 85.9 | pth |
| CM-SSM | ConvNeXtV2-A | SUS | 82.5 | pth |
| CM-SSM | ConvNeXtV2-A | FMB | 60.7 | pth |
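For reference, mIoU is conventionally the mean over classes of TP / (TP + FP + FN). The sketch below computes it from a confusion matrix whose rows are ground-truth classes and whose columns are predictions. It assumes that classes absent from both ground truth and prediction are skipped; the repository's evaluation code may treat such classes differently, and `mean_iou` is a hypothetical helper name.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean intersection-over-union from a square confusion matrix."""
    n = len(confusion)
    ious = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # class-k pixels predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp    # other pixels predicted as class k
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never occur (assumption)
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Two classes, each with 3 correct pixels and 1 confusion in each direction.
score = mean_iou([[3, 1], [1, 3]])
```

Multiplying the result by 100 gives a percentage like the values in the table.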
