CMSSM

CM-SSM: Cross-modal State Space Model for Real-time RGB-Thermal Wild Scene Semantic Segmentation

Cross-modal State Space Modeling and Terrain-specific Knowledge Distillation for RGB-Thermal Semantic Segmentation

Introduction

This repository contains the code for the paper "Cross-modal State Space Modeling for Real-time RGB-Thermal Wild Scene Semantic Segmentation," which has been accepted by IROS 2025.

✨2025-10-9✨: An extended version of our conference paper, "Cross-modal State Space Modeling and Terrain-specific Knowledge Distillation for RGB-Thermal Semantic Segmentation", has been submitted to TASE. To support the review process, additional details and code are provided.

Method

The CM-SSM consists of two image encoders that extract features from the RGB and thermal images, four CM-SSA modules that fuse the RGB-T features at four stages, and an MLP decoder that predicts the semantic segmentation map.
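The data flow described above can be sketched in a few lines. This is only a wiring illustration under stated assumptions: `CMSSMWiring`, `toy_encoder`, `toy_fuse`, and `toy_decoder` are hypothetical names, features are plain Python lists rather than tensors, and the toy components stand in for the real encoders, CM-SSA modules, and MLP decoder, which are neural network modules in the repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Feature = List[float]  # stand-in for a feature map (a tensor in real code)


@dataclass
class CMSSMWiring:
    """Hypothetical wiring: two encoders, four per-stage fusers, one decoder."""
    rgb_encoder: Callable[[Feature], Sequence[Feature]]
    thermal_encoder: Callable[[Feature], Sequence[Feature]]
    fusers: Sequence[Callable[[Feature, Feature], Feature]]  # CM-SSA stand-ins
    decoder: Callable[[Sequence[Feature]], Feature]

    def __call__(self, rgb: Feature, thermal: Feature) -> Feature:
        rgb_feats = self.rgb_encoder(rgb)              # four stage features
        thermal_feats = self.thermal_encoder(thermal)  # four stage features
        if not (len(rgb_feats) == len(thermal_feats) == len(self.fusers) == 4):
            raise ValueError("expected four stages per encoder and four fusers")
        # Fuse RGB and thermal features stage by stage.
        fused = [f(r, t) for f, r, t in zip(self.fusers, rgb_feats, thermal_feats)]
        # The decoder consumes all four fused stages.
        return self.decoder(fused)


def toy_encoder(x: Feature) -> List[Feature]:
    # Toy assumption: stage k simply scales the input by k.
    return [[v * k for v in x] for k in range(1, 5)]


def toy_fuse(r: Feature, t: Feature) -> Feature:
    # Toy assumption: element-wise sum instead of a real CM-SSA module.
    return [a + b for a, b in zip(r, t)]


def toy_decoder(stages: Sequence[Feature]) -> Feature:
    # Toy assumption: sum the stages element-wise instead of an MLP.
    return [sum(vals) for vals in zip(*stages)]


model = CMSSMWiring(toy_encoder, toy_encoder, [toy_fuse] * 4, toy_decoder)
```

Calling `model(rgb, thermal)` with two equal-length lists runs the full encoder, fusion, and decoder path. Swapping the toy callables for real modules would keep the same structure.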

The CM-SS2D consists of three steps: 1) cross-modal selective scanning, 2) cross-modal state space modeling and 3) scan merging.
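The three steps can be illustrated with a toy, pure-Python sketch. The README does not give the equations, so everything concrete here is an assumption: the four scan directions (borrowed from common 2D selective-scan designs), the scalar recurrence in `cross_modal_ssm`, the decay `a`, and all function names are hypothetical and are not the repository's implementation. The sketch only shows the shape of the idea: both modalities are unrolled along the same scan paths, a hidden state driven by one modality is read out against the other, and the per-direction outputs are summed back onto the grid.

```python
def scan_orders(h, w):
    """Step 1 (assumed): four scan paths over an h x w grid of flat indices."""
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def cross_modal_ssm(drive, query, a=0.5):
    """Step 2 (toy stand-in): state updated by `drive`, read out by `query`."""
    state = 0.0
    out = []
    for x, q in zip(drive, query):
        state = a * state + x   # hidden state accumulates one modality
        out.append(q * state)   # the other modality modulates the readout
    return out


def cm_ss2d_toy(rgb, thermal, h, w):
    """Run steps 1-3 on flat, row-major lists of length h * w."""
    merged = [0.0] * (h * w)
    for order in scan_orders(h, w):
        seq_rgb = [rgb[i] for i in order]
        seq_thermal = [thermal[i] for i in order]
        y = cross_modal_ssm(seq_thermal, seq_rgb)
        # Step 3: scan merging, placing each output back at its grid position.
        for pos, i in enumerate(order):
            merged[i] += y[pos]
    return merged
```

For example, `cm_ss2d_toy([1, 1, 1, 1], [1, 0, 0, 0], 2, 2)` spreads the single thermal activation at the top-left cell across the grid, with the contribution decaying along each scan path.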

Requirements

Python==3.9
PyTorch==2.0.1
CUDA==11.8
mamba-ssm==1.0.1
causal-conv1d==1.0.0
mmcv==2.2.0
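A quick way to confirm that an environment matches these pins is to compare installed package versions against them. The helper below is a minimal sketch, not part of the repository: `PINS` and `check_pins` are hypothetical names, and the sketch assumes PyTorch is installed under the distribution name `torch`. Python and CUDA are not pip packages, so they are not checked here.

```python
from importlib import metadata

# Pins copied from the list above; "torch" is the pip name for PyTorch.
PINS = {
    "torch": "2.0.1",
    "mamba-ssm": "1.0.1",
    "causal-conv1d": "1.0.0",
    "mmcv": "2.2.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: problem} for every missing or mismatched package."""
    problems = {}
    for name, wanted in pins.items():
        try:
            # Drop local build tags such as "+cu118" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Running `print(check_pins(PINS))` in the target environment prints an empty dictionary when every pinned package matches.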

| Models | Backbone | Dataset | mIoU (%) | Weights |
|--------|----------|---------|----------|---------|
| CM-SSM | EfficientViT-B1 | CART | 75.1 | pth |
| CM-SSM | EfficientViT-B1 | PST900 | 85.9 | pth |
| CM-SSM | ConvNeXtV2-A | SUS | 82.5 | pth |
| CM-SSM | ConvNeXtV2-A | FMB | 60.7 | pth |
