# U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

[Pattern Recognition 2025]
## Introduction
Multimodal semantic segmentation is a pivotal component of computer vision, which often outperforms unimodal methods by harnessing a richer set of information from diverse sources. Existing models often employ modality-specific designs that inherently introduce biases toward certain modalities. While these biases may be beneficial in specific contexts, they often compromise the model's adaptability across various multimodal scenarios, potentially degrading performance. To address this problem, we turn to the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion, and propose U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, validating its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings.
## Updates
- [x] 05/2024: init repository and release the code.
- [x] 05/2024: release U3M model weights. Download from GoogleDrive.
## U3M Model
Figure: Overall architecture of the U3M model.
## Environment
First, create and activate the environment using the following commands:

```shell
conda env create -f environment.yaml
conda activate U3M
```
## Data Preparation
Download the datasets:
- MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
- FMB, for semantic segmentation with RGB-Infrared modalities.
Then, put the datasets under the data directory as follows:

```text
data/
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp_sin
│   ├── polL_aolp_cos
│   ├── polL_dolp
│   ├── NIR_warped
│   ├── NIR_warped_mask
│   ├── GT
│   ├── SSGT4MS
│   ├── list_folder
│   └── SS
└── FMB
    ├── test
    │   ├── color
    │   ├── Infrared
    │   ├── Label
    │   └── Visible
    └── train
        ├── color
        ├── Infrared
        ├── Label
        └── Visible
```
## Model Zoo
### MCubeS

| Model-Modal | mIoU | Weight |
| :--- | :--- | :--- |
| MCubeS-RGB | 49.22 | GoogleDrive |
| MCubeS-RGB-A | 49.89 | GoogleDrive |
| MCubeS-RGB-A-D | 50.26 | GoogleDrive |
| MCubeS-RGB-A-D-N | 51.69 | GoogleDrive |
### MCubeS Ablation (RGB-A-D-N)

| Variant | mIoU | Weight |
| :--- | :--- | :--- |
| with_Linear | 49.89 | GoogleDrive |
| with_ChannelAttention | 50.34 | GoogleDrive |
| with_PSPNet | 50.62 | GoogleDrive |
| with_ALL | 51.69 | GoogleDrive |
### FMB

| Model-Modal | mIoU | Weight |
| :--- | :--- | :--- |
| FMB-RGB | 57.17 | GoogleDrive |
| FMB-RGB-Infrared | 60.76 | GoogleDrive |
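As a quick reading of the tables above, the gain each added modality brings on MCubeS can be computed directly from the listed mIoU scores (a minimal sketch; the numbers are taken verbatim from the MCubeS table):

```python
# mIoU scores from the MCubeS Model Zoo table above.
MCUBES_MIOU = {
    "RGB": 49.22,
    "RGB-A": 49.89,
    "RGB-A-D": 50.26,
    "RGB-A-D-N": 51.69,
}

def gains(scores: dict[str, float]) -> dict[str, float]:
    """mIoU improvement of each modality combination over the RGB-only baseline."""
    base = scores["RGB"]
    return {k: round(v - base, 2) for k, v in scores.items()}

if __name__ == "__main__":
    # Full four-modality fusion gains 2.47 mIoU over RGB-only.
    print(gains(MCUBES_MIOU))
```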
## Visualization
Please refer to visulization_mm.py for the segmentation results and visulization_tsne.py for the t-SNE visualization.
Figure: Segmentation results on FMB.
Figure: Segmentation results on MCubeS.
Figure: t-SNE visualization.
## Training

Before training, please download the pre-trained SegFormer weights and put them in the correct directory following this structure:
```text
checkpoints/pretrained/segformer
├── mit_b0.pth
├── mit_b1.pth
├── mit_b2.pth
├── mit_b3.pth
└── mit_b4.pth
```
To train the U3M model, update the relevant configuration file in configs/ with the correct paths and hyper-parameters. Then run:
```shell
cd path/to/U3M
conda activate U3M
python -m tools.train_mm --cfg configs/mcubes_rgbadn.yaml
python -m tools.train_mm --cfg configs/fmb_rgbt.yaml
```
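When sweeping several configs in sequence, the commands above can be wrapped in a small launcher. This is a sketch only: the entry point `tools.train_mm` and the config file names are the ones from the commands above, and the wrapper adds nothing beyond building and running those commands.

```python
import subprocess
import sys

# Config files listed in the training commands above.
CONFIGS = ["configs/mcubes_rgbadn.yaml", "configs/fmb_rgbt.yaml"]

def launch(cfg: str, dry_run: bool = False) -> list[str]:
    """Build (and, unless dry_run, execute) the training command for one config."""
    cmd = [sys.executable, "-m", "tools.train_mm", "--cfg", cfg]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raise if a run fails
    return cmd

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(" ".join(launch(cfg, dry_run=True)))
```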
## Evaluation
To evaluate U3M models, please download the respective model weights (GoogleDrive) and save them under any folder you like.
<!--
```text
output/
├── MCubeS
│   ├── U3M_B4_MCubeS_RGB.pth
│   ├── U3M_B4_MCubeS_RGBA.pth
│   ├── U3M_B4_MCubeS_RGBAD.pth
│   └── U3M_B4_MCubeS_RGBNAD.pth
```
-->

Then, update the EVAL section of the appropriate configuration file in configs/ and run:
```shell
cd path/to/U3M
conda activate U3M
python -m tools.val_mm --cfg configs/mcubes_rgbadn.yaml
python -m tools.val_mm --cfg configs/fmb_rgbt.yaml
```
## License

This repository is under the Apache-2.0 license. For commercial use, please contact the authors.
## Citations

If you use the U3M model, please cite the following work:
- U3M [arXiv]
```bibtex
@article{li2024u3m,
  title={U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation},
  author={Li, Bingyu and Zhang, Da and Zhao, Zhiyuan and Gao, Junyu and Li, Xuelong},
  journal={arXiv preprint arXiv:2405.15365},
  year={2024}
}
```
## Acknowledgements

Our codebase is based on the following public GitHub repositories; thanks to their authors:
Note: This is a research-level repository and might contain issues/bugs. Please contact the authors with any queries.
