I<sup>2</sup>MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts
Official implementation for "I<sup>2</sup>MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts" accepted by ICML 2025 (Poster).
- Authors: Jiayi Xin, Sukwon Yun, Jie Peng, Inyoung Choi, Jenna Ballard, Tianlong Chen and Qi Long
- Paper Link: https://arxiv.org/abs/2505.19190
Overview
Modality fusion is a cornerstone of multimodal learning, enabling information integration from diverse data sources. However, vanilla fusion methods are limited by (1) their inability to account for heterogeneous interactions between modalities and (2) their lack of interpretability in uncovering the multimodal interactions inherent in the data. To this end, we propose I<sup>2</sup>MoE (Interpretable Multimodal Interaction-aware Mixture of Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation at both the local and global levels. First, I<sup>2</sup>MoE utilizes different interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I<sup>2</sup>MoE deploys a reweighting model that assigns importance scores to the output of each interaction expert, which offers sample-level and dataset-level interpretation. Extensive evaluation on medical and general multimodal datasets shows that I<sup>2</sup>MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios.
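The two components described above (interaction experts plus a reweighting model) can be sketched roughly as follows. This is an illustrative sketch only: the module names, expert count, and dimensions are assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn

class I2MoESketch(nn.Module):
    """Illustrative sketch of interaction experts + a reweighting model.

    Names and dimensions are assumptions; see the official code for the
    actual architecture and the weakly supervised interaction losses.
    """

    def __init__(self, fused_dim=256, num_experts=4, num_classes=2):
        super().__init__()
        # Each interaction expert models one kind of multimodal interaction
        # (e.g. uniqueness of a modality, redundancy, synergy).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.ReLU(),
                          nn.Linear(fused_dim, num_classes))
            for _ in range(num_experts)
        )
        # Reweighting model: assigns a sample-level importance score
        # to each expert's output.
        self.reweight = nn.Linear(fused_dim, num_experts)

    def forward(self, fused):
        # fused: (batch, fused_dim) representation from any fusion backbone
        expert_out = torch.stack([e(fused) for e in self.experts], dim=1)
        weights = torch.softmax(self.reweight(fused), dim=-1)  # (batch, E)
        # Weighted combination; `weights` doubles as a local interpretation
        # of which interaction type drives each prediction.
        logits = (weights.unsqueeze(-1) * expert_out).sum(dim=1)
        return logits, weights
```

Averaging the per-sample weights over a dataset then gives the dataset-level interpretation described in the paper.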
<img src="assets/i2moe.png" width="100%">

Environment Setup
conda create -n i2moe python=3.10 -y
conda activate i2moe
pip install -r requirements.txt
Data Directory
Create a data directory at ./data.
Reproduce experiment results
- adni dataset: follow the README from Flex-MoE.
- mimic dataset: download MIMIC-IV v3.1 from PhysioNet and follow the preprocessing procedure in Appendix E.4.
- mmimdb, mosi, and enrico datasets: download the datasets following the links in MultiBench.
Add new datasets
- Add preprocessing code for your new dataset under src/common/datasets/<your_dataset>.py
- If appropriate, add a customized dataloader for your new dataset to src/common/datasets/MultiModalDataset.py
elif dataset == "<your_dataset>":
    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=collate_fn_<your_dataset>,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    val_loader = DataLoader(
        valid_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=collate_fn_<your_dataset>,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    test_loader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=collate_fn_<your_dataset>_test,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
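The snippet above passes a collate_fn_<your_dataset> callable to each DataLoader. As a starting point, here is a minimal sketch of such a function; the name collate_fn_my_dataset and the (per-modality features, label) sample layout are illustrative assumptions, not the repository's actual convention.

```python
import torch

def collate_fn_my_dataset(batch):
    """Hypothetical collate function for a multimodal dataset.

    Assumes each sample is ([tensor_modality_0, tensor_modality_1, ...], label)
    with fixed-size tensors per modality.
    """
    # Transpose the batch from per-sample lists to per-modality tuples.
    modality_lists = zip(*[sample[0] for sample in batch])
    # Stack each modality across the batch dimension.
    features = [torch.stack(list(m)) for m in modality_lists]
    labels = torch.tensor([sample[1] for sample in batch])
    return features, labels
```

Variable-length modalities (e.g. clinical time series) would instead need padding inside the collate function before stacking.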
Train Models
Train I<sup>2</sup>MoE models
- Supported fusion methods: <fusion> in transformer, interpretcc, moepp, switchgate.
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
source scripts/train_scripts/imoe/<fusion>/run_<dataset>.sh
Train vanilla fusion models
- Supported fusion methods: <fusion> in transformer, interpretcc, moepp, switchgate, plus other fusion methods (ef, lf, lrtf).
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
# For <fusion> in ["transformer", "interpretcc", "moepp", "switchgate"]
source scripts/train_scripts/baseline/<fusion>/run_<dataset>.sh
# For <fusion> in ["ef", "lrtf", "lf"]
source scripts/train_scripts/baseline/other_fusion/run_<dataset>.sh
Ablations of I<sup>2</sup>MoE
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
source scripts/train_scripts/latent_contrastive/transformer/run_<dataset>.sh
source scripts/train_scripts/less_perturbed_forward/transformer/run_<dataset>.sh
source scripts/train_scripts/synergy_redundancy_only/transformer/run_<dataset>.sh
source scripts/train_scripts/simple_weighted_average/transformer/run_<dataset>.sh
source scripts/train_scripts/no_interaction_loss/transformer/run_<dataset>.sh
Citation
@article{xin2025i2moe,
title={I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts},
author={Xin, Jiayi and Yun, Sukwon and Peng, Jie and Choi, Inyoung and Ballard, Jenna L and Chen, Tianlong and Long, Qi},
journal={arXiv preprint arXiv:2505.19190},
year={2025}
}