I<sup>2</sup>MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts
Official implementation for "I<sup>2</sup>MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts" accepted by ICML 2025 (Poster).
- Authors: Jiayi Xin, Sukwon Yun, Jie Peng, Inyoung Choi, Jenna Ballard, Tianlong Chen and Qi Long
- Paper Link: https://arxiv.org/abs/2505.19190
Overview
Modality fusion is a cornerstone of multimodal learning, enabling information integration from diverse data sources. However, vanilla fusion methods are limited by (1) their inability to account for heterogeneous interactions between modalities and (2) their lack of interpretability in uncovering the multimodal interactions inherent in the data. To this end, we propose I<sup>2</sup>MoE (Interpretable Multimodal Interaction-aware Mixture of Experts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation at both the local and global levels. First, I<sup>2</sup>MoE utilizes different interaction experts with weakly supervised interaction losses to learn multimodal interactions in a data-driven way. Second, I<sup>2</sup>MoE deploys a reweighting model that assigns importance scores to the output of each interaction expert, which offers sample-level and dataset-level interpretation. Extensive evaluation on medical and general multimodal datasets shows that I<sup>2</sup>MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios.
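The two components described above (interaction experts plus a reweighting model) can be sketched roughly as follows. This is an illustrative sketch only: the module names, expert count, and dimensions are assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn

class I2MoESketch(nn.Module):
    """Illustrative sketch of interaction experts + a reweighting model.

    Names and dimensions are assumptions; see the official code for the
    actual architecture and the weakly supervised interaction losses.
    """

    def __init__(self, fused_dim=256, num_experts=4, num_classes=2):
        super().__init__()
        # Each interaction expert models one kind of multimodal interaction
        # (e.g. uniqueness of a modality, redundancy, synergy).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.ReLU(),
                          nn.Linear(fused_dim, num_classes))
            for _ in range(num_experts)
        )
        # Reweighting model: assigns a sample-level importance score
        # to each expert's output.
        self.reweight = nn.Linear(fused_dim, num_experts)

    def forward(self, fused):
        # fused: (batch, fused_dim) representation from any fusion backbone
        expert_out = torch.stack([e(fused) for e in self.experts], dim=1)
        weights = torch.softmax(self.reweight(fused), dim=-1)  # (batch, E)
        # Weighted combination; `weights` doubles as a local interpretation
        # of which interaction type drives each prediction.
        logits = (weights.unsqueeze(-1) * expert_out).sum(dim=1)
        return logits, weights
```

Averaging the per-sample weights over a dataset then gives the dataset-level interpretation described in the paper.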
<img src="assets/i2moe.png" width="100%">

Environment Setup
conda create -n i2moe python=3.10 -y
conda activate i2moe
pip install -r requirements.txt
Data Directory
Create a data directory at ./data.
Reproduce experiment results
- adni dataset: follow the README from Flex-MoE.
- mimic dataset: download MIMIC-IV v3.1 from PhysioNet and follow the preprocessing procedure in Appendix E.4.
- mmimdb, mosi, and enrico datasets: download the datasets following the links in MultiBench.
Add new datasets
- Add preprocessing code for your new dataset under src/common/datasets/<your_dataset>.py
- If appropriate, add a customized dataloader for your new dataset to src/common/datasets/MultiModalDataset.py
elif dataset == "<your_dataset>":
    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=collate_fn_<your_dataset>,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    val_loader = DataLoader(
        valid_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=collate_fn_<your_dataset>,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    test_loader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=collate_fn_<your_dataset>_test,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
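The snippet above passes a collate_fn_<your_dataset> callable to each DataLoader. As a starting point, here is a minimal sketch of such a function; the name collate_fn_my_dataset and the (per-modality features, label) sample layout are illustrative assumptions, not the repository's actual convention.

```python
import torch

def collate_fn_my_dataset(batch):
    """Hypothetical collate function for a multimodal dataset.

    Assumes each sample is ([tensor_modality_0, tensor_modality_1, ...], label)
    with fixed-size tensors per modality.
    """
    # Transpose the batch from per-sample lists to per-modality tuples.
    modality_lists = zip(*[sample[0] for sample in batch])
    # Stack each modality across the batch dimension.
    features = [torch.stack(list(m)) for m in modality_lists]
    labels = torch.tensor([sample[1] for sample in batch])
    return features, labels
```

Variable-length modalities (e.g. clinical time series) would instead need padding inside the collate function before stacking.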
Train Models
Train I<sup>2</sup>MoE models
- Supported fusion methods: <fusion> in transformer, interpretcc, moepp, switchgate.
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
source scripts/train_scripts/imoe/<fusion>/run_<dataset>.sh
Train vanilla fusion models
- Supported fusion methods: <fusion> in transformer, interpretcc, moepp, switchgate, plus other fusion methods (ef, lf, lrtf).
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
# For <fusion> in ["transformer", "interpretcc", "moepp", "switchgate"]
source scripts/train_scripts/baseline/<fusion>/run_<dataset>.sh
# For <fusion> in ["ef", "lrtf", "lf"]
source scripts/train_scripts/baseline/other_fusion/run_<dataset>.sh
Ablations of I<sup>2</sup>MoE
- Supported datasets: <dataset> in adni, mimic, mmimdb, mosi_regression, enrico.
source scripts/train_scripts/latent_contrastive/transformer/run_<dataset>.sh
source scripts/train_scripts/less_perturbed_forward/transformer/run_<dataset>.sh
source scripts/train_scripts/synergy_redundancy_only/transformer/run_<dataset>.sh
source scripts/train_scripts/simple_weighted_average/transformer/run_<dataset>.sh
source scripts/train_scripts/no_interaction_loss/transformer/run_<dataset>.sh
Citation
@article{xin2025i2moe,
title={I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts},
author={Xin, Jiayi and Yun, Sukwon and Peng, Jie and Choi, Inyoung and Ballard, Jenna L and Chen, Tianlong and Long, Qi},
journal={arXiv preprint arXiv:2505.19190},
year={2025}
}