MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Documentation, Tutorials and examples
Contributors
Correspondence to:
- Paul Pu Liang (pliang@cs.cmu.edu)
- Yiwei Lyu (yiweilyu@umich.edu)
- Xiang Fan (xiangfan@cmu.edu)
- Zetian Wu (zwu49@jhu.edu)
- Yun Cheng (yc6206@cs.princeton.edu)
- Arav Agarwal (arava@andrew.cmu.edu)
- Jason Wu (jsonwu@cmu.edu)
- Leslie Chen (lesliechen1998@gmail.com)
- Peter Wu (peterw1@cs.cmu.edu)
- Michelle A. Lee (michellelee@cs.stanford.edu)
- Yuke Zhu (yukez@cs.utexas.edu)
- Ruslan Salakhutdinov (rsalakhu@cs.cmu.edu)
- Louis-Philippe Morency (morency@cs.cmu.edu)
Paper
MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning<br> Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov<br> JMLR 2023 Open Source Software.
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning<br> Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency<br> NeurIPS 2021 Datasets and Benchmarks Track.
If you find this repository useful, please cite our paper and corresponding software package:
@article{liang2023multizoo,
title={MULTIZOO \& MULTIBENCH: A Standardized Toolkit for Multimodal Deep Learning},
author={Liang, Paul Pu and Lyu, Yiwei and Fan, Xiang and Agarwal, Arav and Cheng, Yun and Morency, Louis-Philippe and Salakhutdinov, Ruslan},
journal={Journal of Machine Learning Research},
volume={24},
pages={1--7},
year={2023}
}
@inproceedings{liang2021multibench,
title={MultiBench: Multiscale Benchmarks for Multimodal Representation Learning},
author={Liang, Paul Pu and Lyu, Yiwei and Fan, Xiang and Wu, Zetian and Cheng, Yun and Wu, Jason and Chen, Leslie Yufan and Wu, Peter and Lee, Michelle A and Zhu, Yuke and others},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021}
}
Overview

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities.
In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To reflect real-world requirements, MultiBench is designed to holistically evaluate (1) performance across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities.

To accompany MultiBench, we also provide a standardized implementation of 20 core approaches in multimodal learning unifying innovations in fusion paradigms, optimization objectives, and training approaches which we call MultiZoo. MultiZoo implements these methods in a modular fashion to enable accessibility for new researchers, compositionality of approaches, and reproducibility of results.
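The modular design can be illustrated with a minimal sketch. These classes are plain-Python stand-ins (not the actual MultiZoo API): each modality gets its own unimodal encoder, a fusion module combines the encodings, and a head produces the prediction, so any one piece can be swapped independently.

```python
# Minimal sketch of the MultiZoo composition pattern: unimodal encoders,
# a fusion module, and a prediction head are independent, swappable pieces.
# Illustrative stand-ins only, not the actual MultiZoo classes.

class ScaleEncoder:
    """Toy unimodal 'encoder' that scales its input vector."""
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        return [self.scale * v for v in x]

class ConcatFusion:
    """Toy fusion: concatenate the encoded modality vectors."""
    def __call__(self, encodings):
        return [v for enc in encodings for v in enc]

class SumHead:
    """Toy head: reduce the fused vector to a single score."""
    def __call__(self, fused):
        return sum(fused)

class MultimodalModel:
    """Compose one encoder per modality with a fusion module and a head."""
    def __init__(self, encoders, fusion, head):
        self.encoders = encoders
        self.fusion = fusion
        self.head = head

    def __call__(self, modalities):
        encodings = [enc(x) for enc, x in zip(self.encoders, modalities)]
        return self.head(self.fusion(encodings))

model = MultimodalModel([ScaleEncoder(2), ScaleEncoder(3)], ConcatFusion(), SumHead())
print(model([[1, 2], [3, 4]]))  # 2+4 + 9+12 = 27
```

Swapping `ConcatFusion` for a different fusion, or adding a third encoder, changes nothing else in the pipeline, which is the compositionality MultiZoo aims for.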
Datasets currently supported
- Affective computing: MUStARD, CMU-MOSI, UR-FUNNY, CMU-MOSEI
- Healthcare: MIMIC
- Robotics: MuJoCo Push, Vision & Touch
- Finance: Stocks-food, Stocks-health, Stocks-tech
- HCI: ENRICO
- Multimedia: AV-MNIST, MM-IMDb, Kinetics-S, Kinetics-L
- RTFM env

To add a new dataset:
- Go to datasets/
- Add a new folder if appropriate
- Write a python file with a get_dataloader function that returns a tuple of 3 dataloaders (for train, valid, and test data respectively) containing preprocessed data. Please follow the existing examples (such as avmnist: datasets/avmnist/get_data.py)
- Go to examples/ and write an example training python file following the existing examples
- Check that calling the dataloader and running a simple training script works
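The contract in the steps above can be sketched as follows. Plain Python lists stand in for the torch DataLoaders a real module would return, and the split fractions are made up for illustration; the actual conventions are in datasets/avmnist/get_data.py.

```python
# Illustrative sketch of the get_dataloader contract: return a tuple of
# three loaders (train, valid, test) over preprocessed samples.
# Lists stand in for torch DataLoaders here; follow the existing
# datasets/ modules for the real implementation.

def get_dataloader(samples, train_frac=0.8, valid_frac=0.1):
    """Split preprocessed samples into (train, valid, test) loaders."""
    n = len(samples)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    traindata = samples[:n_train]
    validdata = samples[n_train:n_train + n_valid]
    testdata = samples[n_train + n_valid:]
    return traindata, validdata, testdata

train, valid, test = get_dataloader(list(range(10)))
print(len(train), len(valid), len(test))  # 8 1 1
```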
Algorithms supported
See Appendix Section F for detailed descriptions of each part.
- Unimodal models: MLP, GRU, LeNet, CNN, LSTM, Transformer, FCN, Random Forest, ResNet, etc... (see unimodals/)
- Fusion paradigms: early/late fusion, NL-gate, tensor fusions, Multiplicative Interactions, Low-Rank Tensor Fusion, etc (see fusions/)
- Optimization objectives: (default: CrossEntropyLoss for classification tasks, MSELoss for regression tasks), ELBO, Weighted Reconstruction Loss, CCA loss, Contrastive Loss, etc (see objective_functions/)
- Training structures: Supervised Learning (which supports Early Fusion, Late Fusion, MVAE, MFM, etc), Gradient Blend, Architecture Search, etc (see training_structures/)
To add a new algorithm:
- Figure out which subfolder to add it into:
- unimodals/ : unimodal architectures
- fusions/ : multimodal fusion architectures
- objective_functions/ : objective functions in addition to supervised training loss (e.g., VAE loss, contrastive loss)
- training_structures/ : training algorithms excluding objective functions (e.g., balancing generalization, architecture search outer RL loop)
- Go to examples/ and write an example training python file following the existing examples
- Check that calling the added functions and running a simple training script works
- Make sure your new modules are well documented with comments describing their input and output formats and shapes
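As an illustration of the documentation convention in the last step, here is a hypothetical fusion module skeleton (not an actual fusions/ class) with its input and output shapes spelled out:

```python
class ElementwiseSumFusion:
    """Hypothetical fusion module: element-wise sum of modality features.

    Input:  a list of k feature vectors, each of the same length d.
    Output: a single fused vector of length d.
    """
    def __call__(self, modalities):
        d = len(modalities[0])
        assert all(len(m) == d for m in modalities), "modality dims must match"
        return [sum(vals) for vals in zip(*modalities)]

fused = ElementwiseSumFusion()([[1, 2, 3], [10, 20, 30]])  # [11, 22, 33]
```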
Open call for research areas, datasets, tasks, algorithms, and evaluation
We welcome new contributions to MultiBench through new research areas, datasets, tasks, algorithms, and evaluation. Please refer to the sections above for instructions on adding new datasets and algorithms, and open a pull request if you would like to see a specific dataset or algorithm added. We plan to use MultiBench as a theme for future workshops, competitions, and academic courses - stay tuned for upcoming calls for participation!
Experiments
Affective Computing
We release the processed datasets: sarcasm, mosi, mosei, humor. The original datasets are also publicly available at MultimodalSDK (for MOSI and MOSEI), MUStARD, and UR-FUNNY. You can obtain processed data with datasets/affect/get_data.py; note that sarcasm refers to MUStARD and humor to UR-FUNNY.
There are several example scripts for running the affect datasets under examples/affect/. For example, to run an affect dataset with simple late fusion, first load the data with
traindata, validdata, test_robust = get_dataloader('/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', data_type='mosi')
or, if you don't want to use packed data and instead expect sequences padded to the same maximum length, use the max_pad and max_seq_len options, and remember to set is_packed=False in the train and test functions:
traindata, validdata, testdata = get_dataloader('/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', data_type='mosi', max_pad=True, max_seq_len=50)
then do
python3 examples/affect/affect_late_fusion.py
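To make the max_pad option above concrete, here is a hedged sketch of the behavior it implies: each sequence is truncated or right-padded so that every element has exactly max_seq_len steps. This is illustrative only, not the actual loader code in datasets/affect/get_data.py.

```python
def pad_to_max_len(seq, max_seq_len, pad_value=0.0):
    """Truncate or right-pad a sequence to exactly max_seq_len steps.

    Input:  a list of timesteps of arbitrary length.
    Output: a list of exactly max_seq_len timesteps.
    """
    seq = seq[:max_seq_len]
    return seq + [pad_value] * (max_seq_len - len(seq))

print(pad_to_max_len([1.0, 2.0, 3.0], 5))  # [1.0, 2.0, 3.0, 0.0, 0.0]
```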
Healthcare
The MIMIC dataset has restricted access. To gain access to the preprocessed version of this dataset, please follow instructions here to gain the necessary credentials. Once you have the credentials, email yiweilyu@umich.edu with proof of your credentials and ask for the preprocessed 'im.pk' file.
After you have the 'im.pk' file, you can get the dataloaders for this dataset by calling the get_dataloader function in examples/mimic/get_data.py. The get_dataloader function takes 2 inputs: the first specifies which task you want to do (-1 means the mortality task, 1 means the icd9 10-19 task, 7 means the icd9 70-79 task). The input modalities are static (a vector of size 5) and time-series (shaped 24x30).
There are several example scripts for running MIMIC under examples/healthcare/. For example, to run MIMIC
