<h1 style="display: flex; align-items: center; font-size: 2em;"> <img src="figure/logo.png" alt="Logo" style="width: 50px; height: 50px; margin-right: 10px;"> scMultiBench </h1>

Multi-task benchmarking of single-cell multimodal omics integration methods

Single-cell multimodal omics technologies have empowered the profiling of complex biological systems at a resolution and scale that were previously unattainable. These biotechnologies have propelled the fast-paced development of data integration methods, leading to a critical need for their systematic categorisation, evaluation, and benchmarking. Selecting the most pertinent integration approach is challenging because the choice depends on the tasks relevant to the study goals and on the combination of modalities and batches present in the data at hand. Understanding how well each method performs across multiple tasks, including dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection, and spatial registration, and for which combinations of modalities and batches, will help guide this decision. This study aims to develop a much-needed guideline for choosing the most appropriate method for single-cell multimodal omics data analysis through systematic categorisation and comprehensive benchmarking of current methods.

Integration Tools

In this benchmark, we evaluated 40 integration methods across the four data integration categories on 64 real datasets and 22 simulated datasets, on an Ubuntu system with an RTX 3090 GPU. In particular, we include 18 vertical integration methods, 14 diagonal integration tools, 12 mosaic integration tools, and 15 cross integration tools. Several methods support more than one category, so the per-category counts sum to more than 40. The installation environment for each tool was set up according to its tutorial. The tools compared are:

Vertical Integration (Dimension Reduction and Clustering):

totalVI v1.1.2
sciPENN v1.0.0
Concerto Github Version: ab1fc7f
scMSI Github Version: dffcbb2
Matilda Github Version: 7d71480
MOFA+ v1.6.0
Multigrate v0.0.2
UINMF v2.0.1
scMoMaT v0.2.2
Seurat_WNN v5.0.2
scMM Github Version: c5c8579
scMDC Github Version: 43b0c3a
moETM Github Version: c2eaa97
VIMCCA v0.5.6
iPOLNG v0.0.2
MIRA v2.1.0
UnitedNet Github Version: 3689da8
scMVP Github Version: fc61e4d

Vertical Integration (Feature Selection):

scMoMaT v0.2.2
Matilda Github Version: 7d71480
MOFA+ v1.6.0

Diagonal Integration (Dimension Reduction, Batch Correction, Clustering, Classification):

scBridge Github Version: ff17561
Portal v1.0.2
SCALEX v1.0.2
VIPCCA v0.2.7
Seurat v3 v5.0.2
MultiMAP Github Version: 681e608
Seurat v5 v5.0.2
sciCAN Github Version: ad71bba
Conos v1.4.6
iNMF v2.0.1
online iNMF v2.0.1
scJoint Github Version: cbbfa5d
GLUE v0.3.2
uniPort v1.2.2

Mosaic Integration (Dimension Reduction, Batch Correction, Clustering, Classification):

MultiVI v1.1.2
scMoMaT v0.2.2
StabMap v0.1.8
Cobolt v1.0.1
UINMF v2.0.1
Multigrate v0.0.2
SMILE Github Version: a2e2ca6

Mosaic Integration (Imputation):

scMM Github Version: c5c8579
moETM Github Version: ad89fe2
UnitedNet Github Version: 3689da8
totalVI v1.1.2
sciPENN v1.0.0
StabMap v0.1.8
MultiVI v1.1.2

Cross Integration (Dimension Reduction, Batch Correction, Clustering, Classification):

totalVI v1.1.2
scMoMaT v0.2.2
UnitedNet Github Version: 3689da8
sciPENN v1.0.0
Concerto Github Version: ab1fc7f
scMDC Github Version: 43b0c3a
StabMap v0.1.8
UINMF v2.0.1
scMM Github Version: c5c8579
MOFA+ v1.6.0
Multigrate v0.0.2

Cross Integration (Spatial Registration):

PASTE (both pairwise and centre versions) v1.4.0
PASTE2 Github Version: b71ec88
SPIRAL v1.0
GPSA v0.8

Note that the installation time for tools may vary depending on the method used. For more detailed information, please refer to the original publication. For built-in classification, the classification scripts are provided in their corresponding method folders within the [tools_scripts] directory. For additional modules (such as kNN, SVM, random forest, and MLP), the scripts are provided in the [classification] directory.
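To illustrate what an additional classification module does with an integrated embedding, here is a minimal kNN sketch in plain Python. It is not the script shipped in the [classification] directory: the function names `knn_predict` and `accuracy`, the toy 2-D embedding, and the cell type labels are all hypothetical, and the repository's own scripts may use different libraries and settings.

```python
import math
from collections import Counter


def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Label each query cell by majority vote among its k nearest reference cells."""
    predictions = []
    for q in query_emb:
        # Rank reference cells by Euclidean distance to the query cell.
        order = sorted(range(len(train_emb)), key=lambda i: math.dist(q, train_emb[i]))
        votes = Counter(train_labels[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def accuracy(true_labels, predicted_labels):
    """Fraction of query cells whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical 2-D embedding of reference cells with known labels.
reference = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
reference_labels = ["B cell", "B cell", "B cell", "T cell", "T cell", "T cell"]

# Hypothetical query cells embedded in the same space.
query = [(0.1, 0.1), (5.1, 5.0)]
query_labels = ["B cell", "T cell"]

predicted = knn_predict(reference, reference_labels, query, k=3)
print(predicted, accuracy(query_labels, predicted))
```

In practice the embedding would come from one of the integration methods listed above, with the reference and query cells taken from different batches or datasets.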

Evaluation Pipeline

All evaluation pipelines can be found in the 'metrics' folder. Example datasets are stored in the 'example_data' folder. The spatial registration data is not bundled with the repository: download it from link and place it in the 'example_data/spatial/' folder.
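Before running the spatial registration evaluation, it can help to confirm that the downloaded data sits where the pipeline expects it. The check below is a small sketch, assuming it is run from the repository root. The helper name `spatial_data_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path


def spatial_data_ready(repo_root="."):
    """Return True if example_data/spatial/ exists under repo_root and holds at least one file."""
    spatial_dir = Path(repo_root) / "example_data" / "spatial"
    return spatial_dir.is_dir() and any(p.is_file() for p in spatial_dir.rglob("*"))


if spatial_data_ready():
    print("Spatial registration data found.")
else:
    print("No files in example_data/spatial/; download the spatial data first.")
```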

Dataset

The processed datasets can be downloaded from link.

Shiny

Explore method performance in depth with our interactive Shiny app, designed for dynamic visualization of benchmark results.

License

This project is covered under the Apache 2.0 License.
