ScMultiBench
Multi-task benchmarking of single-cell multimodal omics integration methods
Single-cell multimodal omics technologies have enabled the profiling of complex biological systems at a resolution and scale that were previously unattainable. These technologies have driven the rapid development of data integration methods, creating a critical need for their systematic categorisation, evaluation, and benchmarking. Selecting the most suitable integration approach is challenging because the choice depends on the tasks relevant to the study goals and on the combination of modalities and batches present in the data. Understanding how well each method performs across multiple tasks (dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection, and spatial registration), and for which combinations of modalities and batches, will help guide this decision. This study aims to provide a much-needed guideline for choosing the most appropriate method for single-cell multimodal omics data analysis through systematic categorisation and comprehensive benchmarking of current methods.
<img width=100% src="https://github.com/PYangLab/scMultiBench/blob/main/figure/figure1_v9.png"/>
Integration Tools
In this benchmark, we evaluated 40 integration methods across the four data integration categories on 64 real datasets and 22 simulated datasets, on an Ubuntu system with an RTX3090 GPU. In particular, we include 18 vertical integration methods, 14 diagonal integration tools, 12 mosaic integration tools, and 15 cross integration tools. Because several methods support more than one category, these per-category counts sum to more than 40. Each tool's environment was set up according to its own tutorial. The tools compared are:
Vertical Integration (Dimension Reduction and Clustering):
- totalVI v1.1.2
- sciPENN v1.0.0
- Concerto Github Version: ab1fc7f
- scMSI Github Version: dffcbb2
- Matilda Github Version: 7d71480
- MOFA+ v1.6.0
- Multigrate v0.0.2
- UINMF v2.0.1
- scMoMaT v0.2.2
- Seurat_WNN v5.0.2
- scMM Github Version: c5c8579
- scMDC Github Version: 43b0c3a
- moETM Github Version: c2eaa97
- VIMCCA Github Version: 0.5.6
- iPOLNG v0.0.2
- MIRA v2.1.0
- UnitedNet Github Version: 3689da8
- scMVP Github Version: fc61e4d
Vertical Integration (Feature Selection):
Diagonal Integration (Dimension Reduction, Batch Correction, Clustering, Classification):
- scBridge Github Version: ff17561
- Portal v1.0.2
- SCALEX v1.0.2
- VIPCCA v0.2.7
- Seurat v3 v5.0.2
- MultiMAP Github Version: 681e608
- Seurat v5 v5.0.2
- sciCAN Github Version: ad71bba
- Conos v1.4.6
- iNMF v2.0.1
- online iNMF v2.0.1
- scJoint Github Version: cbbfa5d
- GLUE v0.3.2
- uniPort v1.2.2
Mosaic Integration (Dimension Reduction, Batch Correction, Clustering, Classification):
- MultiVI v1.1.2
- scMoMaT v0.2.2
- StabMap v0.1.8
- Cobolt v1.0.1
- UINMF v2.0.1
- Multigrate v0.0.2
- SMILE Github Version: a2e2ca6
Mosaic Integration (Imputation):
- scMM Github Version: c5c8579
- moETM Github Version: ad89fe2
- UnitedNet Github Version: 3689da8
- totalVI v1.1.2
- sciPENN v1.0.0
- StabMap v0.1.8
- MultiVI v1.1.2
Cross Integration (Dimension Reduction, Batch Correction, Clustering, Classification):
- totalVI v1.1.2
- scMoMaT v0.2.2
- UnitedNet Github Version: 3689da8
- sciPENN v1.0.0
- Concerto Github Version: ab1fc7f
- scMDC Github Version: 43b0c3a
- StabMap v0.1.8
- UINMF v2.0.1
- scMM Github Version: c5c8579
- MOFA+ v1.6.0
- Multigrate v0.0.2
Cross Integration (Spatial Registration):
- PASTE (both pairwise and centre versions) v1.4.0
- PASTE2 Github Version: b71ec88
- SPIRAL v1.0
- GPSA v0.8
Note that installation time varies from tool to tool. For more detailed information, please refer to each tool's original publication. For methods with built-in classification, the classification scripts are provided in the corresponding method folders within the [tools_scripts] directory. For additional classifiers (such as kNN, SVM, random forest, and MLP), the scripts are provided in the [classification] directory.
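To illustrate what an additional classifier such as kNN does with an integrated embedding, here is a minimal, standard-library-only sketch. It is not one of the scripts in the [classification] directory: the function name `knn_predict`, the toy coordinates, and the cell type labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query point by majority vote among its k nearest training points."""
    predictions = []
    for q in query_x:
        # Euclidean distance from the query cell to every reference cell
        dists = sorted(
            (math.dist(q, x), label) for x, label in zip(train_x, train_y)
        )
        # Majority vote over the labels of the k closest reference cells
        votes = Counter(label for _, label in dists[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-D "embedding" (hypothetical): two well-separated cell types
reference_embedding = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
reference_labels = ["B cell", "B cell", "B cell", "T cell", "T cell", "T cell"]
query_embedding = [(0.1, 0.1), (5.1, 5.0)]

predicted = knn_predict(reference_embedding, reference_labels, query_embedding, k=3)
print(predicted)
```

In practice the reference and query embeddings would come from an integration method's output rather than hand-written coordinates, and a library implementation would usually be preferred over this brute-force search.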
Evaluation Pipeline
All evaluation pipelines can be found in the metrics folder. Example datasets are stored in the 'example_data' folder. Spatial registration data must be downloaded separately from link and then placed in the 'example_data/spatial/' folder.
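As a rough illustration of the kind of score a clustering evaluation produces, the sketch below computes the adjusted Rand index (ARI), a widely used measure of agreement between cluster assignments and known cell type labels. This is an assumption-labelled example, not a copy of any script in the metrics folder; the function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of cells to compare; treat as perfect agreement
        return 1.0

    # Contingency counts: cells sharing a (true label, predicted cluster) pair
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in truth, and in prediction
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both labelings are all-singletons or a single group
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy labels: the clustering recovers the cell types exactly
cell_types = ["A", "A", "B", "B"]
clusters = [0, 0, 1, 1]
print(adjusted_rand_index(cell_types, clusters))
```

A score of 1 indicates identical partitions, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance.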
Dataset
The processed datasets can be downloaded from link.
Shiny
Explore method performance in depth with our interactive Shiny app, designed for dynamic visualization of the benchmark results.
License
This project is covered under the Apache 2.0 License.
