scMODAL
scMODAL: A general deep learning framework for single-cell Multi-Omics Data Alignment with feature Links
We introduce scMODAL, a deep learning framework tailored for single-cell multi-omics data alignment using feature links. scMODAL integrates datasets with limited known positively correlated features, leveraging neural networks and generative adversarial networks to align cell embeddings and preserve feature topology. Our experiments demonstrate scMODAL's effectiveness in removing unwanted variation, preserving biological information, and accurately identifying cell subpopulations across diverse datasets. scMODAL not only advances integration tasks but also supports downstream analyses such as feature imputation and inference of feature relationships, offering a robust solution for advancing single-cell multi-omics research.

Installation
scMODAL can be installed from GitHub:
git clone https://github.com/gefeiwang/scMODAL.git
cd scMODAL
conda env update -f environment.yml
conda activate scmodal
Installation normally takes less than 5 minutes.
Quick Start
Basic Usage
If the datasets are preprocessed as AnnData objects whose first n_shared columns contain the linked features from the different modalities, scMODAL can be run using the following code:
import scmodal
model = scmodal.model.Model()
model.preprocess(adata1, adata2, shared_gene_num=n_shared)
model.train() # train the model
model.eval() # get integrated latent representation of cells
model.latent stores the integrated latent representation of cells, enabling downstream integrative analysis.
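The column layout expected above (linked features first, in matching order across the two datasets) can be prepared with a small helper. This is a minimal sketch, not part of the scMODAL API: the function `linked_first_order`, the toy feature names, and the `links` list are all hypothetical, and it assumes each feature appears in at most one link.

```python
def linked_first_order(var_names, linked):
    """Return var_names reordered so the `linked` features come first, in the given order.

    Hypothetical helper for illustration; not part of scMODAL.
    """
    present = set(var_names)
    missing = [f for f in linked if f not in present]
    if missing:
        raise ValueError(f"linked features not found: {missing}")
    linked_set = set(linked)
    rest = [f for f in var_names if f not in linked_set]
    return list(linked) + rest


# Toy feature names and (gene, protein) links, assumed for this example only.
rna_names = ["ACTB", "CD4", "MKI67", "CD38", "GAPDH"]
protein_names = ["CD38", "Ki-67", "CD4", "CD45RA"]
links = [("CD4", "CD4"), ("CD38", "CD38"), ("MKI67", "Ki-67")]

rna_order = linked_first_order(rna_names, [gene for gene, _ in links])
protein_order = linked_first_order(protein_names, [protein for _, protein in links])
n_shared = len(links)  # number of leading columns that hold linked features

print(rna_order)
print(protein_order)
```

With AnnData objects, the resulting name lists can be applied by indexing on variable names, for example `adata1 = adata1[:, rna_order].copy()`, before calling `model.preprocess`.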
Alternatively, scMODAL accepts the linked features and the full feature matrices as separate inputs. For example, three datasets can be integrated using
model.integrate_datasets_feats(input_feats=[adata1.X, adata2.X, adata3.X],
                               paired_input_MNN=[[X1_12, X2_12], [X2_23, X3_23]])
where [X1_12, X2_12] represents the pair of linked features between datasets 1 and 2, and [X2_23, X3_23] represents the pair of linked features between datasets 2 and 3.
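One way to build such pairs is to select the linked columns from each full matrix so that column j of the first matrix corresponds to column j of the second. The sketch below assumes dense NumPy arrays and that matching column order is what scMODAL expects. The helper `paired_linked_features`, the toy feature names, and the link lists are hypothetical and only illustrate the idea.

```python
import numpy as np


def paired_linked_features(X_a, names_a, X_b, names_b, links):
    """Select linked columns from two matrices, keeping them in matching order.

    Hypothetical helper: `links` is a list of (feature_in_a, feature_in_b) names.
    """
    idx_a = [names_a.index(a) for a, _ in links]
    idx_b = [names_b.index(b) for _, b in links]
    return [X_a[:, idx_a], X_b[:, idx_b]]


# Toy data standing in for adata1.X, adata2.X, adata3.X (cells x features).
rng = np.random.default_rng(0)
X1, names1 = rng.random((5, 4)), ["g1", "g2", "g3", "g4"]
X2, names2 = rng.random((6, 3)), ["p1", "p2", "p3"]
X3, names3 = rng.random((7, 5)), ["q1", "q2", "q3", "q4", "q5"]

# Assumed feature links between datasets 1-2 and datasets 2-3.
links_12 = [("g1", "p2"), ("g3", "p1")]
links_23 = [("p3", "q4"), ("p1", "q1"), ("p2", "q5")]

X1_12, X2_12 = paired_linked_features(X1, names1, X2, names2, links_12)
X2_23, X3_23 = paired_linked_features(X2, names2, X3, names3, links_23)

print(X1_12.shape, X2_12.shape)
print(X2_23.shape, X3_23.shape)
```

If the matrices are stored as SciPy sparse matrices, as is common for `adata.X`, the same column selection should work after converting to CSR or CSC format.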
Vignettes
We provide source code for using scMODAL and reproducing the experiments. Please check the tutorial website for more details.
Note on finding feature correspondence between protein and RNA modalities
Currently, we follow MaxFuse to establish feature correspondence between protein and RNA modalities with a given .csv file. Users can also retrieve the correspondence from online resources, such as mygene:
import mygene
mg = mygene.MyGeneInfo()
# List of cell surface proteins
proteins = ["Ki-67", "CD3", "CD4", "CD8", "CD38"]
# Query the database
result = mg.querymany(proteins, scopes="name", fields="symbol,name", species="human")
# Extract gene symbols
for entry in result:
    print(f"Protein: {entry.get('name')}, Gene Symbol: {entry.get('symbol')}")
However, because naming conventions differ between proteins and genes, the output may be incomplete. We therefore recommend that users gather the aliases of proteins and gene symbols to refine the RNA-protein correspondences in the output.
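The alias-based refinement can be sketched as a lookup against a hand-curated alias table, keeping only gene symbols that are actually present in the RNA data. The table `PROTEIN_ALIASES`, the function `refine_correspondence`, and the toy gene list below are hypothetical and not part of scMODAL or MaxFuse. A real table would need to be curated for the antibody panel in use.

```python
# Hypothetical alias table: protein (antibody) name -> candidate gene symbols.
PROTEIN_ALIASES = {
    "Ki-67": ["MKI67"],
    "CD3": ["CD3E", "CD3D", "CD3G"],
    "CD4": ["CD4"],
    "CD8": ["CD8A", "CD8B"],
    "CD38": ["CD38"],
}


def refine_correspondence(proteins, rna_genes, aliases):
    """Return (protein, gene) pairs found in the RNA data, plus unmatched proteins."""
    rna_set = set(rna_genes)
    pairs, unmatched = [], []
    for protein in proteins:
        # Fall back to the protein name itself when no alias is recorded.
        candidates = aliases.get(protein, [protein])
        hits = [gene for gene in candidates if gene in rna_set]
        if hits:
            pairs.extend((protein, gene) for gene in hits)
        else:
            unmatched.append(protein)
    return pairs, unmatched


# Toy list of gene symbols measured in the RNA modality (assumed).
rna_genes = ["CD3E", "CD4", "CD8A", "MKI67", "GAPDH"]
proteins = ["Ki-67", "CD3", "CD4", "CD8", "CD38"]

pairs, unmatched = refine_correspondence(proteins, rna_genes, PROTEIN_ALIASES)
print(pairs)
print(unmatched)
```

Proteins returned in `unmatched` are the ones worth checking by hand, since either the alias table or the RNA feature set lacks a corresponding symbol.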
Citation
Gefei Wang, Jia Zhao, Yingxin Lin, Tianyu Liu, Yize Zhao, Hongyu Zhao. scMODAL: A general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links. Nature Communications 16, 4994 (2025).
