single-cell Multi-Task learning Network Inference (scMTNI)
We have developed single-cell Multi-Task learning Network Inference (scMTNI), a multi-task learning framework for the joint inference of cell type-specific gene regulatory networks. It leverages the cell lineage structure together with scRNA-seq and scATAC-seq measurements to make the inference robust. scMTNI takes as input a cell lineage tree, cell type-specific scRNA-seq data, and optional cell type-specific prior networks that can be derived from bulk or single-cell ATAC-seq datasets.
The scMTNI model has the following benefits:
- It uses multi-task learning, so that the learning procedure is informed by the information shared across cell types.
- It incorporates the lineage structure to influence the extent of sharing between the learned networks.
- It incorporates prior information, such as a motif-based prior network derived from scATAC-seq data, thereby integrating scRNA-seq and scATAC-seq data to infer gene regulatory network dynamics across cell lineages.
Zhang, S., Pyne, S., Pietrzak, S. et al. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 14, 3064 (2023). https://doi.org/10.1038/s41467-023-38637-9

Step 1. Install
The code is compiled and tested on Linux. The GNU Scientific Library (GSL) is used for matrix and vector operations. Compilation requires GCC (gcc-6.3.1) with GNU extensions enabled through the -std=gnu++14 setting. Installation typically takes a few minutes on a typical desktop computer.
- git clone https://github.com/Roy-lab/scMTNI.git
- cd scMTNI/Code/
- make
Step 2. Prepare input files
The demo data is in ExampleData/ and contains 100 regulators and 300 genes. All files in ExampleData/ are subsamples of the original files and serve to demonstrate the file formats. The raw data is too large to upload. The source data is available at https://zenodo.org/record/7879228. Please contact the Roy Lab for the raw data if needed.
2.1 Integrating scRNA-seq and scATAC-seq using LIGER
Apply LIGER to integrate the scRNA-seq and scATAC-seq datasets; see LIGER (https://github.com/welch-lab/liger) for details. Example input files for scATAC-seq and scRNA-seq: ExampleData/LIGER/scATACseq.txt and ExampleData/LIGER/scRNAseq.txt
Rscript --vanilla Scripts/Integration/LIGER_scRNAseq_scATAC.R
The output files are in ExampleData/LIGER/. The LIGER cluster assignments are in ExampleData/LIGER/ligerclusters.txt
2.2 Generating the prior network using scATAC-seq data and motifs
See https://github.com/Roy-lab/scMTNI/blob/master/Scripts/genPriorNetwork/readme.md for details. Because of GitHub's file size limits, BAM files are currently not provided in ExampleData/. For the demo, use the provided output prior networks ExampleData/cluster*_network.txt directly.
bash Scripts/genPriorNetwork/genPriorNetwork_scMTNI.sh
The example output files are ExampleData/cluster*_network.txt
2.3 Prepare all input files and config file for scMTNI
First prepare filelist.txt
The first column is the cell name; the second column is the location and file name of the expression data for that cell type. The example file ExampleData/filelist.txt:
cluster3 ExampleData/cluster3.table
cluster2 ExampleData/cluster2.table
cluster1 ExampleData/cluster1.table
cluster6 ExampleData/cluster6.table
cluster9 ExampleData/cluster9.table
cluster10 ExampleData/cluster10.table
cluster7 ExampleData/cluster7.table
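Before running the preparation script, it can help to sanity-check the file list. The following is a minimal sketch, not part of scMTNI: it assumes the two columns are whitespace-separated (as they appear in the example above), and the function name parse_filelist is hypothetical.

```python
from pathlib import Path


def parse_filelist(text):
    """Parse filelist.txt content into (cell_name, expression_path) pairs."""
    entries = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, path = fields
        if name in seen:
            raise ValueError(f"line {lineno}: duplicate cell name {name!r}")
        seen.add(name)
        entries.append((name, Path(path)))
    return entries


# First three rows of the example file shown above.
example = """cluster3 ExampleData/cluster3.table
cluster2 ExampleData/cluster2.table
cluster1 ExampleData/cluster1.table
"""
entries = parse_filelist(example)

# Expression files that do not exist relative to the current directory.
missing = [str(path) for _, path in entries if not path.exists()]
```

Run from the repository root, an empty `missing` list would indicate that every listed expression file can be found.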
Then prepare all the other input files based on ExampleData/filelist.txt and the regulator list ExampleData/regulators.txt
Prepare input files with prior network:
indir=ExampleData/
filelist=${indir}/filelist.txt
regfile=${indir}/regulators.txt
python Scripts/PreparescMTNIinputfiles.py --filelist $filelist --regfile $regfile --indir $indir --outdir Results --splitgene 50 --motifs 1
Prepare input files without prior network:
python Scripts/PreparescMTNIinputfiles.py --filelist $filelist --regfile $regfile --indir $indir --outdir Results --splitgene 50 --motifs 0
Prepare cell lineage tree:
The cell lineage tree file should have 4 columns describing the tree:
- Child cell
- Parent cell
- Branch-specific gain rate (the probability that an edge is gained in a child given that the edge is absent in the predecessor cell)
- Branch-specific loss rate (the probability that an edge is lost in a child given that the edge is present in the predecessor cell)
The example cell lineage tree file, ExampleData/celltype_tree_ancestor.txt:
cluster2 cluster3 0.2 0.2
cluster1 cluster2 0.2 0.2
cluster6 cluster2 0.2 0.2
cluster9 cluster6 0.2 0.2
cluster10 cluster6 0.2 0.2
cluster7 cluster10 0.2 0.2
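A lineage tree file can be checked for consistency before it is passed to scMTNI. The sketch below is an illustration only, not scMTNI code: it assumes whitespace-separated rows matching the four-column example rows above, and the helper names parse_lineage_tree and find_root are hypothetical. It verifies that the rates are probabilities, that each child has one parent, and that the tree has exactly one root.

```python
def parse_lineage_tree(text):
    """Return {child: (parent, gain, loss)} from rows of child, parent, gain, loss."""
    tree = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        child, parent = fields[0], fields[1]
        gain, loss = float(fields[2]), float(fields[3])
        for rate in (gain, loss):
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"line {lineno}: rate {rate} is not a probability")
        if child in tree:
            raise ValueError(f"line {lineno}: {child!r} has more than one parent")
        tree[child] = (parent, gain, loss)
    return tree


def find_root(tree):
    """Return the single cell that is a parent but never a child."""
    parents = {parent for parent, _, _ in tree.values()}
    roots = parents - set(tree)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    # Walk from every child towards the root to rule out cycles.
    for child in tree:
        node, steps = child, 0
        while node in tree:
            node = tree[node][0]
            steps += 1
            if steps > len(tree):
                raise ValueError(f"cycle detected starting at {child!r}")
    return roots.pop()


example = """cluster2 cluster3 0.2 0.2
cluster1 cluster2 0.2 0.2
cluster6 cluster2 0.2 0.2
cluster9 cluster6 0.2 0.2
cluster10 cluster6 0.2 0.2
cluster7 cluster10 0.2 0.2
"""
tree = parse_lineage_tree(example)
root = find_root(tree)
```

For the example tree, the only cell that never appears in the child column is cluster3, so it is reported as the root.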
Step 3. Run
The input data for the demo is in ExampleData/. The expected output is in Results/. The estimated run time for the demo is around 7 minutes. The output network for each cell type is Results/cluster*/fold0/var_mb_pw_k50.txt
Example usage of scMTNI with prior network
Code/scMTNI -f ExampleData/testdata_config.txt -x50 -l ExampleData/TFs_OGs.txt -n ExampleData/AllGenes.txt -d ExampleData/celltype_tree_ancestor.txt -m ExampleData/testdata_ogids.txt -s ExampleData/celltype_order.txt -p 0.2 -c yes -b -0.9 -q 2
The above example will run scMTNI using all regulators and targets.
Since scMTNI learns regulators on a per-target basis, the algorithm can easily be parallelized by running the algorithm for each target gene (or sets of genes) separately. For example, to run scMTNI using 10 genes, we can replace the -n parameter with a file that contains only 10 genes as in ExampleData/AllGenes0.txt:
Code/scMTNI -f ExampleData/testdata_config.txt -x50 -l ExampleData/TFs_OGs.txt -n ExampleData/AllGenes0.txt -d ExampleData/celltype_tree_ancestor.txt -m ExampleData/testdata_ogids.txt -s ExampleData/celltype_order.txt -p 0.2 -c yes -b -0.9 -q 2
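The per-target parallelization described above can be sketched as follows. This is only an illustration: the target IDs are placeholders, the chunk file names (AllGenes_chunk0.txt and so on) are hypothetical, and writing the chunk files and launching the jobs is left out. All flags are copied from the example command; only the -n argument changes.

```python
import shlex


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_command(genes_file):
    """Build the example scMTNI command, swapping only the -n argument."""
    return [
        "Code/scMTNI",
        "-f", "ExampleData/testdata_config.txt",
        "-x50",
        "-l", "ExampleData/TFs_OGs.txt",
        "-n", genes_file,
        "-d", "ExampleData/celltype_tree_ancestor.txt",
        "-m", "ExampleData/testdata_ogids.txt",
        "-s", "ExampleData/celltype_order.txt",
        "-p", "0.2", "-c", "yes", "-b", "-0.9", "-q", "2",
    ]


targets = [f"target{i}" for i in range(25)]  # placeholder IDs, not real data
chunks = chunk(targets, 10)

commands = []
for index, genes in enumerate(chunks):
    genes_file = f"AllGenes_chunk{index}.txt"  # hypothetical file name
    # Each chunk would be written to genes_file before its command is run.
    commands.append(shlex.join(build_command(genes_file)))
```

Each string in `commands` could then be submitted as a separate job, for example through a cluster scheduler.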
Example usage of scMTNI without prior network
Code/scMTNI -f ExampleData/testdata_config_noprior.txt -x50 -v1 -l ExampleData/TFs_OGs.txt -n ExampleData/AllGenes.txt -d ExampleData/celltype_tree_ancestor.txt -m ExampleData/testdata_ogids.txt -s ExampleData/celltype_order.txt -p 0.2 -c yes -b -0.9 -q 0
Example usage of INDEP with prior network (INDEP: single cell cluster version of scMTNI)
Add the -i parameter and set it to yes to run INDEP. The celltype_tree_ancestor.txt file (parameter -d) is not needed for INDEP.
Code/scMTNI -f ExampleData/cluster1_config.txt -x50 -l ExampleData/cluster1_TFs_OGs.txt -n ExampleData/cluster1_AllGenes.txt -m ExampleData/cluster1_ogids.txt -s ExampleData/cluster1.txt -i yes -c yes -b -0.9 -q 2
Example usage of INDEP without prior network (INDEP: single cell cluster version of scMTNI)
Add the -i parameter and set it to yes to run INDEP. The celltype_tree_ancestor.txt file (parameter -d) is not needed for INDEP.
Code/scMTNI -f ExampleData/cluster1_config_noprior.txt -x50 -l ExampleData/cluster1_TFs_OGs.txt -n ExampleData/cluster1_AllGenes.txt -m ExampleData/cluster1_ogids.txt -s ExampleData/cluster1.txt -i yes -c yes -b -0.9 -q 0
Parameter Explanations
f : Config file with six columns and one row per cell. Each cell's row should have the following cell-specific entries:
- Cell name
- Location of the expression data, including the file name (cell.table)
- Location to place outputs
- List of regulators to be used
- List of target genes to be used
- List of motifs to be used. This file should have three tab-separated columns, listing the regulator, target, and motif score
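The three-column motif file described in the last entry can be validated with a short script. This is a minimal sketch, not scMTNI code: the function name parse_prior_network and the gene names in the example are hypothetical, and rejecting duplicate regulator-target pairs is a choice made for this sketch rather than a documented requirement.

```python
def parse_prior_network(text):
    """Parse tab-separated regulator, target, score rows into a dict of edges."""
    edges = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated columns")
        regulator, target, score = fields
        key = (regulator, target)
        if key in edges:
            # Sketch choice: treat repeated pairs as an error.
            raise ValueError(f"line {lineno}: duplicate edge {key}")
        edges[key] = float(score)  # raises ValueError on a non-numeric score
    return edges


# Hypothetical gene names, for illustration only.
example = "TF1_cluster1\tGeneA_cluster1\t1.5\nTF2_cluster1\tGeneA_cluster1\t0.3\n"
edges = parse_prior_network(example)
regulators = sorted({regulator for regulator, _ in edges})
```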
x : Maximum number of regulators to be used for a given target.
p : The probability that an edge is present in the root cell. Default: 0.5.
l : List of the orthogroups (ID numbers) to be considered as regulators. Note: a regulator must also be present in the cell-specific list of regulators given in the cell-specific config file (parameter f).
The list should only have the orthogroup IDs, not the names of the genes belonging to the orthogroup. The gene names are specified through parameter m, which maps the orthogroup IDs to the gene names.
n : List of the orthogroups (ID numbers) to be considered as targets. Note: a target must also be present in the cell-specific list of targets given in the cell-specific config file (parameter f).
The list should only have the orthogroup IDs, not the names of the genes belonging to the orthogroup. The gene names are specified through parameter m, which maps the orthogroup IDs to the gene names.
d : The cell lineage tree to be used. This file should have 4 columns describing the tree:
- Child cell
- Parent cell
- Branch-specific gain rate (the probability that an edge is gained in a child given that the edge is absent in the predecessor cell)
- Branch-specific loss rate (the probability that an edge is lost in a child given that the edge is present in the predecessor cell)
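To see how the root probability (parameter p) and the gain and loss rates defined above interact, the marginal probability of an edge can be propagated down the tree. The sketch below follows directly from those definitions and is an illustration of the prior they imply; it is not a description of how scMTNI computes its internal scores. The function name child_edge_probability is hypothetical.

```python
def child_edge_probability(parent_prob, gain, loss):
    """P(edge in child) from P(edge in parent) and the branch gain/loss rates.

    The edge is present in the child if it was present in the parent and not
    lost, or absent in the parent and gained.
    """
    return parent_prob * (1.0 - loss) + (1.0 - parent_prob) * gain


root_prob = 0.2  # value passed as -p in the example commands

# (child, parent, gain, loss); each parent is listed before its children.
branches = [
    ("cluster2", "cluster3", 0.2, 0.2),
    ("cluster1", "cluster2", 0.2, 0.2),
    ("cluster6", "cluster2", 0.2, 0.2),
]

prob = {"cluster3": root_prob}
for child, parent, gain, loss in branches:
    prob[child] = child_edge_probability(prob[parent], gain, loss)
```

With these values the marginal edge probability rises from 0.2 at the root to about 0.32 in cluster2 and about 0.392 in cluster1 and cluster6. With a gain rate of 0.2 and a loss rate of 0.2, it would approach 0.5 along a long lineage.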
m : A file describing the gene relationships. The first column of this file is of the format OGID{NUMBER}_{DUP}. Each NUMBER represents an orthogroup. For orthogroups with duplications, DUP is the duplication count/ID. If there are no duplications in the dataset being used, DUP will always be 1. If we are working with only a single species, then the gene names in an orthogroup are the same gene name followed by the cell cluster ID, e.g., {GeneX_cluster1, GeneX_cluster2, GeneX_cluster3}. Since scMTNI allows different gene sets in different cell clusters, we can set that gene to "None" for the cell clusters where it is absent. For example, if GeneX is absent in cluster 2, the aforementioned orthogroup will contain {GeneX_cluster1, None, GeneX_cluster3}.
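The naming scheme for single-species orthogroups can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the orthogroup number 1 is a placeholder, and the sketch stops at building the ID and member names because the exact layout of the remaining columns in the file is not described here.

```python
def orthogroup_id(number, dup=1):
    """Format an ID as OGID{NUMBER}_{DUP}; DUP is 1 when there are no duplications."""
    return f"OGID{number}_{dup}"


def orthogroup_members(gene, clusters, present_in):
    """List per-cluster gene names, using "None" where the gene is absent."""
    return [
        f"{gene}_{cluster}" if cluster in present_in else "None"
        for cluster in clusters
    ]


clusters = ["cluster1", "cluster2", "cluster3"]

# GeneX is absent in cluster2, as in the example above.
ogid = orthogroup_id(1)  # placeholder orthogroup number
members = orthogroup_members("GeneX", clusters, {"cluster1", "cluster3"})
```

Here `members` holds GeneX_cluster1, None, and GeneX_cluster3, matching the orthogroup described in the text.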
s : A list of the cells present
