NetREm
Network Regression Embeddings reveal cell-type Transcription Factor coordination for target gene (TG) regulation
By: Saniya Khullar, Xiang Huang, Raghu Ramesh, John Svaren, Daifeng Wang
Daifeng Wang Lab <br>
Summary
NetREm is a software package that applies network-constrained regularization to biological applications and other network-based learning tasks. In biology, traditional regression methods can struggle with correlated predictors, particularly transcription factors (TFs) that regulate target genes (TGs) in gene regulatory networks (GRNs). NetREm incorporates information from prior biological networks to improve predictions and to identify complex relationships among predictors (e.g., TF-TF coordination: direct or indirect interactions among TFs). This approach can highlight important nodes and edges in the network, reveal novel regularized embeddings for genes, provide insights into underlying biological processes, identify subnetworks of predictors that group together to influence the response variable, and improve both the accuracy and the biological or clinical significance of the models. NetREm can incorporate multiple types of network data, including protein-protein interaction (PPI) networks, gene co-expression networks, and metabolic networks. In summary, network-constrained regularization can help build more accurate and interpretable models by incorporating prior knowledge of the network structure among predictors.
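To make the idea of network-constrained regularization concrete, here is a minimal sketch of a generic graph-Laplacian penalty added to a Lasso-style objective. It is a textbook form, not necessarily the exact objective NetREm optimizes. The helper names (`laplacian_from_edges`, `network_lasso_objective`), the toy TF names, and the choice to give unlisted node pairs a small default weight are all assumptions made for illustration.

```python
import numpy as np

def laplacian_from_edges(nodes, edge_list, default_weight=0.01):
    """Build a graph Laplacian L = D - A from [source, target, weight] rows.

    Assumption: node pairs missing from edge_list receive default_weight,
    so the prior network is treated as undirected and fully connected.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adjacency = np.full((n, n), default_weight, dtype=float)
    np.fill_diagonal(adjacency, 0.0)
    for source, target, weight in edge_list:
        i, j = index[source], index[target]
        adjacency[i, j] = adjacency[j, i] = weight  # symmetric edge
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def network_lasso_objective(X, y, coef, laplacian, alpha_lasso=0.01, beta_net=1.0):
    """Squared-error fit + L1 sparsity term + Laplacian smoothness term."""
    residual = y - X @ coef
    fit = residual @ residual / (2 * len(y))
    sparsity = alpha_lasso * np.abs(coef).sum()
    smoothness = beta_net * coef @ laplacian @ coef
    return fit + sparsity + smoothness

# Toy prior network: TF1 and TF2 are strongly linked, TF3 is only weakly linked.
nodes = ["TF1", "TF2", "TF3"]
L = laplacian_from_edges(nodes, [["TF1", "TF2", 0.9]])

agree = np.array([1.0, 1.0, 0.0])      # linked TFs share a coefficient
disagree = np.array([1.0, -1.0, 0.0])  # linked TFs have opposite coefficients
print("penalty when linked TFs agree:   ", agree @ L @ agree)
print("penalty when linked TFs disagree:", disagree @ L @ disagree)
```

The quadratic form `coef @ L @ coef` equals the weighted sum of squared coefficient differences over all node pairs, so strongly connected predictors are pushed toward similar coefficients while the L1 term still encourages sparsity.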
Pipeline
Pipeline image of NetREm

Hardware Requirements
The minimum requirement is a computer with 8 GB of RAM and 32 GB of storage. For large prior graph networks, 32 GB of RAM is recommended.
Software Requirements and Installation Guide
The software uses Python 3.10. After downloading the NetREm GitHub code, conda/Anaconda users can install it with the following steps:
- In the Anaconda Navigator prompt, create a virtual environment with Python 3.10 by running:<br>
`conda create -n NetREm python=3.10`
- Activate the environment:<br>
`conda activate NetREm`
- Change the current directory to the NetREm folder.
- Install the packages and dependencies (math, matplotlib, networkx, numpy, typing, os, pandas, plotly.express, random, scipy, scikit-optimize, scikit-learn, sys, tqdm, warnings):<br>
`pip install -r requirements.txt`
Please note that if you encounter import errors from files or functions in the code folder (such as Netrem_model_builder.py), add an empty file named `__init__.py` to the code folder, and add the `code.` prefix to all imports from the code folder. For example, `import Netrem_model_builder as nm` :arrow_right: `import code.Netrem_model_builder as nm`.
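After installing, a quick check like the following can confirm that the environment matches the requirements above. This is a sketch, not part of NetREm: the helper name `find_missing_modules` is hypothetical, and the import names are assumptions based on how these packages are usually imported (scikit-optimize as `skopt`, scikit-learn as `sklearn`).

```python
import importlib.util
import sys

# Assumed import names for the third-party packages in the install step;
# standard-library modules (math, os, sys, ...) need no installation.
REQUIRED_MODULES = [
    "matplotlib", "networkx", "numpy", "pandas", "plotly",
    "scipy", "skopt", "sklearn", "tqdm",
]

def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Report the interpreter version and any packages that still need installing.
version_ok = sys.version_info[:2] == (3, 10)
print("Python 3.10 detected:", version_ok)
print("Missing packages:", find_missing_modules(REQUIRED_MODULES) or "none")
```

If any package is reported missing, rerunning `pip install -r requirements.txt` inside the activated environment is the first thing to try.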
Usage of the NetREm main function netrem()
NetREm fits a Network-constrained Lasso regression machine learning model with user-provided weights for the prior network. Here, netrem is the main function with the following usage:
netrem(<br>
 edge_list, <br>
 beta_net = 1, <br>
 alpha_lasso = 0.01, <br>
 default_edge_weight = 0.01,<br>
 edge_vals_for_d = True,<br>
 w_transform_for_d = "none",<br>
 degree_threshold = 0.5,<br>
 gene_expression_nodes = [],<br>
 overlapped_nodes_only = False,<br>
 y_intercept = False, <br>
 view_network = False, <br>
 model_type = "Lasso",<br>
 ...<br>
 )
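As a rough illustration of how the arguments above fit together, the sketch below assembles a toy prior network and a keyword dictionary that mirrors the listed defaults. Several things here are assumptions: the `[source, target, weight]` row format for `edge_list`, the TF names, and the idea that `netrem` is exposed by the `Netrem_model_builder` module mentioned in the installation note. The import is guarded so the snippet runs even when NetREm is not on the path.

```python
# Hypothetical toy prior network; each row is assumed to be
# [source, target, weight] for an undirected edge.
edge_list = [
    ["TF1", "TF2", 0.9],
    ["TF2", "TF3", 0.4],
]

# Keyword arguments copied from the signature above, left at the listed defaults.
netrem_kwargs = {
    "beta_net": 1,
    "alpha_lasso": 0.01,
    "default_edge_weight": 0.01,
    "edge_vals_for_d": True,
    "w_transform_for_d": "none",
    "degree_threshold": 0.5,
    "gene_expression_nodes": [],
    "overlapped_nodes_only": False,
    "y_intercept": False,
    "view_network": False,
    "model_type": "Lasso",
}

try:
    # Module name taken from the installation note; where netrem() actually
    # lives is an assumption.
    import Netrem_model_builder as nm
except ImportError:
    nm = None

if nm is not None and hasattr(nm, "netrem"):
    model = nm.netrem(edge_list, **netrem_kwargs)
else:
    model = None
    print("NetREm code not found on the import path; skipping the netrem() call.")
```

Keeping the arguments in a dictionary makes it easy to vary one setting at a time, for example raising `beta_net` to weight the prior network more heavily.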
<!-- $$ = \begin{cases} \text{if cv_for_alpha_lasso_model_bool = } False & \text{default: user wants to specify the value of } \alpha_{lasso} \\ \text{if cv_for_alpha_lasso_model_bool = } True & \text{GRegulNet will perfo