ActivePathways
Integrative pathway enrichment analysis of multivariate omics data
ActivePathways - integrative pathway analysis of multi-omics data
July 11th 2025: ActivePathways version 2.0.6 is now available on CRAN and GitHub. It adds functionality to compare directional and non-directional analyses. The major update 2.0 introduced the directional p-value merging (DPM) method described in our recent publication.
ActivePathways is a tool for multivariate pathway enrichment analysis that identifies gene sets, such as pathways or Gene Ontology terms, that are over-represented in a list or matrix of genes. ActivePathways uses a data fusion method to combine multiple omics datasets, prioritises genes based on the significance and direction of signals from the omics datasets, and performs pathway enrichment analysis of these prioritised genes. We can find pathways and genes supported by single or multiple omics datasets, as well as additional genes and pathways that are only apparent through data integration and remain undetected in any single dataset alone.
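To make the idea of data fusion by p-value merging concrete, here is a minimal base R sketch that combines the p-values of one gene across several datasets. It uses Fisher's method only as a familiar textbook example; it is not taken from the package code, and the helper name `fisher_merge` is hypothetical.

```r
# Illustrative only: Fisher's method for combining k p-values.
# The statistic -2 * sum(log(p)) follows a chi-squared distribution with
# 2k degrees of freedom when the p-values are independent and uniform.
# `fisher_merge` is a hypothetical helper, not an ActivePathways function.
fisher_merge <- function(p) {
  stopifnot(is.numeric(p), !anyNA(p), all(p > 0 & p <= 1))
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values of one gene in three omics datasets.
gene_p <- c(rna = 0.04, protein = 0.20, methylation = 0.76)
merged_p <- fisher_merge(gene_p)
print(merged_p)
```

A gene with moderate evidence in several datasets can obtain a merged p-value smaller than any of its individual p-values, which is how integration can reveal signals that no single dataset shows.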
The new version of ActivePathways is described in our recent publication.
Mykhaylo Slobodyanyuk^, Alexander T. Bahcheli^, Zoe P. Klein, Masroor Bayati, Lisa J. Strug, Jüri Reimand. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications 15, 5690 (2024). (^ - co-first authors) https://www.nature.com/articles/s41467-024-49986-4 https://pubmed.ncbi.nlm.nih.gov/38971800/
The first version of ActivePathways was published in Nature Communications with the PCAWG Pan-Cancer project.
Marta Paczkowska^, Jonathan Barenboim^, Nardnisa Sintupisut, Natalie S. Fox, Helen Zhu, Diala Abd-Rabbo, Miles W. Mee, Paul C. Boutros, PCAWG Drivers and Functional Interpretation Working Group, PCAWG Consortium, Juri Reimand. Integrative pathway enrichment analysis of multivariate omics data. Nature Communications 11, 735 (2020) (^ - co-first authors) https://www.nature.com/articles/s41467-019-13983-9 https://pubmed.ncbi.nlm.nih.gov/32024846/
The package version 2.0.3 used in the DPM preprint and manuscript is archived on Zenodo: https://zenodo.org/records/12118089.
Installation
Package tested with: macOS 14, Windows 11, Ubuntu 20.04.
Software dependencies: data.table, ggplot2, testthat, knitr, rmarkdown, RColorBrewer.
Installation time: less than 2 minutes.
From CRAN: ActivePathways 2.0.6 is currently the most recent version
Open R and run install.packages('ActivePathways')
Using devtools on our GitHub repository
Using the R package devtools, run
devtools::install_github('https://github.com/reimandlab/ActivePathways', build_vignettes = TRUE)
From source on our GitHub repository
Clone the repository, for example using git clone https://github.com/reimandlab/ActivePathways.git.
Open R in the directory where you cloned the package and run install.packages("ActivePathways", repos = NULL, type = "source")
Using ActivePathways
See the vignette for more details. Run browseVignettes(package='ActivePathways') in R.
Examples
The simplest use of ActivePathways requires only a data table and a GMT file. The data table is a matrix of p-values with genes/transcripts/proteins as rows and omics datasets as columns. The GMT (Gene Matrix Transposed) file provides the list of gene sets.
- The data table must be a numerical matrix. For a single gene list, a one-column matrix can be used. The matrix cannot contain any missing values. One conservative option is to reassign all missing values to 1, which treats the missing p-values as non-significant. Alternatively, one may consider removing genes with NA values.
- Gene sets in the form of a GMT file can be acquired from multiple sources such as Gene Ontology, Reactome and others. For better accuracy and statistical power these pathway databases should be combined. Acquiring an up-to-date GMT file is essential to avoid using unreliable outdated annotations (see this paper).
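The two options for missing values, and the one-column case for a single gene list, can be sketched in base R as follows. The gene names, dataset names and p-values are hypothetical and serve only to illustrate the shape of the input.

```r
# Hypothetical p-values for four genes in two omics datasets.
toy_scores <- matrix(
  c(0.01, NA,
    0.20, 0.50,
    NA,   0.03,
    0.70, 0.90),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                  c("rna", "protein"))
)

# Option 1 (conservative): treat every missing p-value as P = 1.
toy_filled <- toy_scores
toy_filled[is.na(toy_filled)] <- 1

# Option 2: keep only genes with a p-value in every dataset.
toy_complete <- toy_scores[complete.cases(toy_scores), , drop = FALSE]

# A single gene list becomes a one-column matrix with genes as row names.
gene_list_p <- c(GeneA = 0.01, GeneB = 0.20)
toy_single <- matrix(gene_list_p, ncol = 1,
                     dimnames = list(names(gene_list_p), "rna"))

print(toy_filled)
print(toy_complete)
print(toy_single)
```

Option 1 keeps all genes but weakens those with partial data; Option 2 shrinks the gene set instead. Which is preferable depends on how much of the matrix is missing.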
library(ActivePathways)
##
# Run an example using the data files included in the ActivePathways package.
# This basic example does not incorporate directionality.
##
fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv",
package = "ActivePathways")
fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt",
package = "ActivePathways")
##
# Numeric matrix of p-values is required as input.
# NA values are converted to P = 1.
##
scores <- read.table(fname_scores, header = TRUE, row.names = 'Gene')
scores <- as.matrix(scores)
scores[is.na(scores)] <- 1
##
# Main call of ActivePathways function:
##
enriched_pathways <- ActivePathways(scores, fname_GMT)
#35 terms were removed from gmt because they did not make the geneset_filter
#91 rows were removed from scores because they are not found in the background
##
# List the first few enriched pathways identified by ActivePathways.
##
enriched_pathways[1:3,]
#        term_id         term_name adjusted_p_val term_size
#1: REAC:2424491   DAP12 signaling   4.491268e-05       358
#2:  REAC:422475     Axon guidance   2.028966e-02       555
#3:  REAC:177929 Signaling by EGFR   6.245734e-04       366
#                                   overlap       evidence
#1:     TP53,PIK3CA,KRAS,PTEN,BRAF,NRAS,...            CDS
#2: PIK3CA,KRAS,BRAF,NRAS,CALM2,RPS6KA3,... X3UTR,promCore
#3:     TP53,PIK3CA,KRAS,PTEN,BRAF,NRAS,...            CDS
#                            Genes_X3UTR Genes_X5UTR
#1:                                   NA          NA
#2: CALM2,ARPC2,RHOA,NUMB,CALM1,ACTB,...          NA
#3:                                   NA          NA
#                             Genes_CDS
#1: TP53,PTEN,KRAS,PIK3CA,BRAF,NRAS,...
#2:                                  NA
#3: TP53,PTEN,KRAS,PIK3CA,BRAF,NRAS,...
#                                Genes_promCore
#1:                                          NA
#2: EFNA1,IQGAP1,COL4A1,SCN2B,RPS6KA3,CALM2,...
#3:                                          NA
##
# Show enriched genes of the first pathway, 'DAP12 signaling'.
# The column `overlap` displays genes of the integrated dataset (from
# data fusion, i.e., p-value merging) that occur in the given pathway.
# Genes are ranked by joint significance across input omics datasets.
##
enriched_pathways[["overlap"]][[1]]
# [1] "TP53" "PIK3CA" "KRAS" "PTEN" "BRAF" "NRAS" "B2M" "CALM2"
# [9] "CDKN1A" "CDKN1B"
##
# Save the resulting pathways as a Comma-Separated Values (CSV) file
# for spreadsheets and computational pipelines.
# The data.table object cannot be saved directly as text.
##
export_as_CSV(enriched_pathways, "enriched_pathways.csv")
##
# Examine a few lines of the two major types of input
##
##
# The scores matrix includes p-values for genes (rows)
# and evidence of different omics datasets (columns).
# This dataset includes predicted cancer driver mutations
# in gene coding/CDS, 5'UTR, 3'UTR, and core promoter sequences
##
head(scores, n = 3)
# X3UTR X5UTR CDS promCore
#A2M 1.0000000 0.33396764 0.9051708 0.4499201
#AAAS 1.0000000 0.42506012 0.7047723 0.7257641
#ABAT 0.9664126 0.04202735 0.7600985 0.1903789
##
# GMT files include functional gene sets (pathways, processes).
# Each tab-separated line represents a gene set:
# gene set ID, description followed by gene symbols.
# Gene symbols in the scores table and the GMT file need to match.
# NB: this GMT file is a small subset of the real GMT file built for testing.
# It should not be used for real analyses.
##
readLines(fname_GMT)[11:13]
#[1] "REAC:3656535\tTGFBR1 LBD Mutants in Cancer\tTGFB1\tFKBP1A\tTGFBR2\tTGFBR1\t"
#[2] "REAC:73927\tDepurination\tOGG1\tMPG\tMUTYH\t"
#[3] "REAC:5602410\tTLR3 deficiency - HSE\tTLR3\t"
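Because gene symbols in the scores table and the GMT file need to match, it can help to check the overlap before running an analysis. The following base R sketch parses the three GMT lines shown above and counts how many genes of each gene set appear in a set of score row names. The helper `parse_gmt_line` and the `score_genes` vector are hypothetical and only illustrate the file format; they are not part of ActivePathways.

```r
# Three example lines in GMT format, copied from the output above.
gmt_lines <- c(
  "REAC:3656535\tTGFBR1 LBD Mutants in Cancer\tTGFB1\tFKBP1A\tTGFBR2\tTGFBR1\t",
  "REAC:73927\tDepurination\tOGG1\tMPG\tMUTYH\t",
  "REAC:5602410\tTLR3 deficiency - HSE\tTLR3\t"
)

# Hypothetical helper: split one tab-separated GMT line into
# gene set ID, description and gene symbols (empty fields dropped).
parse_gmt_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  genes <- fields[-(1:2)]
  list(id = fields[1], name = fields[2], genes = genes[nzchar(genes)])
}

gene_sets <- lapply(gmt_lines, parse_gmt_line)
names(gene_sets) <- vapply(gene_sets, function(gs) gs$id, character(1))

# Hypothetical gene symbols standing in for rownames(scores).
score_genes <- c("TGFBR1", "TGFBR2", "OGG1", "TP53")

# Number of genes in each gene set that also appear in the scores table.
matched <- vapply(gene_sets,
                  function(gs) sum(gs$genes %in% score_genes),
                  integer(1))
print(matched)
```

A gene set with zero matches often points to mismatched identifiers, for example Ensembl IDs in one file and gene symbols in the other.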
Examples - Directional integration of multi-omics data
ActivePathways 2.0 extends our integrative pathway analysis framework significantly. Users can now provide directional assumptions of input omics datasets for more accurate analyses. This allows us to prioritise genes and pathways where certain directional assumptions are met, and penalise those where the assumptions are violated.
For example, fold-change in protein expression would be expected to associate positively with mRNA fold-change of the corresponding gene, while negative associations would be unexpected and indicate more-complex situations or potential false positives. We can instruct the pathway analysis to prioritise positively-associated protein/mRNA pairs and penalise negative associations (or vice versa).
Two additional inputs are included in ActivePathways that allow diverse multi-omics analyses. These inputs are optional.
The scores_direction and constraints_vector parameters are provided in the merge_p_values() and ActivePathways() functions to incorporate this directional penalty into the data fusion and pathway enrichment analyses.
The parameter constraints_vector is a vector that allows the user to represent the expected relationship between the input omics datasets. The vector has one entry per input dataset (length n_datasets). Values include +1, -1, and 0. The constraints_vector should reflect the expected relative directional relationship between datasets. For example, the constraints_vector values c(-1,1) and c(1,-1) are functionally identical, because only the relative relationship between the two datasets matters. When combining datasets that contain both directional datatypes (e.g., gene or protein expression, gene promoter methylation) and non-directional datatypes (e.g., gene mutational burden, ChIP-seq), we can define the relative relationship between directional datatypes with the values 1 and -1 while setting the value of non-directional datatypes to 0.
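The relative nature of constraints_vector can be illustrated with a small base R sketch. It checks, for each gene, whether the observed directions in the directional datasets are consistent with the expected relationship. The helper `follows_constraints`, the dataset names and the direction values are hypothetical; this is not how ActivePathways computes its directional penalty, only a way to see why c(1,-1,0) and c(-1,1,0) describe the same expectation.

```r
# Hypothetical directions (+1 = up, -1 = down) for four genes.
# The "mutations" dataset is non-directional, so its values are unused.
directions <- matrix(
  c( 1,  1, 0,
     1, -1, 0,
    -1,  1, 0,
    -1, -1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                  c("rna", "methylation", "mutations"))
)

# Hypothetical helper, not part of ActivePathways: a gene follows the
# constraints if, after multiplying each directional column by its
# constraint, all directional datasets point the same way.
follows_constraints <- function(directions, constraints_vector) {
  stopifnot(ncol(directions) == length(constraints_vector),
            all(constraints_vector %in% c(-1, 0, 1)),
            any(constraints_vector != 0))
  directional <- constraints_vector != 0
  aligned <- sweep(directions[, directional, drop = FALSE], 2,
                   constraints_vector[directional], `*`)
  apply(aligned, 1, function(x) length(unique(x)) == 1)
}

# Expectation: expression and promoter methylation change in opposite
# directions; mutations carry no direction (constraint 0).
agree_a <- follows_constraints(directions, c(1, -1, 0))
agree_b <- follows_constraints(directions, c(-1, 1, 0))
print(agree_a)
print(identical(agree_a, agree_b))
```

Flipping every non-zero sign leaves the result unchanged, so only the relationship between the directional datasets is encoded, not an absolute direction.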
The parameter sco
