Cytotrace2
CytoTRACE 2 is an interpretable AI method for predicting cellular potency and absolute developmental potential from scRNA-seq data.
We are thrilled to introduce CytoTRACE 2 version 1.1.0, packed with significant performance enhancements to elevate your single-cell transcriptomic analyses. Here's what's new in this release:
🔍 Major Updates and Enhancements
- Retrained CytoTRACE 2 Framework: The CytoTRACE 2 model has been retrained, yielding additional performance gains in granular potency prediction and enhancing cross-platform robustness.
- Expanded Ensemble Model: The ensemble now comprises 19 models instead of 17, improving the predictive power and stability of the framework.
- Background Expression Matrix: Introduced a background expression matrix generated during training for improved regularization.
- Enhanced Data Representations: Added a log2-adjusted representation of the input expression data, used for prediction in addition to the ranked expression profiles, to capture detailed transcriptomic signals. As a result, the input expression data must now contain only raw or CPM/TPM-normalized counts.
- Adaptive Nearest Neighbor Smoothing: Modified the KNN smoothing step to employ an adaptive nearest neighbor smoothing strategy.
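Because the input must now be raw or CPM/TPM-normalized counts, it can help to sanity-check a matrix before running predictions. The sketch below is a rough heuristic and not part of the CytoTRACE 2 package: the helper name `guess_input_scale`, the tolerance, and the assumption that genes are rows and cells are columns are all illustrative choices.

```r
# Heuristic sketch (hypothetical helper, not a CytoTRACE 2 function):
# guess whether a matrix holds raw counts, CPM/TPM-like values, or
# something else (for example log-transformed data).
# ASSUMPTION: genes are rows and cells are columns.
guess_input_scale <- function(expr, tol = 0.01) {
  expr <- as.matrix(expr)
  # negative values rule out both raw counts and CPM/TPM
  if (any(expr < 0)) return("other")
  # CPM/TPM-normalized columns each sum to roughly one million
  totals <- colSums(expr)
  if (all(abs(totals - 1e6) / 1e6 < tol)) return("cpm_or_tpm")
  # raw counts are non-negative whole numbers
  if (all(expr == round(expr))) return("raw_counts")
  "other"
}

# Toy data: 3 genes x 2 cells
counts <- matrix(c(0, 3, 10, 7, 0, 1), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

guess_input_scale(counts)         # "raw_counts"
guess_input_scale(cpm)            # "cpm_or_tpm"
guess_input_scale(log2(cpm + 1))  # "other"
```

A result of "other" is only a prompt to inspect the data; a heuristic like this cannot prove that a matrix is unsuitable.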
💻 Codebase and Distribution Updates
- Codebase Updates
  - Updated both R and Python package codebases to reflect all the above changes.
  - Optimized for time and memory efficiency, ensuring faster computations and scalability.
- Enhanced Python Package Distribution: The Python version of CytoTRACE 2 is now available on PyPI, making installation easier for Python users.
📚 Documentation and Guides
- Updated Vignettes to align with the new model features and usage instructions.
- Refreshed README with new information, detailed explanations, and FAQ items tailored to the new framework.
We deeply appreciate the contributions from our community that made this release possible. Thank you for your continued support! 🙏
<h2> <p align="center"> Prediction of absolute developmental potential <br> using single-cell expression data </p> </h2>

CytoTRACE 2 is a computational method for predicting cellular potency categories and absolute developmental potential from single-cell RNA-sequencing data.
Potency categories in the context of CytoTRACE 2 classify cells based on their developmental potential, ranging from totipotent and pluripotent cells with broad differentiation potential to lineage-restricted oligopotent, multipotent and unipotent cells capable of producing varying numbers of downstream cell types, and finally, differentiated cells, ranging from mature to terminally differentiated phenotypes.
The predicted potency scores additionally provide a continuous measure of developmental potential, ranging from 0 (differentiated) to 1 (totipotent).
Underlying this method is a novel, interpretable deep learning framework trained and validated across 34 human and mouse scRNA-seq datasets encompassing 24 tissue types, collectively spanning the developmental spectrum.
This framework learns multivariate gene expression programs for each potency category and calibrates outputs across the full range of cellular ontogeny, facilitating direct cross-dataset comparison of developmental potential in an absolute space.
<p align="center"> <img width="900" src="images/schematic.png"> </p>

This documentation page details the R package for applying CytoTRACE 2. <strong> For the Python package, see <a href="/cytotrace2_python" target="_blank">CytoTRACE 2 Python</a>.</strong>
Installation
We recommend installing the CytoTRACE 2 package using the devtools package from the R console. If you do not have devtools installed, you can install it by running install.packages("devtools") in the R console.
devtools::install_github("digitalcytometry/cytotrace2", subdir = "cytotrace2_r") #installing
library(CytoTRACE2) #loading
See alternative installation and package management methods (including an easy-to-use conda environment that precisely solves all dependencies) in the Advanced options section below.
The installation of the CytoTRACE 2 package itself typically takes about one minute on a standard computer. Optional installation of the provided conda environment generally takes 5-10 minutes but can vary substantially, sometimes requiring up to an hour depending on system and conda version.
NOTE: We recommend using Seurat v4 or later for full compatibility with the CytoTRACE 2 package. If you don't have Seurat installed, you can install it by running install.packages("Seurat") in the R console prior to installing CytoTRACE 2, or use the provided conda environment.
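The two prerequisites mentioned above (devtools and Seurat) can be checked in one step before installing. This is a minimal sketch using only base R; `missing_packages` is a hypothetical helper name, not something provided by CytoTRACE 2.

```r
# Return the subset of `pkgs` whose namespace cannot be loaded in this
# R session (hypothetical helper, base R only).
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

to_install <- missing_packages(c("devtools", "Seurat"))
if (length(to_install) > 0) {
  message("Missing prerequisites: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install call is left commented out so that the check itself has no side effects.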
The following list includes the versions of packages used during the development of CytoTRACE 2. While CytoTRACE 2 is compatible with various (older and newer) versions of these packages, specific combinations of dependency versions can lead to conflicts. The only such conflict known at this time occurs when using Seurat v4 in conjunction with Matrix v1.6. This issue can be resolved by either upgrading Seurat or downgrading Matrix.
R (4.2.3)
data.table (1.14.8)
doParallel (1.0.17)
dplyr (1.1.3)
ggplot2 (3.4.4)
HiClimR (2.2.1)
magrittr (2.0.3)
Matrix (1.5-4.1)
parallel (4.2.3)
plyr (1.8.9)
RANN (2.6.1)
Rfast (2.0.8)
RSpectra (0.16.1)
Seurat (4.3.0.1)
SeuratObject (4.1.3)
stringr (1.5.1)
</details>
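The known Seurat v4 and Matrix v1.6 conflict described above can be detected programmatically. The following is a sketch under two assumptions: that "Seurat v4" means any 4.x release and that "Matrix v1.6" means any 1.6.x release. The function name `has_known_conflict` is hypothetical.

```r
# Sketch: flag the one known dependency conflict (Seurat v4 with Matrix v1.6).
# ASSUMPTION: "v4" is read as any 4.x release and "v1.6" as any 1.6.x release.
has_known_conflict <- function(seurat_version, matrix_version) {
  seurat_v <- package_version(seurat_version)
  matrix_v <- package_version(matrix_version)
  seurat_is_v4 <- seurat_v >= "4.0.0" && seurat_v < "5.0.0"
  matrix_is_1_6 <- matrix_v >= "1.6.0" && matrix_v < "1.7.0"
  seurat_is_v4 && matrix_is_1_6
}

has_known_conflict("4.3.0.1", "1.5-4.1")  # FALSE: the development versions listed above
has_known_conflict("4.3.0.1", "1.6-1")    # TRUE: Seurat v4 with Matrix v1.6

# With both packages installed, the current session can be checked with:
# has_known_conflict(packageVersion("Seurat"), packageVersion("Matrix"))
```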
Running CytoTRACE 2
Running CytoTRACE 2 takes two steps. After loading the library, call the cytotrace2() function with its one required input, the expression data, to obtain potency score and potency category predictions. Then run plotData to generate visualizations based on the predicted values and, if available, external annotations. Below are two vignettes showcasing the application on a mouse dataset and a human dataset.
To illustrate use of CytoTRACE 2 with a mouse dataset, we use Pancreas_10x_downsampled.rds, originally from Bastidas-Ponce et al., 2019, filtered to cells with known ground truth developmental potential and downsampled. The file is available to download here and contains 2 objects:
- expression_data: gene expression matrix for a scRNA-seq (10x Chromium) dataset encompassing 2850 cells from murine pancreatic epithelium
- annotation: phenotype annotations for the scRNA-seq dataset above.
After downloading the .rds file, we apply CytoTRACE 2 to this dataset as follows:
# load the CytoTRACE 2 package
library(CytoTRACE2)
# download the .rds file (this will download the file to your working directory)
# mode = "wb" keeps the binary .rds file intact on Windows
download.file("https://drive.google.com/uc?export=download&id=1TYdQsMoDIJjoeuiTD5EO_kZgNJUyfRY2", "Pancreas_10x_downsampled.rds", mode = "wb")
# load rds
data <- readRDS("Pancreas_10x_downsampled.rds")
# extract expression data
expression_data <- data$expression_data
# running CytoTRACE 2 main function - cytotrace2 - with default parameters
cytotrace2_result <- cytotrace2(expression_data)
# extract annotation data
annotation <- data$annotation
# generate prediction and phenotype association plots with plotData function
plots <- plotData(cytotrace2_result = cytotrace2_result,
                  annotation = annotation,
                  expression_data = expression_data)
The expected prediction output, the data frame cytotrace2_result, looks as shown below (it can be downloaded from here):
This dataset contains cells from 4 different embryonic stages of a murine pancreas, and has the following cell types present:
- Multipotent pancreatic progenitors
- Endocrine progenitors and precursors
- Immature endocrine cells
- Alpha, Beta, Delta, and Epsilon cells
Each of these cell types is at a different stage of development: progenitors and precursors have varying potential to differentiate into other cell types, while mature cells have no potential for further development. We use CytoTRACE 2 to predict the absolute developmental potential of each cell, which we term the "potency score", as a continuous value ranging from 0 (differentiated) to 1 (stem cells capable of generating an entire multicellular organism). The discrete potency categories that the potency scores cover are Differentiated, Unipotent, Oligopotent, Multipotent, Pluripotent, and Totipotent.
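To make the relationship between the continuous score and the six ordered categories concrete, here is an illustrative sketch. It assumes the 0 to 1 range is split into six equal-width bins, which is not stated in this section; treat the bin edges and the helper name `score_to_category` as assumptions, and use the categories returned by cytotrace2() for real analyses.

```r
# Potency categories in order from least to most potent
potency_levels <- c("Differentiated", "Unipotent", "Oligopotent",
                    "Multipotent", "Pluripotent", "Totipotent")

# ASSUMPTION: six equal-width bins over [0, 1]; illustrative only,
# not taken from the CytoTRACE 2 package.
score_to_category <- function(score) {
  cut(score,
      breaks = seq(0, 1, length.out = length(potency_levels) + 1),
      labels = potency_levels,
      include.lowest = TRUE)
}

as.character(score_to_category(c(0.05, 0.25, 0.60, 0.95)))
# "Differentiated" "Unipotent" "Multipotent" "Totipotent"
```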
In this case, we would expect to see:
- potency scores close to 0 for alpha, beta, delta, and epsilon cells, as those are known to be differentiated,
- scores in the higher mid-range for multipotent pancreatic progenitors, as those are known to be multipotent,
- scores in the lower range, closer to differentiated, for endocrine progenitors, precursors, and immature cells: their ground truth is not unique, but falls within the range of the unipotent category.
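One way to check these expectations numerically is to summarise the predicted scores per annotated phenotype. The sketch below runs on small mock data frames so that it is self-contained. The column names `CytoTRACE2_Score` and `Phenotype`, and the layout with cell IDs as row names, are assumptions; adapt them to the actual columns of cytotrace2_result and annotation.

```r
# Mock stand-ins for cytotrace2_result and annotation.
# ASSUMPTION: column names and row-name layout are hypothetical.
mock_result <- data.frame(
  CytoTRACE2_Score = c(0.55, 0.60, 0.22, 0.25, 0.04, 0.06),
  row.names = paste0("cell", 1:6)
)
mock_annotation <- data.frame(
  Phenotype = c("Multipotent progenitor", "Multipotent progenitor",
                "Endocrine precursor", "Endocrine precursor",
                "Beta", "Beta"),
  row.names = paste0("cell", 1:6)
)

# Align the two tables on shared cell IDs, then take the median per phenotype
shared_cells <- intersect(rownames(mock_result), rownames(mock_annotation))
median_by_phenotype <- tapply(
  mock_result[shared_cells, "CytoTRACE2_Score"],
  mock_annotation[shared_cells, "Phenotype"],
  median
)

# Phenotypes ordered from least to most potent
sort(median_by_phenotype)
```

On the mock data, the ordering runs from Beta through Endocrine precursor to Multipotent progenitor, mirroring the pattern described in the list above.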
Visualizing the results, we can directly compare the predicted potency scores with the known developmental stage of the cells and see how closely the predictions align with the known biology. Take a
