scDist: Robust identification of perturbed cell types in single-cell RNA-seq data

R package version 1.1.5

Overview

scDist is an R package that estimates the distance between cell populations in high-dimensional gene-expression space. It can be used to measure which cell types change the most between two experimental conditions (e.g., treated vs control). When there are multiple replicates (i.e., patients), scDist uses linear-mixed effects models to correct for sample-to-sample variability.

To run scDist you will need a normalized expression matrix and metadata assigning each cell to a condition and cluster. A simulated demo is provided below.

If you use scDist in your work, please cite:

Nicol, P.B., Paulson, D., Qian, G., Liu, X.S., Irizarry, R.A., and Sahu, A.D. (2024). Robust identification of perturbed cell types in single-cell RNA-seq data. Nature Communications. Vol 15(7610). https://doi.org/10.1038/s41467-024-51649-3.

System Requirements

R is required to use scDist. In development, R version 4.0.0 and greater were used, but there may be compatibility with previous versions.

Installation

From the R console, devtools::install_github("phillipnicol/scDist"). Installation should take less than a minute on a standard machine.

Demo

The input to scDist is a normalized count matrix and correpsonding metadata that describes what condition and patient each cell belongs to. In this demo, we create a simulated dataset with 10 cell types. The demo should take less than a minute to run on a standard machine. The code is also modifiable to see how scDist performs for different parameter values.

library(scDist)
set.seed(1126490984)

Generate simulated data with 10 cell types and 5 patients in each group:

sim <- simData(nct=10,N1=5,N2=5)

dim(sim$Y) #Normalized counts

## [1] 1000 5100

rownames(sim$Y) <- 1:1000
head(sim$meta.data)

##   response patient clusters
## 1        1       1        a
## 2        1       1        a
## 3        1       1        a
## 4        1       1        a
## 5        1       1        a
## 6        1       1        a

Now we apply scDist:

out <- scDist(sim$Y,sim$meta.data,fixed.effects = "response",
              random.effects="patient",
              clusters="clusters")

## ================================================================================

The results data frame gives a summary of the estimated distance and uncertainty for each cell type

out$results

##       Dist. 95% CI (low) 95% CI (upper)        p.val
## a 33.044137    32.281250       33.82516 0.0000099999
## b 10.870614     8.007240       13.73925 0.0048399516
## c  9.885715     5.217373       13.39083 0.4479655203
## d 15.240428    13.991557       16.58055 0.0000099999
## e 18.897089    18.201092       19.64802 0.0000099999
## f  0.000000     0.000000        0.00000 0.9478205218
## g 33.022750    32.519064       33.55178 0.0000099999
## h 22.591648    21.749534       23.49203 0.0000099999
## i 16.613041    15.731826       17.54432 0.0000099999
## j 16.371930    15.131200       17.60823 0.0000099999

The true distances are

names(sim$D.true) <- letters[1:length(sim$D.true)]
sim$D.true

##         a         b         c         d         e         f         g         h 
## 31.497609  4.408609  5.153658 10.700209 15.551271  2.764961 31.310492 19.893417 
##         i         j 
## 13.349565 13.045934

We can also plot the results

DistPlot(out)

To get a plot of genes that are associated with the perturbation use distGenes:

distGenes(out, cluster = "a")

ScDist

Install / Use

README

scDist: Robust identification of perturbed cell types in single-cell RNA-seq data

Overview

System Requirements

Installation

Demo