scdrake

{scdrake} is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and spot-based spatial transcriptomics data (SRT). {scdrake} is an R package built on top of the {drake} package, a Make-like pipeline toolkit for the R language.

The main features of the {scdrake} pipeline are:

- Import of scRNA-seq data: 10x Genomics Cell Ranger output, delimited table, or SingleCellExperiment object.
- Import of SRT data: 10x Genomics Space Ranger output, delimited table, or SingleCellExperiment object.
- Quality control and filtering of cells/spots and genes, removal of empty droplets.
- Spatial artifact detection for spot-based data.
- Highly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction.
- Spatially variable genes detection (for SRT data).
- Cell type annotation using reference sets or user-provided marker genes.
- Spot deconvolution using a reference single-cell experiment.
- Integration of multiple datasets.
- Computation of cluster markers and differentially expressed genes between clusters (denoted as “contrasts”).
- Rich graphical and HTML outputs based on customizable RMarkdown documents.
  - You can find links to example outputs here.
- Thanks to {drake}, the pipeline is highly efficient, scalable, reproducible, and extendable.
  - Want to change some parameter? No problem! Only the parts of the pipeline that changed will rerun, while up-to-date ones will be skipped.
  - Want to reuse the intermediate results for your own analyses? No problem! The pipeline has smartly defined checkpoints which can be loaded from a {drake} cache.
  - Want to extend the pipeline? No problem! The pipeline definition is just an R object which can be arbitrarily extended.

Who is {scdrake} intended for? It is primarily aimed at tech-savvy users (bioinformaticians) who pass on the results (reports, images) to non-technical persons (biologists). At the same time, bioinformaticians can quickly react to biologists’ needs by changing the parameters of the pipeline, which then efficiently skips the parts that are already finished. This dialogue between the biologist and the bioinformatician is indispensable during scRNA-seq data analysis. {scdrake} ensures that this communication is performed in an effective and reproducible manner.

The pipeline structure along with diagrams and links to outputs is described in vignette("pipeline_overview") (link).

If you use {scdrake} in your research, please consider citing:

Kubovciak J, Kolar M, Novotny J (2023). “Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.” Bioinformatics Advances, 3(1). doi:10.1093/bioadv/vbad089.

Huge thanks go to the authors of the Orchestrating Single-Cell Analysis with Bioconductor and Orchestrating Spatial Transcriptomics Analysis with Bioconductor books, on whose methods and recommendations {scdrake} is largely based.

If you wish to contribute, please refer to the .github/CONTRIBUTING.md manual and vignette("scdrake_development").

Installation instructions

Using a Docker image (recommended)

A Docker image based on the official Bioconductor image (version 3.21) is available. This is the most convenient and reproducible way to use {scdrake}, as all dependencies are already installed and their versions are fixed. In addition, the parent Bioconductor image comes bundled with RStudio Server.

The complete guide to using {scdrake}’s Docker image can be found in the Docker vignette. We strongly recommend going through it even if you are an experienced Docker user. Below you can find just the basic commands to download the image and to run a detached container with RStudio in Docker, or to run {scdrake} in Singularity.

You can also run the image in SingularityCE (without RStudio); see the Singularity section in the Docker vignette above. If the image is already downloaded in the local Docker storage, you can use singularity pull docker-daemon:<image>

You can pull the Docker image with the latest stable {scdrake} version using

docker pull jirinovo/scdrake:1.7.1
singularity pull docker://jirinovo/scdrake:1.7.1

or list available versions in our Docker Hub repository.

For the latest development version use

docker pull jirinovo/scdrake:latest
singularity pull docker://jirinovo/scdrake:latest

Note for Mac users with M1/M2 chipsets: arm64 images are available up to and including version 1.5.0. For the spatial extension, an arm64 Docker image can be pulled using

docker pull pfeiferl/scdrake:1.7.1-bioc3.21-arm64
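
The choice between the two images listed above can be scripted. The following is a minimal sketch under stated assumptions: the function name `scdrake_image_for_arch` is hypothetical and not part of {scdrake}, and treating Linux `aarch64` hosts the same way as Apple silicon is an assumption that the text above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of scdrake): print the image reference to pull
# for a CPU architecture as reported by `uname -m`.
scdrake_image_for_arch() {
  arch=${1:-$(uname -m)}
  case "$arch" in
    # macOS on Apple silicon reports "arm64"; Linux on ARM reports "aarch64"
    # (using the arm64 image there is an assumption).
    arm64|aarch64) echo "pfeiferl/scdrake:1.7.1-bioc3.21-arm64" ;;
    *)             echo "jirinovo/scdrake:1.7.1" ;;
  esac
}

# Usage (requires Docker):
#   docker pull "$(scdrake_image_for_arch)"
```

Passing the architecture as an argument, for example `scdrake_image_for_arch x86_64`, makes it possible to preview the result for another machine.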

Running the container

The following applies to the most common host machines: Linux running Docker Engine, and Windows or macOS running Docker Desktop.

First make a shared directory that will be mounted to the container:

mkdir ~/scdrake_projects
cd ~/scdrake_projects

Then run the image, which exposes RStudio Server on port 8787 of your host:

docker run -d \
  -v $(pwd):/home/rstudio/scdrake_projects \
  -p 8787:8787 \
  -e USERID=$(id -u) \
  -e GROUPID=$(id -g) \
  -e PASSWORD=1234 \
  jirinovo/scdrake:1.7.1
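
If you prefer not to hard-code the password or the host port, the same command can be assembled from variables. This is only a sketch: `scdrake_run_cmd`, `SCDRAKE_PORT` and `SCDRAKE_PASSWORD` are hypothetical names chosen for illustration, and the function prints the command for review instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the `docker run` command shown above,
# taking the host port and the password from variables.
# SCDRAKE_PORT and SCDRAKE_PASSWORD are illustrative names, not scdrake settings.
scdrake_run_cmd() {
  project_dir=${1:-$PWD}
  port=${SCDRAKE_PORT:-8787}
  if [ -z "${SCDRAKE_PASSWORD:-}" ]; then
    echo "SCDRAKE_PASSWORD is not set" >&2
    return 1
  fi
  # Only the host side of the port mapping changes; the container side stays 8787.
  printf 'docker run -d -v "%s:/home/rstudio/scdrake_projects" -p %s:8787 -e USERID=%s -e GROUPID=%s -e PASSWORD="%s" jirinovo/scdrake:1.7.1\n' \
    "$project_dir" "$port" "$(id -u)" "$(id -g)" "$SCDRAKE_PASSWORD"
}

# Usage:
#   SCDRAKE_PASSWORD='choose-your-own' scdrake_run_cmd ~/scdrake_projects
```

The printed line can be checked and then pasted into the terminal. The quoting in the printed command is only suitable for passwords and paths without double quotes or other shell-special characters.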

For Singularity, also make shared directories and execute the container (“run and forget”):

mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects
singularity exec \
    -e \
    --no-home \
    --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects" \
    path/to/scdrake_image.sif \
    scdrake <args> <command>
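
The `--bind` value above is a comma-separated list of `<host path>:<container path>` pairs, and the host paths are relative, so the command is meant to be run from `~/scdrake_singularity`. As a sketch of how that value is composed (the function name `scdrake_bind_spec` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: compose the --bind value used above for a given user name.
# Each comma-separated entry has the form <host path>:<container path>.
scdrake_bind_spec() {
  user=${1:-${USER:-$(id -un)}}
  printf '%s,%s\n' \
    "home/${user}/:/home/${user}" \
    "scdrake_projects/:/home/${user}/scdrake_projects"
}

# Usage (from ~/scdrake_singularity):
#   singularity exec -e --no-home --bind "$(scdrake_bind_spec)" ...
```

The first pair gives the container a separate, initially empty home directory, and the second makes the shared project directory visible inside it.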

Installing `{scdrake}` manually (not recommended)

Install the required system packages

For Linux, follow the commands for your distribution here.
For MacOS: $ brew install libxml2 imagemagick@6 harfbuzz fribidi libgit2 geos pandoc

Install R >= 4.2

See https://cloud.r-project.org/
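
Before continuing, you may want to confirm from the shell that the R on your PATH meets the minimum version. The following is a minimal sketch; `version_ge` and `check_r_version` are hypothetical helper names, and only the major and minor version components are compared.

```shell
#!/bin/sh
# Return success when version $1 is at least version $2.
# Only the major and minor components are compared; both arguments are
# assumed to contain at least "major.minor".
version_ge() {
  a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
  b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
  [ "$a_major" -gt "$b_major" ] && return 0
  [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]
}

# Hypothetical helper: report whether the R found on PATH satisfies the minimum version.
check_r_version() {
  min=${1:-4.2}
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  found=$(Rscript -e 'cat(as.character(getRversion()))')
  if version_ge "$found" "$min"; then
    echo "R $found is sufficient (>= $min)"
  else
    echo "R $found is too old (need >= $min)" >&2
    return 1
  fi
}

# Usage:
#   check_r_version 4.2
```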

From now on, all commands are for R.

Install `{renv}`

{renv} is an R package for management of local R libraries. It is intended to be used on a per-project basis, i.e. each project should use its own library of R packages.

install.packages("renv")

Initialize a new `{renv}` library

Switch to the directory in which you will analyze data and initialize a new {renv} library:

renv::consent(TRUE)
renv::init()

Now exit and restart R. You should see a message that the {renv} library has been activated.

Install BiocManager

renv::install("BiocManager")

Install Bioconductor 3.21

BiocManager::install(version = "3.21")

Restore `{scdrake}` dependencies from lockfile

{renv} can also export the currently installed versions of R packages (and other things) into a lockfile. Such a lockfile is available for {scdrake}, and you can use it to install all dependencies with

## -- This is a lockfile for the latest stable version of scdrake.
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.7.1/renv.lock", destfile = "renv.lock")
## -- You can increase the number of CPU cores to speed up the installation.
options(Ncpus = 2)
renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())

For the lockfile for the latest development version use

download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/main/renv.lock", destfile = "renv.lock")

Install the `{scdrake}` package

Now we can finally install the {scdrake} package, but in a non-standard way: without its dependencies (which are already installed from the lockfile).

remotes::install_github(
  "bioinfocz/scdrake@1.7.1",
  dependencies = FALSE, upgrade = FALSE,
  keep_source = TRUE
)
