Scdrake
A pipeline for droplet-based single-cell RNA-seq and spot-based spatial transcriptomics data secondary analysis implemented in the drake Make-like pipeline toolkit for the R language.
scdrake
{scdrake} is a scalable and reproducible pipeline for secondary
analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and
spot-based spatial transcriptomics data (SRT). {scdrake} is an R
package built on top of the {drake} package, a
Make-like pipeline toolkit for the R language.
The main features of the {scdrake} pipeline are:
- Import of scRNA-seq data: 10x Genomics Cell Ranger output, delimited table, or SingleCellExperiment object.
- Import of SRT data: 10x Genomics Space Ranger output, delimited table, or SingleCellExperiment object.
- Quality control and filtering of cells/spots and genes, removal of empty droplets.
- Spatial artifact detection for spot-based data.
- Highly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction.
- Spatially variable genes detection (for SRT data).
- Cell type annotation using reference sets, cell type annotation using user-provided marker genes.
- Spot deconvolution using reference single-cell experiment.
- Integration of multiple datasets.
- Computation of cluster markers and differentially expressed genes between clusters (denoted as “contrasts”).
- Rich graphical and HTML outputs based on customizable RMarkdown
documents.
- You can find links to example outputs here.
- Thanks to {drake}, the pipeline is highly efficient, scalable and reproducible, and also extendable.
  - Want to change some parameter? No problem! Only parts of the pipeline which changed will rerun, while up-to-date ones will be skipped.
  - Want to reuse the intermediate results for your own analyses? No problem! The pipeline has smartly defined checkpoints which can be loaded from a {drake} cache.
  - Want to extend the pipeline? No problem! The pipeline definition is just an R object which can be arbitrarily extended.
Who is {scdrake} for? It is primarily intended for
tech-savvy users (bioinformaticians), who pass on the results (reports,
images) to non-technical persons (biologists). At the same time,
bioinformaticians can quickly react to biologists’ needs by changing the
parameters of the pipeline, which then efficiently skips already
finished parts. This dialogue between the biologist and the
bioinformatician is indispensable during scRNA-seq data analysis.
{scdrake} ensures that this communication is performed in an effective
and reproducible manner.
The pipeline structure along with
diagrams
and links to outputs is described in vignette("pipeline_overview")
(link).
If you use {scdrake} in your research, please consider citing:
Kubovciak J, Kolar M, Novotny J (2023). “Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.” Bioinformatics Advances, 3(1). doi:10.1093/bioadv/vbad089.
Huge thanks go to the authors of the Orchestrating Single-Cell Analysis
with Bioconductor and
Orchestrating Spatial Transcriptomics Analysis with
Bioconductor books, on whose methods and
recommendations {scdrake} is largely based.
If you wish to contribute, please refer to the .github/CONTRIBUTING.md
manual and vignette("scdrake_development").
Installation instructions
Using a Docker image (recommended)
A Docker image based on the official Bioconductor
image (version 3.21) is
available. This is the most convenient and reproducible way to use
{scdrake} as all the dependencies are already installed and their
versions are fixed. In addition, the parent Bioconductor image comes
bundled with RStudio Server.
The complete guide to the usage of {scdrake}’s Docker image can be
found in the Docker
vignette.
We strongly recommend going through it even if you are an experienced
Docker user. Below you can find just the basic commands to download the
image and to run a detached container with RStudio in Docker or to run
{scdrake} in Singularity.
You can also run the image in
SingularityCE
(without RStudio) - see the Singularity section in the Docker vignette
above. If the image is already downloaded in the local Docker storage,
you can use singularity pull docker-daemon:<image>
You can pull the Docker image with the latest stable {scdrake} version
using
docker pull jirinovo/scdrake:1.7.1
singularity pull docker:jirinovo/scdrake:1.7.1
or list available versions in our Docker Hub repository.
For the latest development version use
docker pull jirinovo/scdrake:latest
singularity pull docker:jirinovo/scdrake:latest
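Because the same image reference is used by the pull and run commands, it can help to keep the tag in one place. The following is a minimal sketch, not part of {scdrake}: the function name `scdrake_image` and the variable `SCDRAKE_TAG` are illustrative assumptions. It only prints the commands, so nothing is downloaded until you run them yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of scdrake): build the image reference
# from a tag, defaulting to the stable version named in this README.
scdrake_image() {
  printf 'jirinovo/scdrake:%s' "${1:-1.7.1}"
}

# SCDRAKE_TAG is an assumed variable name; set it to "latest" for the
# development version.
image="$(scdrake_image "${SCDRAKE_TAG:-1.7.1}")"

# Dry run: print the commands instead of executing them.
echo "docker pull ${image}"
echo "singularity pull docker:${image}"
# Variant for an image already present in the local Docker storage.
echo "singularity pull docker-daemon:${image}"
```

Pinning a numbered tag rather than `latest` keeps the dependency versions fixed, which is the main reason for using the image in the first place.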
Note for Mac users with M1/M2 chips: arm64 images are available up to
and including version 1.5.0. For the spatial extension, an arm64 Docker
image can be found at:
docker pull pfeiferl/scdrake:1.7.1-bioc3.21-arm64
Running the container
The following instructions cover the most common host machines: Linux running Docker Engine, and Windows or macOS running Docker Desktop.
First make a shared directory that will be mounted to the container:
mkdir ~/scdrake_projects
cd ~/scdrake_projects
Then run the image, which will expose RStudio Server on port 8787 on your host:
docker run -d \
-v $(pwd):/home/rstudio/scdrake_projects \
-p 8787:8787 \
-e USERID=$(id -u) \
-e GROUPID=$(id -g) \
-e PASSWORD=1234 \
jirinovo/scdrake:1.7.1
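The password `1234` in the command above is only a placeholder. As a hedged sketch, the same command can be assembled with a randomly generated password; the variable name `password` and the way the random string is produced are assumptions made for illustration, not something {scdrake} prescribes. The script prints the command rather than starting a container.

```shell
#!/bin/sh
# Generate a 24-character hexadecimal password from 12 random bytes.
password="$(head -c 12 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Store the command as positional parameters so that each argument
# stays a single word, even if the current path contains spaces.
set -- docker run -d \
  -v "$(pwd):/home/rstudio/scdrake_projects" \
  -p 8787:8787 \
  -e "USERID=$(id -u)" \
  -e "GROUPID=$(id -g)" \
  -e "PASSWORD=${password}" \
  jirinovo/scdrake:1.7.1

# Dry run: show the command. Running "$@" instead would start the container.
echo "$@"
```

Note the printed password before starting the container, as you will need it to log in to RStudio Server.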
For Singularity, also make shared directories and execute the container (“run and forget”):
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects
singularity exec \
-e \
--no-home \
--bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
--pwd "/home/${USER}/scdrake_projects" \
path/to/scdrake_image.sif \
scdrake <args> <command>
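In the command above, each `--bind` entry maps a host directory to a path inside the container: `home/${USER}` backs the container's home directory and `scdrake_projects` is mounted inside it. The relative paths only resolve when the command is run from `~/scdrake_singularity`. A minimal sketch that builds the same options from absolute paths is shown here; the variable names `SCDRAKE_BASE`, `base`, `user` and `bind` are assumptions for illustration.

```shell
#!/bin/sh
# Assumed variable names; the directory layout follows the commands above.
base="${SCDRAKE_BASE:-$HOME/scdrake_singularity}"
user="${USER:-$(id -un)}"

# Host directories that will back the container's home and project folder.
mkdir -p "$base/home/$user" "$base/scdrake_projects"

# host_path:container_path pairs, separated by a comma.
bind="$base/home/$user/:/home/$user,$base/scdrake_projects/:/home/$user/scdrake_projects"

# Dry run: print the options to pass to `singularity exec`.
printf '%s\n' "--bind $bind" "--pwd /home/$user/scdrake_projects"
```

With absolute paths, the `singularity exec` call no longer depends on the current working directory.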
Installing {scdrake} manually (not recommended)
<details>
<summary>
Click for details
</summary>
Install the required system packages
- For Linux, follow the commands for your distribution here.
- For macOS:
$ brew install libxml2 imagemagick@6 harfbuzz fribidi libgit2 geos pandoc
Install R >= 4.2
See https://cloud.r-project.org/
From now on, all commands are for R.
Install {renv}
{renv} is an R package for
management of local R libraries. It is intended to be used on a
per-project basis, i.e. each project should use its own library of R
packages.
install.packages("renv")
Initialize a new {renv} library
Switch to the directory where you will analyze data and initialize a new
{renv} library:
renv::consent(TRUE)
renv::init()
Now exit and restart R. You should see a message that the {renv} library has been activated.
Install BiocManager
renv::install("BiocManager")
Install Bioconductor 3.21
BiocManager::install(version = "3.21")
Restore {scdrake} dependencies from lockfile
{renv} also allows exporting the currently installed versions of R
packages (and other things) into a lockfile. Such a lockfile is available
for {scdrake} and you can use it to install all dependencies with:
## -- This is a lockfile for the latest stable version of scdrake.
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.7.1/renv.lock", destfile = "renv.lock")
## -- You can increase the number of CPU cores to speed up the installation.
options(Ncpus = 2)
renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())
For the lockfile for the latest development version use
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/main/renv.lock", destfile = "renv.lock")
Install the {scdrake} package
Now we can finally install the {scdrake} package, but using a
non-standard approach - without its dependencies (which are already
installed from the lockfile).
remotes::install_github(
"bioinfocz/scdrake@1.7.1",
dependencies = FALSE, upgrade = FALSE,
keep_source = TRUE
