Drop
Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
Detection of RNA Outliers Pipeline
The Detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw RNA sequencing data.
The manuscript is available in Nature Protocols.
This website contains the different reports of the Geuvadis demo dataset described in the paper.
This video presents the tools used in DROP and their application to rare disease diagnostics.
<img src="drop_sticker.png" alt="drop logo" width="200" class="center"/>
Quickstart
DROP is available on bioconda.
We recommend using a dedicated conda environment (drop_env in this example). Installation time: ~ 15 min.
mamba create -n drop_env -c conda-forge -c bioconda drop --override-channels
If you run into trouble with mamba or conda, we recommend using the fixed DROP_<version>.yaml installation file that we make available on our public server. Download the file for the current version and use its full path in the following command to create the conda environment drop_env:
mamba env create -f DROP_1.4.0.yaml
Test installation with demo project
conda activate drop_env
mkdir ~/drop_demo
cd ~/drop_demo
drop demo
The pipeline can be run using snakemake commands
snakemake -n # dryrun
snakemake --cores 1
Expected runtime: 25 min
For more information on different installation options, refer to the documentation
What's new
Version 1.6.1 fixes issues caused by the new way functions are imported from txdbmaker and GenomeInfoDb in Bioconductor release 3.22 and above.
Version 1.6.0 fixes a bug in the counting step of the aberrant expression module ⚠️. It also fixes the assignment of variants to genes in the MAE module. Please do not use v1.5.0.
Version 1.5.0 contains the versions of OUTRIDER and FRASER that use the Optimal Hard Threshold procedure established by Gavish and Donoho, a deterministic approach to denoising low-rank matrices that relies on singular value decomposition, to find the optimal autoencoder dimension instead of a grid search. This reduces the run time roughly 6- to 10-fold.
DROP also supports mixing BAM files and external expression counts within the same group. This allows users to provide pre-computed expression counts for some samples while still using BAM files for splicing analysis. When both are available for a sample, external counts take priority for expression analysis, while BAM files are still used for splicing analysis.
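The priority rule for mixed inputs can be pictured with a small shell sketch. This is only an illustration of the rule as stated above, not DROP's implementation; the function name pick_inputs and the file names are hypothetical.

```shell
# Illustrative only: report which input each analysis would use for one sample.
# $1: external counts file ("" if none), $2: BAM file ("" if none)
pick_inputs() {
    counts="$1"
    bam="$2"
    if [ -n "$counts" ]; then
        expression="external counts"   # external counts take priority
    elif [ -n "$bam" ]; then
        expression="BAM"
    else
        expression="none"
    fi
    if [ -n "$bam" ]; then splicing="BAM"; else splicing="none"; fi
    printf 'expression: %s, splicing: %s\n' "$expression" "$splicing"
}

pick_inputs "counts.tsv.gz" "sample.bam"   # both available
pick_inputs "" "sample.bam"                # BAM only
pick_inputs "counts.tsv.gz" ""             # external counts only
```

With both inputs present, the first call reports external counts for expression and the BAM file for splicing.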
⚠️ Also, since this version, OUTRIDER and FRASER are released under CC-BY-NC 4.0, meaning a license is required for any commercial use. The commercial license is distributed by OmicsDiscoveries. If you intend to use it for commercial purposes, please write to license@omicsdiscoveries.com. For the purposes of this license, commercial use includes any situation in which an entity receives payment from another entity for running DROP, regardless of the specific application.
Version 1.4.0 fixes bugs in the split read counting for FRASER that only affect cohorts containing samples with different strandedness (stranded forward, stranded reverse or unstranded). If your cohort contained samples with different strandedness, please rerun the AS module using version 1.4.0. In addition, due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with version 1.3.4 should provide the same outlier results as 1.3.3.
Version 1.3.0 introduces the option to use FRASER 2.0, which is an improved version of FRASER that uses the Intron Jaccard Index metric instead of the percent spliced in (PSI) and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the FRASER_version parameter in the aberrantSplicing dictionary in the config file and adapt the quantileForFiltering and deltaPsiCutoff parameters. See the config template for more details. When switching between FRASER versions, we recommend running DROP in a separate folder for each version. Moreover, DROP now allows users to provide lists of genes to which the multiple testing correction is restricted, instead of the usual transcriptome-wide approach. Refer to the documentation.
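Because the three FRASER parameters need to change together, it can help to print them side by side before a run. The sketch below writes a small demo config to a temporary file and greps the three keys. The values in the demo file are illustrative assumptions, not recommendations, and show_fraser_settings is a hypothetical helper; take the actual values from the config template.

```shell
# Demo config fragment in a temporary file. The three values are assumptions
# for illustration only; use the ones given in the config template.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
aberrantSplicing:
    FRASER_version: "FRASER2"
    quantileForFiltering: 0.75
    deltaPsiCutoff: 0.1
EOF

# Print only the three parameters that need to be adapted together.
show_fraser_settings() {
    grep -E '^[[:space:]]*(FRASER_version|quantileForFiltering|deltaPsiCutoff)[[:space:]]*:' "$1"
}

show_fraser_settings "$config_file"
```

In a real project, point show_fraser_settings at the project's own config file instead of the temporary one.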
Snakemake v7.8 changed the conditions under which rules are re-executed: changes in parameters can now cause rules to run again. More info here. This affects DROP and causes certain rules in the AS and QC modules to be triggered even if they were already completed and there were no changes in the sample annotation or scripts. The workaround is to run DROP with the parameter --rerun-triggers mtime, e.g. snakemake -n --rerun-triggers mtime or snakemake --cores 10 --rerun-triggers mtime. We will investigate the rules in DROP to fix this.
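To avoid forgetting the flag, the workaround can be wrapped in a small shell function. This is a minimal sketch: drop_snakemake and the SNAKEMAKE_BIN override are hypothetical names introduced here, and the override exists only so the command line can be previewed without snakemake installed.

```shell
# Hypothetical wrapper: forwards all arguments to snakemake and appends the
# workaround flag. SNAKEMAKE_BIN is an assumed override, handy for previews.
drop_snakemake() {
    "${SNAKEMAKE_BIN:-snakemake}" "$@" --rerun-triggers mtime
}

# Preview the arguments that would be passed, without running snakemake.
SNAKEMAKE_BIN=echo drop_snakemake --cores 10
```

The flag is appended last because --rerun-triggers accepts several values, so any target placed after it could be read as a trigger name. In a project directory, drop_snakemake -n would then perform the dry run.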
As of version 1.2.1, DROP has a new module that performs RNA-seq variant calling. The input is a set of BAM files and the output is either a single-sample or a multi-sample VCF file (option specified by the user), annotated with allele frequencies from gnomAD (if specified by the user). The sample annotation table does not need to be changed, but several new parameters in the config file have to be added and tuned. For more info, refer to the documentation. Also as of version 1.2.1, external split and non-split counts can be integrated to detect aberrant splicing. In a new column in the sample annotation, simply specify the directory containing the counts. For more info, refer to the documentation.
Set up a custom project
Install DROP as described in the installation instructions, then initialize the project in a custom project directory.
Prepare the input data
Create a sample annotation table that contains the sample IDs, file locations and other information necessary for the pipeline. Edit the config file to set the correct file path of the sample annotation and locations of non-sample-specific input files. The requirements are described in the documentation.
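A malformed table is easier to catch before the pipeline starts. The sketch below writes a tiny tab-separated table and checks that every row has as many fields as the header. The column names and paths are assumptions used for illustration, and check_sample_annotation is a hypothetical helper; the required columns are listed in the documentation.

```shell
# Write a two-sample, tab-separated table to a temporary file.
# Column names and paths are placeholders; see the documentation for the real requirements.
sa_file="$(mktemp)"
printf 'RNA_ID\tRNA_BAM_FILE\tDROP_GROUP\n' > "$sa_file"
printf 'sample1\t/path/to/sample1.bam\tgroup1\n' >> "$sa_file"
printf 'sample2\t/path/to/sample2.bam\tgroup1\n' >> "$sa_file"

# Report rows whose field count differs from the header's; non-zero exit if any.
check_sample_annotation() {
    awk -F '\t' '
        NR == 1 { n = NF; next }
        NF != n { bad++; printf "line %d: %d fields, expected %d\n", NR, NF, n }
        END     { exit (bad > 0) }
    ' "$1"
}

check_sample_annotation "$sa_file" && echo "sample annotation OK"
```

The check only looks at the table's shape. It does not verify that the listed files exist or that the columns match what the pipeline expects.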
Execute the pipeline
Once these files are set up, you can execute a dry run from your project directory
snakemake -n
This shows you the rules of all subworkflows. Omit -n and specify the number of cores with --cores if you are sure that you want to execute all printed rules. You can also invoke single workflows explicitly, e.g. for aberrant expression with:
snakemake aberrantExpression --cores 10
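The dry run and the actual run of a single workflow can be chained so that execution only starts when the dry run succeeds. This is a minimal sketch: run_drop_module and the SNAKEMAKE_BIN override are hypothetical names introduced here, and the override exists only so the command lines can be previewed without snakemake installed.

```shell
# Hypothetical helper: dry-run one workflow, then execute it only if the dry run succeeds.
# $1: workflow name (e.g. aberrantExpression), $2: number of cores (default 1)
run_drop_module() {
    module="$1"
    cores="${2:-1}"
    "${SNAKEMAKE_BIN:-snakemake}" "$module" -n || return 1
    "${SNAKEMAKE_BIN:-snakemake}" "$module" --cores "$cores"
}

# Preview the arguments of both calls, without running snakemake.
SNAKEMAKE_BIN=echo run_drop_module aberrantExpression 10
```

Without the override, run_drop_module aberrantExpression 10 would run from the project directory and pass the same arguments to snakemake as the command above, preceded by its dry run.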
Datasets
The following publicly available datasets of gene counts can be used as controls. Please cite as instructed for each dataset.
- 154 non strand-specific fibroblasts, build hg19, Technical University of Munich
- 135 strand-specific fibroblasts, build hg19, high seq depth (116 million mapped reads), Technical University of Munich
- 127 strand-specific fibroblasts, build hg19, low seq depth (70 million mapped reads), Technical University of Munich
- 49 tissues, each containing hundreds of samples, non strand-specific, build hg19, GTEx
- 49 tissues, each containing hundreds of samples, non strand-specific, build hg38, GTEx
- 139 strand-specific fibroblasts, build hg19, Baylor College of Medicine
- 125 strand-specific blood, build hg19, Baylor College of Medicine
- 330 strand-specific induced pluripotent stem cells (iPSCs), build hg19, EMBL
- 56 non strand-specific amniotic fluid cells, build hg19, The University of Hong Kong
If you want to contribute with your own count
