DropseqRunner
Lightweight pipeline for processing droplet-based single-cell RNA-seq data
Install / Use
TLDR
git clone git@github.com:aselewa/dropseqRunner.git
cd dropseqRunner
conda env create -f environment.yaml
conda activate dropRunner
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir $OUTDIR --genomeFastaFiles $FASTA --sjdbGTFfile $ANNOTATION_GTF
python dropRunner.py --R1 path/to/{}.R1.fastq.gz \
--R2 path/to/{}.R2.fastq.gz \
--indices $OUTDIR \
--protocol drop/10x-v3/10x-v2 \
--sample pbmc_v3
If the above gives you any trouble, run the demo to ensure everything is installed properly:
make run_test_workflow
Look for a message at the end that tells you whether the demo ran properly or not.
Getting started
dropRunner is a Snakemake-based pipeline for processing single-cell RNA-seq data from the Drop-seq and 10x platforms. We use STARsolo for alignment and for constructing the digital expression matrix. We also supply a detailed report in HTML format that shows the sequencing statistics and the read distribution across the genome. The pipeline works only on Linux systems (Windows Subsystem for Linux is not supported).
This pipeline is still under active development. If you have issues, please report them via GitHub.
Setting up conda
You may skip this if you already have conda installed and configured.
Miniconda3 is a lightweight version of Anaconda. To install it on 64-bit Linux, do the following:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Once it is done, initialize conda so it is in your path:
conda init bash
source ~/.bashrc
Check conda works:
conda --version
0. Set up and activate environment
Use the provided environment.yaml file to set up the conda environment.
git clone git@github.com:aselewa/dropseqRunner.git
cd dropseqRunner
conda env create -f environment.yaml
This may take some time depending on your environment. A fresh conda installation should take about 10-15 minutes.
Activate the environment before running the next steps:
conda activate dropRunner
1. Make reference genome indices
Use STAR to create indices for your reference genome of interest. You will need two things:
- fasta file of reference genome
- reference genome GTF annotations
You can get both of these from GENCODE for humans.
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir $OUTDIR --genomeFastaFiles $FASTA --sjdbGTFfile $ANNOTATION_GTF
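If you prefer to script the indexing step, the command above can be assembled in Python. This is a minimal sketch, not part of dropRunner: the function name `star_index_command` and the example file names are hypothetical, and only the STAR flags shown above are used. STAR expects an uncompressed FASTA and GTF for genome generation, so the sketch rejects `.gz` inputs (GENCODE downloads are gzipped and need to be decompressed first).

```python
import shlex
from pathlib import Path


def star_index_command(out_dir, fasta, gtf, threads=4):
    """Return the STAR genomeGenerate command as an argument list.

    Hypothetical helper for illustration; it only builds the command
    and does not run STAR.
    """
    for path in (fasta, gtf):
        # STAR reads the genome FASTA and the GTF as plain text.
        if str(path).endswith(".gz"):
            raise ValueError(f"Decompress {path} before indexing")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", str(Path(out_dir)),
        "--genomeFastaFiles", str(Path(fasta)),
        "--sjdbGTFfile", str(Path(gtf)),
    ]


# Example file names are placeholders.
cmd = star_index_command("star_index", "genome.fa", "annotation.gtf")
print(shlex.join(cmd))

# To actually build the index (requires STAR on PATH and real input files):
# import subprocess
# subprocess.run(cmd, check=True)
```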
2. Run the pipeline
Use dropRunner.py on your FASTQ files to generate count matrices. Use the protocol parameter to specify drop, 10x-v2, or 10x-v3. The last two refer to versions 2 and 3 of the 10x platform.
python dropRunner.py --R1 path/to/{}.R1.fastq.gz \
--R2 path/to/{}.R2.fastq.gz \
--indices $OUTDIR \
--cluster \
--sample my_example_project
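The protocol matters because each platform places the cell barcode and UMI differently in the barcode read: Drop-seq uses a 12 bp barcode and 8 bp UMI, 10x v2 a 16 bp barcode and 10 bp UMI, and 10x v3 a 16 bp barcode and 12 bp UMI. The sketch below shows how those layouts translate into STARsolo's barcode-geometry flags. It is an illustration only: how dropRunner maps protocols to STARsolo options internally is an assumption, and `PROTOCOL_LAYOUTS` and `solo_barcode_args` are hypothetical names.

```python
# Barcode (CB) and UMI lengths in base pairs for each protocol name.
PROTOCOL_LAYOUTS = {
    "drop": {"cb_len": 12, "umi_len": 8},
    "10x-v2": {"cb_len": 16, "umi_len": 10},
    "10x-v3": {"cb_len": 16, "umi_len": 12},
}


def solo_barcode_args(protocol):
    """Return STARsolo barcode-geometry flags for a protocol name.

    Hypothetical helper; assumes the barcode starts at position 1
    and the UMI follows it immediately.
    """
    if protocol not in PROTOCOL_LAYOUTS:
        choices = ", ".join(sorted(PROTOCOL_LAYOUTS))
        raise ValueError(f"Unknown protocol {protocol!r}; expected one of: {choices}")
    layout = PROTOCOL_LAYOUTS[protocol]
    cb_len, umi_len = layout["cb_len"], layout["umi_len"]
    return [
        "--soloCBstart", "1",
        "--soloCBlen", str(cb_len),
        "--soloUMIstart", str(cb_len + 1),
        "--soloUMIlen", str(umi_len),
    ]


print(" ".join(solo_barcode_args("10x-v3")))
```

Choosing the wrong protocol typically shows up as a very low fraction of reads with valid barcodes, so it is worth confirming the chemistry before running.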
Note: You can supply multiple R1 and R2 files by passing comma-delimited lists. I find this bash command useful:
R1=$(ls *.R1.fastq.gz | paste -sd,)
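When building both lists, the R1 and R2 entries presumably need to appear in the same order so that mates stay paired. The following is a rough Python equivalent of the one-liner above that also checks each R1 file has a matching R2. It assumes the `.R1.fastq.gz` / `.R2.fastq.gz` naming used in this README; the function name `paired_fastq_lists` is hypothetical.

```python
from pathlib import Path

R1_SUFFIX = ".R1.fastq.gz"
R2_SUFFIX = ".R2.fastq.gz"


def paired_fastq_lists(directory):
    """Return (R1, R2) comma-delimited strings with mates in matching order."""
    r1_files = sorted(Path(directory).glob(f"*{R1_SUFFIX}"))
    r2_files = []
    for r1 in r1_files:
        # Swap only the trailing suffix so sample names are left untouched.
        r2 = r1.with_name(r1.name[: -len(R1_SUFFIX)] + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"No R2 mate found for {r1}")
        r2_files.append(r2)
    r1_arg = ",".join(str(p) for p in r1_files)
    r2_arg = ",".join(str(p) for p in r2_files)
    return r1_arg, r2_arg


# Usage (the directory name is a placeholder):
# r1_arg, r2_arg = paired_fastq_lists("fastq")
# then pass r1_arg to --R1 and r2_arg to --R2
```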
3. Output
There are two pieces of information that most users will need:
- html reports
- count matrices
The HTML report is in reports/. The count matrices are in output/{project_name}_Solo.out. There are two types of count matrices: filtered and raw. The raw matrix contains all valid barcodes, while the filtered matrix contains only barcodes with at least a threshold number of UMIs. STARsolo determines this threshold using a heuristic approach.
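To give a feel for how such a threshold can work, here is a simplified sketch of the CellRanger 2.2-style rule that STARsolo documents as its default cell filter: take the top N barcodes by UMI count (N = expected cells), read off a robust maximum at the 99th percentile, and keep every barcode with at least one tenth of that value. Whether dropRunner runs STARsolo with these defaults is an assumption, and the function name `umi_threshold` and the toy counts are made up for illustration.

```python
def umi_threshold(umi_counts, expected_cells=3000, percentile=0.99, max_min_ratio=10):
    """Return a UMI-count cutoff using a CellRanger 2.2-style heuristic.

    Simplified illustration; not STARsolo's actual implementation.
    """
    ranked = sorted(umi_counts, reverse=True)
    top = ranked[:expected_cells]
    if not top:
        return 0.0
    # Robust maximum: skip the top (1 - percentile) fraction of barcodes
    # so a few extreme outliers do not set the scale.
    index = min(len(top) - 1, int(round(len(top) * (1 - percentile))))
    robust_max = top[index]
    return robust_max / max_min_ratio


# Toy per-barcode UMI counts (fabricated for illustration only).
counts = [1000, 900, 800, 50, 5, 3, 1]
cutoff = umi_threshold(counts, expected_cells=3)
kept = [c for c in counts if c >= cutoff]
print(cutoff, kept)
```

If the filtered matrix contains far fewer or far more cells than expected, starting from the raw matrix and applying your own filtering is a reasonable fallback.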
Please report any issues you run into via GitHub.
