Popscle
A suite of population-scale analysis tools for single-cell genomics data, including implementations of the Demuxlet / Freemuxlet methods and auxiliary tools
popscle
popscle is a suite of population-scale analysis tools for single-cell genomics data. The key software tools in this repository include demuxlet (version 2) and freemuxlet, a genotyping-free method to deconvolute barcoded cells by their identities while detecting doublets.
Quick Overview
With popscle, we recommend analyzing single cell RNA-seq (and other single cell genomic) datasets in two steps.
- Use `dsc-pileup` to generate pileups around known variants from aligned sequence reads.
- Use `demuxlet` (with genotypes) or `freemuxlet` (without genotypes) to deconvolute the identities of barcoded cells.
Read the tutorial at https://github.com/statgen/popscle/wiki if you would like to learn how to run the software tools in popscle by example.
Read the documentation below if you want comprehensive documentation of these tools.
Introduction
Overview
demuxlet and freemuxlet are two software tools to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing. If external genotyping data for each sample is available (e.g. from SNP arrays), demuxlet is recommended. If external genotyping data is not available, freemuxlet, the genotyping-free version of demuxlet, is recommended. You still need a variant site list (in VCF) to generate pileups, even if you intend to use freemuxlet.
You need to run dsc-pileup before running demuxlet and freemuxlet. dsc-pileup is a software tool that piles up reads and their corresponding base qualities for each overlapping SNP and each barcode. With the pileup files, demuxlet/freemuxlet can be run multiple times quickly without going over the BAM file again.
dsc-pileup requires the following input files:
- A SAM/BAM/CRAM file produced by the standard 10x sequencing platform, or by any other barcoded single cell RNA-seq platform (with proper `--tag-UMI` and `--tag-group` options).
- A VCF/BCF file containing (AC) and (AN) from a reference population (e.g. 1000g).
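As a rough illustration of these inputs, the dry-run sketch below prints a dsc-pileup command line instead of executing it. The file names are hypothetical placeholders, and the tag values CB and UB are an assumption that matches standard 10x BAM files; data from other platforms may need different tags.

```shell
#!/bin/sh
# Dry-run sketch: prints the dsc-pileup command instead of running it.
# Every path below is a hypothetical placeholder, not a file from this repository.
POPSCLE="popscle"            # assumed to be on PATH; adjust to your build location
BAM="pooled.bam"             # barcoded alignment file (SAM/BAM/CRAM)
REF_VCF="ref_sites.vcf.gz"   # site list carrying AC and AN
OUT="pooled.pileup"          # output prefix for the pileup files

pileup_cmd() {
  # CB / UB are the cell-barcode and UMI tags in standard 10x BAM files
  # (assumption: change them if your platform uses other tags).
  echo "$POPSCLE" dsc-pileup \
    --sam "$BAM" --vcf "$REF_VCF" \
    --tag-group CB --tag-UMI UB \
    --out "$OUT"
}

pileup_cmd
```

Removing the `echo` inside `pileup_cmd` turns the sketch into a real invocation.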
demuxlet requires the following input files:
- Pileup files (CEL, VAR and PLP) produced by `dsc-pileup`.
- A VCF/BCF file containing the genotype (GT), posterior probability (GP), or genotype likelihood (GL) to assign each barcode to a specific sample (or a pair of samples) in the VCF file.
Alternatively, demuxlet can take a SAM file directly, without running dsc-pileup first. In this case, demuxlet requires the following files:
- A SAM/BAM/CRAM file produced by the standard 10x sequencing platform, or by any other barcoded single cell RNA-seq platform (with proper `--tag-UMI` and `--tag-group` options).
- A VCF/BCF file containing the genotype (GT), posterior probability (GP), or genotype likelihood (GL) to assign each barcode to a specific sample (or a pair of samples) in the VCF file.
freemuxlet requires the following input:
- Pileup files (CEL, PLP and VAR) from dsc-pileup
- Number of samples
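Because demuxlet and freemuxlet both depend on the three pileup files, it can help to confirm they exist before starting a run. The helper below is a minimal sketch; `check_pileup` is a hypothetical name, and the `<prefix>.cel.gz`, `<prefix>.plp.gz` and `<prefix>.var.gz` naming is an assumption about how dsc-pileup writes its output, so verify it against your own files.

```shell
#!/bin/sh
# Returns 0 when the CEL, PLP and VAR files for a pileup prefix all exist
# and are non-empty; prints the first missing file to stderr otherwise.
# Assumption: files are named <prefix>.cel.gz, <prefix>.plp.gz, <prefix>.var.gz.
check_pileup() {
  prefix="$1"
  for ext in cel plp var; do
    if [ ! -s "$prefix.$ext.gz" ]; then
      echo "missing or empty: $prefix.$ext.gz" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_pileup /data/pooled.pileup && echo "pileup looks complete"` only prints the message when all three files are present.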
Tips for running
- If an external reference sequence VCF file is available, demuxlet is recommended.
- The default setting for alpha is 0.5, which assumes an expected proportion of 50% genetic mixture from two individuals, to get better estimates of doublets.
- Set `--group-list` to a list of barcodes (i.e. barcodes.tsv from 10X) in `dsc-pileup` to speed things up and only get demultiplexing for cells called by other methods.
- To reproduce the results presented in Figure 2 of the demuxlet paper, please use the original version of demuxlet, with the data downloadable at https://github.com/yelabucsf/demuxlet_paper_code/tree/master/fig2 . If you want to learn how to perform a similar analysis with `popscle`, please go to https://github.com/statgen/popscle/wiki .
- Check the tutorial README.md for a more detailed tutorial with example data.
- If you start the process in docker, use the command line `docker run <imagename> "<popscle-arguments>"` (e.g. `docker run popscle "freemuxlet"`) to run docker tasks.
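The `--group-list` tip can be sketched as a small dry-run helper that checks the barcode list and then prints the matching dsc-pileup command. The function name `pileup_with_barcodes` and all file names are hypothetical, and the sketch assumes the list holds one barcode per line.

```shell
#!/bin/sh
# Prints a dsc-pileup command restricted to the barcodes in a list file.
# Arguments: BAM, reference VCF, barcode list, output prefix (all hypothetical).
pileup_with_barcodes() {
  bam="$1"; vcf="$2"; barcodes="$3"; out="$4"
  if [ ! -s "$barcodes" ]; then
    echo "barcode list is missing or empty: $barcodes" >&2
    return 1
  fi
  # Report how many lines (barcodes, assuming one per line) will be used.
  echo "barcodes: $(wc -l < "$barcodes" | tr -d ' ')" >&2
  # Dry run: remove the echo to execute the command.
  echo popscle dsc-pileup --sam "$bam" --vcf "$vcf" \
    --group-list "$barcodes" --out "$out"
}
```

A call such as `pileup_with_barcodes pooled.bam ref_sites.vcf.gz barcodes.tsv pooled.pileup` reports the barcode count on stderr and prints the command on stdout, so the command can be inspected before it is run.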
Installing demuxlet/freemuxlet
<pre>
$ mkdir build
$ cd build
$ cmake ..
</pre>
In case any required library is missing, you may specify a customized install path by replacing "cmake .." with:
<pre>
For libhts:
  - $ cmake -DHTS_INCLUDE_DIRS=/hts_absolute_path/include/ -DHTS_LIBRARIES=/hts_absolute_path/lib/libhts.a ..
For bzip2:
  - $ cmake -DBZIP2_INCLUDE_DIRS=/bzip2_absolute_path/include/ -DBZIP2_LIBRARIES=/bzip2_absolute_path/lib/libbz2.a ..
For lzma:
  - $ cmake -DLZMA_INCLUDE_DIRS=/lzma_absolute_path/include/ -DLZMA_LIBRARIES=/lzma_absolute_path/lib/liblzma.a ..
</pre>
Finally, to build the binary, run
<pre>
$ make
</pre>
Using demuxlet and freemuxlet
All tools use a self-documentation utility. You can run each tool with the -man or -help option to see its command line usage. We also offer some general practice with an example in the tutorial (data is available here: https://drive.google.com/drive/folders/1wfnn132vMbZhicpWOZVbR_36YpIiojug?usp=sharing).
demuxlet
<pre>
$(POPSCLE_HOME)/bin/popscle dsc-pileup --sam /data/$bam --vcf /data/$ref_vcf --out /data/$pileup
$(POPSCLE_HOME)/bin/popscle demuxlet --plp /data/$pileup --vcf /data/$external_vcf --field $(GT or GP or PL) --out /data/$filename
</pre>
Or, demuxlet can take a SAM file directly as input:
<pre>
$(POPSCLE_HOME)/bin/popscle demuxlet --sam /data/$sam --vcf /data/$external_vcf --field $(GT or GP or PL) --out /data/$filename
</pre>
freemuxlet
<pre>
$(POPSCLE_HOME)/bin/popscle dsc-pileup --sam /data/$bam --vcf /data/$ref_vcf --out /data/$pileup
$(POPSCLE_HOME)/bin/popscle freemuxlet --plp /data/$pileup --out /data/$filename --nsample $n
</pre>
The detailed usage is also pasted below.
