PiPipes
piRNA pipeline collection developed in the Zamore Lab and ZLab in UMass Med School
A set of pipelines developed in the Zamore Lab and ZLab to analyze piRNAs and transposons from different next-generation sequencing libraries (small RNA-seq, RNA-seq, Genome-seq, ChIP-seq, CAGE/Degradome-Seq).
To provide a generic interface across the genome assemblies it supports, piPipes provides an installation pipeline that downloads ready-to-use genome annotation packages from Illumina iGenome as well as the UCSC Genome Browser.
For the small RNA-Seq, RNA-Seq and ChIP-Seq pipelines, piPipes provides two modes: single-sample mode, to analyze a single library, and dual-sample mode, to perform a pair-wise comparison between two samples. For degradome-seq, piPipes provides options to perform Ping-Pong analysis between degradome reads and small RNA reads.
Visit our Wiki Page for more details on how to install the genome, run each pipeline, and interpret the output.
##INSTALL
piPipes is written in Bash, C/C++, Perl, Python, HTML/Javascript and R. It currently works only under Linux.
C/C++
piPipes comes with statically compiled Linux x86_64 binaries for its own C++ programs and for the other tools written in C/C++. Ideally, users don't need to compile anything.
If the static versions do not work on your system (for example, they fail with the error message "kernel too old"), please compile them from src and move the binaries to bin, or simply email us or file an issue on Github.
If you need to compile from source code:
- Please install BEDtools using the source code in the `third_party` directory and rename it as `bedtools_piPipes` in the `bin` directory of `piPipes`. It has a little modification that makes our self-defined format more efficient to process.
- Please install bowtie from https://github.com/bowhan/bowtie , where we have added native gzip/bzip2 support, which is required to run zipped, Paired-End sample for ChIP-seq pipeline.
- Most of piPipes's C++ code uses C++11 features and the Boost library. We recommend installing a relatively recent GCC and Boost to compile it. If you don't have them, we recommend using brew to install them automatically.
- Some of the code requires htslib to be installed first.
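Before compiling, it can help to confirm that the local compiler accepts C++11 at all. The following is a minimal sketch, not part of piPipes: the function name `supports_cxx11` and the probe program are illustrative assumptions, and it only checks the `-std=c++11` flag, not Boost or htslib.

```shell
#!/usr/bin/env bash
# supports_cxx11 [COMPILER]: return 0 if COMPILER (default: g++) can build
# a tiny program that uses a C++11 feature. Hypothetical helper, not piPipes code.
supports_cxx11() {
    local cxx="${1:-g++}"
    # Fail early if the compiler is not on $PATH
    command -v "$cxx" >/dev/null 2>&1 || return 1
    local tmp
    tmp="$(mktemp -d)" || return 1
    # 'auto' type deduction is only valid from C++11 onwards
    printf 'int main() { auto x = 0; return x; }\n' > "$tmp/probe.cpp"
    "$cxx" -std=c++11 "$tmp/probe.cpp" -o "$tmp/probe" >/dev/null 2>&1
    local status=$?
    rm -rf "$tmp"
    return "$status"
}

if supports_cxx11 g++; then
    echo "g++ accepts -std=c++11"
else
    echo "g++ is missing or does not accept -std=c++11"
fi
```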
Python/Cython
For MACS2 and HTSeq-count, users will need to install them and make them available in their $PATH.
We cannot find a good way to ship ready-to-use Cython code. Without htseq-count, piPipes rna/deg/cage won't be able to count transcripts/transposons using genomic coordinates, but it will still perform the other functions of the pipeline, including quantification using Cufflinks and eXpress. Without macs2, piPipes chip/chip2 won't work at all.
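A quick way to see whether both tools are visible on `$PATH` is sketched below. The executable names `macs2` and `htseq-count` come from the text above; the helper `check_tool` is an illustrative name, not something piPipes provides.

```shell
#!/usr/bin/env bash
# check_tool NAME: report whether NAME resolves to an executable on $PATH.
# Hypothetical helper for illustration only.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool -> $(command -v "$tool")"
        return 0
    else
        echo "missing: $tool"
        return 1
    fi
}

# Check the two Python tools the pipelines rely on; keep going if one is absent
for tool in macs2 htseq-count; do
    check_tool "$tool" || true
done
```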
R
For R packages that are unavailable in the user's system, the installation is performed during the piPipes install process. They will be installed in the same directory as the pipeline in case the user doesn't have write permission in the R installation directory. Please keep the version of R constant.
Genome Annotation
Due to the file size limit on github, the genome sequence and most annotation files must be downloaded from elsewhere and reformatted to fit the pipeline.
piPipes uses iGenome and provides piPipes install to download iGenome genomes and organize the files to be used by the pipeline (see below).
- For the recently released (07/2014) Drosophila melanogaster BDGP release 6, we obtain the data directly from FlyBase.
piPipes uses the following public tools:
- For alignment, piPipes uses Bowtie, Bowtie2, BWA, STAR and mrFast for different purposes.
- For transcript/transposon quantification, piPipes uses Cufflinks, HTSeq and eXpress under different circumstances.
- For discovery of transposon mobilization and other structural variants, piPipes uses TEMP, BreakDancer, RetroSeq and VariationHunter.
- For ChIP-Seq read allocation, piPipes uses CSEM; for peak calling, piPipes uses MACS2. For TSS/TES/metagene analysis, piPipes uses bwtool.
- Additionally, piPipes uses many tools from the Kent Tools, like `faSize` and `bedGraphToBigWig`.
- To wrap bash scripts for multi-threading, piPipes uses `ParaFly` from Trinity. piPipes also borrows the `touch` trick for job resuming from Trinity.
- To determine the version of FastQ, piPipes uses `SolexaQA.pl` from SolexaQA. piPipes has modified it so that the program exits as soon as the FastQ version has been determined. The modified code can be found in the `bin` directory.
- piPipes uses BEDtools to assign alignments to different genomic annotations (gene, transposon, piRNA cluster, etc.).
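The touch trick for job resuming mentioned in the list above can be illustrated roughly as follows. This is a sketch of the general idea only, not the piPipes or Trinity implementation: `run_step`, `STATUS_DIR`, the `.done` suffix and the placeholder commands are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Directory holding one empty marker file per finished step.
# Assumption: a real run would point this at a persistent output directory;
# a temporary directory is used here so the sketch has no side effects.
STATUS_DIR="${STATUS_DIR:-$(mktemp -d)}"
mkdir -p "$STATUS_DIR"

# run_step NAME CMD...: run CMD unless NAME already finished in an earlier run.
run_step() {
    local name="$1"; shift
    local marker="$STATUS_DIR/$name.done"
    if [ -f "$marker" ]; then
        echo "skip: $name"
        return 0
    fi
    # Create the marker only when the command succeeds,
    # so a failed step is retried on the next run
    "$@" && touch "$marker"
}

run_step align echo "aligning reads"   # placeholder for a real alignment command
run_step count echo "counting reads"   # placeholder for a real counting command
```

Running the same script again with the same `STATUS_DIR` skips every step whose marker file exists and resumes at the first unfinished one.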
##USAGE
The pipeline finds almost everything under its own directory, so please do not move the piPipes script. Use ln -s $ABSOLUTE_PATH_TO_piPipes/piPipes $HOME/bin/piPipes to create a symbolic link in your $HOME/bin, or add /path/to/piPipes to your $PATH.
But please do NOT add /path/to/piPipes/bin to your $PATH.
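To verify that the `bin` directory has not ended up on `$PATH` by accident, a check along these lines may be useful. It is a sketch under stated assumptions: `path_contains` and `PIPIPES_DIR` are illustrative names, and `/path/to/piPipes` is the same placeholder used above.

```shell
#!/usr/bin/env bash
# path_contains DIR [PATH_STRING]: return 0 if DIR is an entry of the
# colon-separated PATH_STRING (default: the current $PATH). Hypothetical helper.
path_contains() {
    local dir="$1"
    local path_string="${2:-$PATH}"
    # Pad with colons so first and last entries match the same pattern
    case ":$path_string:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Placeholder location; replace with the real absolute path to piPipes
PIPIPES_DIR="${PIPIPES_DIR:-/path/to/piPipes}"

if path_contains "$PIPIPES_DIR/bin"; then
    echo "warning: $PIPIPES_DIR/bin is on \$PATH; please remove it"
else
    echo "ok: $PIPIPES_DIR/bin is not on \$PATH"
fi
```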
Call different pipelines using:
# This is a very brief introduction, for more details on the usage and output interpretation, please visit our Wiki or the manual in the package
# ===== Genome installation pipeline =====
# 1. to install genome and R packages in one step
# the assembly that piPipes supports can be found in the common/iGenome_UTL.txt file
$PATH_TO_piPipes/piPipes install -g dm3|mm9|hg19...
# 2. to only download the genome and R packages (if the machine/node is not suitable for heavy computing tasks, like building indexes); then run (1) on a powerful machine/node.
$PATH_TO_piPipes/piPipes install -g dm3|mm9|hg19 -D
# 3. to download the iGenome from other explicitly specified location
$PATH_TO_piPipes/piPipes install -g hg18 -l ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg18/Homo_sapiens_UCSC_hg18.tar.gz
# ===== Small RNA-seq pipeline =====
# to run small RNA pipeline in single sample mode; input fastq can be gzipped
$PATH_TO_piPipes/piPipes small -i input.trimmed.fq[.gz] -g dm3 -c 24
# to run small RNA pipeline in single sample mode; full options
$PATH_TO_piPipes/piPipes small -i input.trimmed.fq[.gz] -g dm3 -N miRNA -o output_dir -F virus.fa -P mini_white.fa -O gfp.fa
# to run small RNA pipeline in dual library mode (need single sample mode output for each sample first)
$PATH_TO_piPipes/piPipes small2 -a directory_A -b directory_B -g dm3 -c 24
# to run small RNA pipeline in dual library mode, normalized to miRNA, for unoxidized library
$PATH_TO_piPipes/piPipes small2 -a directory_A -b directory_B -g dm3 -c 24 -N miRNA
# to run small RNA pipeline in dual library mode, normalized to siRNA (structural loci and cis-NATs), for oxidation sample of -fruitfly only-
$PATH_TO_piPipes/piPipes small2 -a directory_A -b directory_B -g dm3 -c 24 -N siRNA
# ===== RNA-seq pipeline =====
# to run RNASeq pipeline in single sample mode, dUTP based method
$PATH_TO_piPipes/piPipes rnaseq -l left.fq -r right.fq -g mm9 -c 8 -o output_dir
# to run RNASeq pipeline in single sample mode, ligation based method
$PATH_TO_piPipes/piPipes rnaseq -l left.fq -r right.fq -g mm9 -c 8 -o output_dir -L
# to run RNASeq pipeline in dual library mode (single sample mode must have been run for each sample first)
$PATH_TO_piPipes/piPipes rnaseq2 -a directory_A -b directory_B -g mm9 -c 8 -o output_dir -A w1 -B piwi
# to run RNASeq pipeline in dual library mode with replicates
$PATH_TO_piPipes/piPipes rnaseq2 -a directory_A_rep1,directory_A_rep2,directory_A_rep3 -b directory_B_rep1,directory_B_rep2 -g mm9 -c 8 -o output_dir -A w1 -B piwi
# ===== Degradome/RACE/CAGE-seq pipeline =====
# to run Degradome/RACE/CAGE-Seq library
$PATH_TO_piPipes/piPipes deg -l left.fq -r right.fq -g dm3 -c 12 -o output_dir
# to run Degradome library to check ping-pong signature with a small RNA library (the small RNA library must have been run first)
$PATH_TO_piPipes/piPipes deg -l left.fq -r right.fq -g dm3 -c 12 -o output_dir -s /path/to/small_RNA_library_output
# ===== ChIP-seq pipeline =====
# to run ChIP Seq library in single sample mode, for narrow peak, like transcriptional factor
$PATH_TO_piPipes/piPipes chip -l left.IP.fq -r right.IP.fq -L left.INPUT.fq -R right.INPUT.fq -g mm9 -c 8 -o output_dir
# to run ChIP Seq library in single sample mode, for broad peak, like H3K9me3
$PATH_TO_piPipes/piPipes chip -l left.IP.fq -r right.IP.fq -L left.INPUT.fq -R right.INPUT.fq -g mm9 -c 8 -o o
