ShortStack

Alignment of small RNA-seq data and annotation of small RNA-producing genes

Author

Michael J. Axtell, Penn State University, mja18@psu.edu

IMPORTANT CHANGE - Condensed Reads

As of version 4.1.0 ShortStack creates and uses "condensed reads" for alignments. This speeds things up and keeps file sizes smaller. But it requires some upgrades and some understanding. Ensure that strucVis version is >= 0.9 and ShortTracks version is >= 1.2. This is enforced by bioconda but users who manually install will need to upgrade. See Outputs and Overview of Methods for more about condensed reads.

Table of Contents

Citations
Installation
Usage
Resources
Testing and Examples
Vignettes
Outputs
Visualizing Results
Overview of Methods
How to go FAST
ShortStack Version 4 Major Changes
Issues
FAQ

Citations

If you use ShortStack in support of your work, please cite one or more of the following:

Johnson NR, Yeoh JM, Coruh C, Axtell MJ. (2016). G3 6:2103-2111. doi:10.1534/g3.116.030452
Shahid S, Axtell MJ. (2013) Identification and annotation of small RNA genes using ShortStack. Methods doi:10.1016/j.ymeth.2013.10.004
Axtell MJ. (2013) ShortStack: Comprehensive annotation and quantification of small RNA genes. RNA 19:740-751. doi:10.1261/rna.035279.112

Installation

You can either use the conda package manager to install from the bioconda channel, or manually set up an environment. Use of conda/bioconda is highly recommended!

Install using conda (recommended)

First, install conda, and then set it up to use the bioconda channel following the instructions at https://bioconda.github.io

Then, follow instructions below based on your system to install the dependencies to a new environment and activate the environment.

Linux or Intel-Mac

conda create --name ShortStack4 shortstack 
conda activate ShortStack4

Silicon-Mac

Some dependencies have not been compiled for the newer Apple Silicon Macs on bioconda, so we need to force conda to install the osx-64 (Intel) versions instead. Apple Silicon Macs can run Intel code using the built-in Rosetta translation.

conda create --name ShortStack4
conda activate ShortStack4
conda config --env --set subdir osx-64
conda install shortstack

Manual installation

Create an environment that contains the following packages / tools compiled and installed. Make note of required versions! (last updated for ShortStack release 4.1.0)

python >= 3.12.3 https://www.python.org
samtools >= 1.20 https://www.htslib.org
bowtie >= 1.3.1 https://bowtie-bio.sourceforge.net/index.shtml
viennarna 2.* https://www.tbi.univie.ac.at/RNA/documentation.html
tqdm https://tqdm.github.io
numpy https://numpy.org
biopython https://biopython.org
strucVis >= 0.9 https://github.com/MikeAxtell/strucVis
ShortTracks >= 1.2 https://github.com/MikeAxtell/ShortTracks
bedtools >= 2.31.1 https://bedtools.readthedocs.io/en/latest/
cutadapt >= 4.9 https://cutadapt.readthedocs.io/en/stable/

Then, download the ShortStack script from this github repo. Make it executable (chmod +x ShortStack) and then copy it into a directory on your environment's PATH.
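
After a manual installation, a quick check that the expected executables are reachable can save a failed run. The following is a minimal sketch, not part of ShortStack. The executable names are assumptions (RNAfold stands in for the ViennaRNA package, and strucVis and ShortTracks are assumed to be installed under those names). It does not check versions or the Python modules tqdm, numpy, and biopython.

```shell
# Report which of the listed executables are missing from PATH.
# Prints one missing name per line; prints nothing when all are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Executable names below are assumptions; adjust them to your installation.
missing=$(missing_tools python3 samtools bowtie bowtie-build RNAfold \
    bedtools cutadapt strucVis ShortTracks ShortStack)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

Any name reported as missing needs to be installed or added to PATH before running ShortStack.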

Usage

ShortStack [-h] [--version] (--genomefile GENOMEFILE | --autotrim_only) [--known_miRNAs KNOWN_MIRNAS] (--readfile [READFILE ...] | --bamfile [BAMFILE ...]) [--outdir OUTDIR] [--adapter ADAPTER | --autotrim]
                  [--autotrim_key AUTOTRIM_KEY] [--threads THREADS] [--mmap {u,f,r}] [--align_only] [--dicermin DICERMIN] [--dicermax DICERMAX] [--locifile LOCIFILE | --locus LOCUS] [--nohp] [--dn_mirna]
                  [--strand_cutoff STRAND_CUTOFF] [--mincov MINCOV] [--pad PAD] [--make_bigwigs]

Required

(--genomefile GENOMEFILE | --autotrim_only) : Either --genomefile or --autotrim_only is required.
- --genomefile GENOMEFILE : Path to the reference genome in FASTA format. Must be indexable by both samtools faidx and bowtie-build, or already indexed.
- --autotrim_only : If this switch is set, ShortStack quits after performing auto-trimming of input reads.
(--readfile [READFILE ...] | --bamfile [BAMFILE ...]) : Either --readfile or --bamfile is required.
- --readfile [READFILE ...] : Path(s) to one or more files of reads in fastq or fasta format. May be gzip compressed. Multiple files are separated by spaces. Inputting reads triggers alignments to be performed.
- --bamfile [BAMFILE ...] : Path(s) to one or more files of aligned sRNA-seq data in BAM format. Multiple files are separated by spaces. BAM files must match the reference genome given in --genomefile.

Recommended

--known_miRNAs KNOWN_MIRNAS : Path to FASTA-formatted file of known mature miRNAs. FASTA must be formatted such that a single RNA sequence is on one line only. ATCGUatcgu characters are acceptable. These RNAs are typically the sequences of known microRNAs; for instance, a FASTA file of mature miRNAs pulled from https://www.mirbase.org. These known miRNA sequences are aligned to the genome and used to nucleate searches for loci that meet all expression-based and secondary structure-based requirements for MIRNA locus identification. See also option --dn_mirna.
--outdir OUTDIR : Specify the name of the directory that will be created for the results.
- default: ShortStack_[time], where [time] is the Unix time stamp according to the system when the run began.
--autotrim : This is strongly recommended when supplying untrimmed reads via --readfile. The autotrim method automatically infers the 3' adapter sequence of the untrimmed reads, and then uses that to coordinate read trimming. However, do not use --autotrim if your input reads have already been trimmed!
- Note: autotrim currently assumes your library strategy generated reads where nucleotide 1 of the read is the first biological / sRNA-derived nucleotide, and the 3' adapter starts immediately after the last sRNA nucleotide. It further assumes there are no random nucleotides (Ns) in the 3' adapter sequence. If your data do not meet these assumptions you cannot use --autotrim. Instead, remove your adapters by other appropriate methods and input the trimmed reads using --readfile without option --autotrim.
- Note: mutually exclusive with --adapter.
--threads THREADS : Set the number of threads to use. More threads = faster completion.
- default: 1
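
As an illustration of how the required and recommended options combine, here is a sketch of a run on untrimmed reads. The file names, the output directory name, the helper function, and the DRY_RUN variable are all hypothetical. Only the ShortStack options themselves come from the usage above. By default the helper only prints the command so it can be inspected first.

```shell
# Usage: shortstack_autotrim_run GENOME OUTDIR THREADS READFILE...
# Prints the assembled command; executes it only when DRY_RUN=0.
shortstack_autotrim_run() {
    genome=$1; outdir=$2; threads=$3
    shift 3
    # The remaining arguments are the read files.
    set -- ShortStack --genomefile "$genome" --readfile "$@" \
        --autotrim --threads "$threads" --outdir "$outdir"
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical inputs; replace them with your own files.
shortstack_autotrim_run genome.fa ShortStack_results 4 \
    sample1.fastq.gz sample2.fastq.gz
```

Setting DRY_RUN=0 in the environment makes the helper execute the command instead of only printing it.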

Other options

-h : Print a help message and then quit.
--version : Print the version and then quit.
--adapter ADAPTER : Manually specify a 3' adapter sequence to use during read trimming. Mutually exclusive with --autotrim. The --adapter option will apply the same adapter sequence to trim all given readfiles.
- Note: Use of --adapter is discouraged. In nearly all cases, --autotrim is a better bet for read trimming.
--autotrim_key AUTOTRIM_KEY : A DNA sequence to use as a known read prefix during the --autotrim procedure. ShortStack's autotrim discovers the 3' adapter by scanning for reads that begin with the sequence given by AUTOTRIM_KEY. This should be the sequence of a small RNA that is known to be highly abundant in all of the libraries. The default sequence is for miR166, a microRNA that is present in nearly all plants at high levels. For non-plant experiments, or if the default is not working well, consider providing an alternative to the default.
- default: TCGGACCAGGCTTCATTCCCC (miR166)
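
When choosing an alternative key, it can help to count how many untrimmed reads actually begin with a candidate sequence. The sketch below is a sanity check only and is not ShortStack's autotrim procedure. It assumes four-line FASTQ records on standard input, and the example reads are made up.

```shell
# Count reads in a FASTQ stream (stdin) whose sequence begins with KEY.
# Prints "matching total". Assumes four lines per FASTQ record.
count_key_reads() {
    awk -v key="$1" '
        NR % 4 == 2 {    # the sequence line of each record
            total++
            if (index(toupper($0), toupper(key)) == 1) hits++
        }
        END { printf "%d %d\n", hits, total }
    '
}

# Made-up FASTQ: two of the three reads begin with the default miR166 key.
printf '%s\n' \
    '@r1' 'TCGGACCAGGCTTCATTCCCCTGGAATTCTCGG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    '@r2' 'TCGGACCAGGCTTCATTCCCCTGGAATTCTCGG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    '@r3' 'ACGTACGTACGTACGTACGTATGGAATTCTCGG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII' \
    | count_key_reads TCGGACCAGGCTTCATTCCCC
```

For a gzip-compressed file, the same function can read from a pipe such as gzip -dc reads.fastq.gz | count_key_reads KEY, where reads.fastq.gz is a placeholder name.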
--mmap {u,f,r} : Sets the mode by which multi-mapped reads are handled. These modes are described in Johnson et al. (2016). The default u mode has the best performance.
- u : (Default) Only uniquely-aligned reads are used as weights for placement of multi-mapped reads.
- f : Fractional weighting scheme for placement of multi-mapped reads.
- r : Multi-mapped read placement is random.
--align_only : This switch will cause ShortStack to terminate after the alignment phase; no analysis occurs.
--dicermin DICERMIN : An integer setting the minimum size (in nucleotides) of a valid small RNA. Together with --dicermax, this option sets the bounds to discriminate Dicer-derived small RNA loci from other loci. >= 80% of the reads in a given cluster must be in the range indicated by --dicermin and --dicermax.
- default: 21
--dicermax DICERMAX : An integer setting the maximum size (in nucleotides) of a valid small RNA. Together with --dicermin, this option sets the bounds to discriminate Dicer-derived small RNA loci from other loci. >= 80% of the reads in a given cluster must be in the range indicated by --dicermin and --dicermax.
- default: 24
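
The 80% rule described for --dicermin and --dicermax can be worked through by hand. The sketch below only illustrates that arithmetic on made-up read-length counts. It is not ShortStack's implementation, and the function name and LENGTH:COUNT input format are hypothetical.

```shell
# Usage: dicer_size_check DICERMIN DICERMAX LENGTH:COUNT...
# Prints "in_range total verdict", where verdict is pass when at least
# 80% of the reads have a length between DICERMIN and DICERMAX inclusive.
dicer_size_check() {
    min=$1; max=$2
    shift 2
    in_range=0; total=0
    for pair in "$@"; do
        len=${pair%%:*}
        count=${pair##*:}
        total=$((total + count))
        if [ "$len" -ge "$min" ] && [ "$len" -le "$max" ]; then
            in_range=$((in_range + count))
        fi
    done
    # Integer form of in_range / total >= 0.80.
    if [ "$total" -gt 0 ] && [ $((in_range * 100)) -ge $((total * 80)) ]; then
        verdict=pass
    else
        verdict=fail
    fi
    echo "$in_range $total $verdict"
}

# Made-up cluster: 85 of 100 reads are 21-24 nt long, so the check passes.
dicer_size_check 21 24 20:5 21:40 22:10 24:35 25:10
```

With the default bounds of 21 and 24, a cluster in which only 70 of 100 reads fall in range would fail the same check.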
--locifile LOCIFILE : Path to a file of pre-determined loci to analyze. This will prevent de novo discovery of small RNA loci. The file may be in gff3, bed, or simple tab-delimited format (Chr:Start-Stop[tab]Name). Mutually exclusive with --locus.
--locus LOCUS : A single locus to analyze, given as a string in the format Chr:Start-Stop (using one-based, inclusive numbering). This will prevent de novo discovery of small RNA loci. Mutually exclusive with --locifile.
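
Because locus strings use one-based, inclusive numbering, a locus written as Chr:Start-Stop spans Stop - Start + 1 nucleotides. The helper below is a hypothetical sketch for checking such strings before passing them to --locus or writing them into a tab-delimited loci file. The requirement that Start not exceed Stop is an assumption here, not something stated above.

```shell
# Split a Chr:Start-Stop string and report its length under one-based,
# inclusive numbering. Prints "chr start stop length"; returns 1 if the
# string is malformed or (assumption) Start is greater than Stop.
parse_locus() {
    case "$1" in *:*-*) ;; *) return 1 ;; esac
    chr=${1%:*}
    range=${1##*:}
    start=${range%-*}
    stop=${range#*-}
    [ -n "$chr" ] && [ -n "$start" ] && [ -n "$stop" ] || return 1
    case "$start$stop" in *[!0-9]*) return 1 ;; esac
    [ "$start" -le "$stop" ] || return 1
    echo "$chr $start $stop $((stop - start + 1))"
}

# Made-up locus: positions 1000 through 1500 cover 501 nucleotides.
parse_locus Chr1:1000-1500
```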
--nohp : Switch that prevents search for microRNAs. This saves computational time, but MIRNA loci will not be differentiated from other types of small RNA clusters.
--dn_mirna : Switch that activates a de novo search
