ShortStack
ShortStack: Comprehensive annotation and quantification of small RNA genes
Alignment of small RNA-seq data and annotation of small RNA-producing genes
Author
Michael J. Axtell, Penn State University, mja18@psu.edu
IMPORTANT CHANGE - Condensed Reads
As of version 4.1.0, ShortStack creates and uses "condensed reads" for alignments. This speeds things up and keeps file sizes smaller, but it requires some upgrades and some understanding of the new format. Ensure that your strucVis version is >= 0.9 and your ShortTracks version is >= 1.2. Bioconda enforces these versions, but users who install manually will need to upgrade. See Outputs and Overview of Methods for more about condensed reads.
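The sketch below gives a rough picture of why condensing helps. It assumes that "condensing" means collapsing identical read sequences into a single record with a count, so each distinct sequence only has to be aligned once. This is an assumption for illustration only. The actual condensed-read format is described in Outputs and Overview of Methods, and condense_reads is a hypothetical name, not a ShortStack function.

```python
from collections import Counter


def condense_reads(reads):
    """Collapse identical read sequences into (sequence, count) pairs.

    Hypothetical illustration only; ShortStack's real condensed-read
    format is described in its Outputs section.
    """
    counts = Counter(reads)
    # Most abundant sequences first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = [
    "TCGGACCAGGCTTCATTCCCC",
    "TCGGACCAGGCTTCATTCCCC",
    "TTCGGACCAGGCTTCATTCCC",
    "TCGGACCAGGCTTCATTCCCC",
]
condensed = condense_reads(reads)
print(condensed)  # [('TCGGACCAGGCTTCATTCCCC', 3), ('TTCGGACCAGGCTTCATTCCC', 1)]
print(f"{len(reads)} reads -> {len(condensed)} distinct sequences to align")
```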
Table of Contents
- Citations
- Installation
- Usage
- Resources
- Testing and Examples
- Vignettes
- Outputs
- Visualizing Results
- Overview of Methods
- How to go FAST
- ShortStack Version 4 Major Changes
- Issues
- FAQ
Citations
If you use ShortStack in support of your work, please cite one or more of the following:
- Johnson NR, Yeoh JM, Coruh C, Axtell MJ. (2016) G3 6:2103-2111. doi:10.1534/g3.116.030452
- Shahid S, Axtell MJ. (2013) Identification and annotation of small RNA genes using ShortStack. Methods. doi:10.1016/j.ymeth.2013.10.004
- Axtell MJ. (2013) ShortStack: Comprehensive annotation and quantification of small RNA genes. RNA 19:740-751. doi:10.1261/rna.035279.112
Installation
You can either use the conda package manager to install from the bioconda channel, or manually set up an environment. Use of conda/bioconda is highly recommended!
Install using conda (recommended)
First, install conda, and then set it up to use the bioconda channel following the instructions at https://bioconda.github.io
Then follow the instructions below for your system to install ShortStack and its dependencies into a new environment, and activate that environment.
Linux or Intel-Mac
conda create --name ShortStack4 shortstack
conda activate ShortStack4
Silicon-Mac
Some dependencies on bioconda have not been compiled for the newer Apple Silicon (ARM-based) Macs, so conda must be forced to install the osx-64 (Intel) versions instead. Apple Silicon Macs can run Intel code using Apple's Rosetta translation.
conda create --name ShortStack4
conda activate ShortStack4
conda config --env --set subdir osx-64
conda install shortstack
Manual installation
Create an environment in which the following packages and tools are installed. Make note of the required minimum versions! (last updated for ShortStack release 4.1.0)
- python >= 3.12.3 https://www.python.org
- samtools >= 1.20 https://www.htslib.org
- bowtie >= 1.3.1 https://bowtie-bio.sourceforge.net/index.shtml
- viennarna 2.* https://www.tbi.univie.ac.at/RNA/documentation.html
- tqdm https://tqdm.github.io
- numpy https://numpy.org
- biopython https://biopython.org
- strucVis >= 0.9 https://github.com/MikeAxtell/strucVis
- ShortTracks >= 1.2 https://github.com/MikeAxtell/ShortTracks
- bedtools >= 2.31.1 https://bedtools.readthedocs.io/en/latest/
- cutadapt >= 4.9 https://cutadapt.readthedocs.io/en/stable/
Then download the ShortStack script from this GitHub repository, make it executable (chmod +x ShortStack), and copy it into a directory on your environment's PATH.
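For a manual installation, it helps to check the minimum versions listed above. Version strings must be compared numerically, not as text: as text, "1.9" sorts after "1.20". The sketch below uses hypothetical helpers (parse_version, meets_minimum) that are not part of ShortStack. It only handles plain dotted numeric versions, so a pattern such as viennarna's "2.*" is left out, and the installed versions shown are made-up example values.

```python
def parse_version(version):
    """Turn a dotted version string like '2.31.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, required):
    """True if installed >= required, comparing numerically, not as text."""
    a, b = parse_version(installed), parse_version(required)
    # Pad the shorter tuple with zeros so '1.2' compares equal to '1.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions from the list above (ShortStack 4.1.0).
MINIMUMS = {
    "python": "3.12.3",
    "samtools": "1.20",
    "bowtie": "1.3.1",
    "strucVis": "0.9",
    "ShortTracks": "1.2",
    "bedtools": "2.31.1",
    "cutadapt": "4.9",
}

# Hypothetical installed versions, for illustration only.
installed = {"samtools": "1.9", "bedtools": "2.31.1", "ShortTracks": "1.2"}
for tool, version in installed.items():
    status = "OK" if meets_minimum(version, MINIMUMS[tool]) else "TOO OLD"
    print(f"{tool} {version} (need >= {MINIMUMS[tool]}): {status}")
```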
Usage
ShortStack [-h] [--version] (--genomefile GENOMEFILE | --autotrim_only) [--known_miRNAs KNOWN_MIRNAS] (--readfile [READFILE ...] | --bamfile [BAMFILE ...]) [--outdir OUTDIR] [--adapter ADAPTER | --autotrim]
[--autotrim_key AUTOTRIM_KEY] [--threads THREADS] [--mmap {u,f,r}] [--align_only] [--dicermin DICERMIN] [--dicermax DICERMAX] [--locifile LOCIFILE | --locus LOCUS] [--nohp] [--dn_mirna]
[--strand_cutoff STRAND_CUTOFF] [--mincov MINCOV] [--pad PAD] [--make_bigwigs]
Required
(--genomefile GENOMEFILE | --autotrim_only): Either --genomefile or --autotrim_only is required.
- --genomefile GENOMEFILE: Path to the reference genome in FASTA format. It must be indexable by both samtools faidx and bowtie-build, or already indexed.
- --autotrim_only: If this switch is set, ShortStack quits after performing auto-trimming of input reads.
(--readfile [READFILE ...] | --bamfile [BAMFILE ...]): Either --readfile or --bamfile is required.
- --readfile [READFILE ...]: Path(s) to one or more files of reads in fastq or fasta format. Files may be gzip-compressed. Separate multiple files with spaces. Supplying reads causes ShortStack to perform alignments.
- --bamfile [BAMFILE ...]: Path(s) to one or more files of aligned sRNA-seq data in BAM format. Separate multiple files with spaces. BAM files must match the reference genome given in --genomefile.
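The usage line has the shape Python's argparse prints for required, mutually exclusive argument groups: (A | B) means exactly one of A or B. The sketch below rebuilds only those two groups with argparse to show which combinations are accepted. It is an illustration, not ShortStack's actual parser. The helper names build_parser and accepts, and the file names, are hypothetical.

```python
import argparse
import contextlib
import io


def build_parser():
    """Hypothetical parser mirroring only ShortStack's two required groups."""
    parser = argparse.ArgumentParser(prog="ShortStack")
    # Exactly one of --genomefile / --autotrim_only must be given.
    reference = parser.add_mutually_exclusive_group(required=True)
    reference.add_argument("--genomefile")
    reference.add_argument("--autotrim_only", action="store_true")
    # Exactly one of --readfile / --bamfile must be given; nargs="*" gives
    # the "[READFILE ...]" shape seen in the usage line.
    inputs = parser.add_mutually_exclusive_group(required=True)
    inputs.add_argument("--readfile", nargs="*")
    inputs.add_argument("--bamfile", nargs="*")
    return parser


def accepts(argv):
    """Return True if argv satisfies both groups, False if argparse rejects it."""
    with contextlib.redirect_stderr(io.StringIO()):  # hide argparse's error text
        try:
            build_parser().parse_args(argv)
        except SystemExit:
            return False
    return True


print(accepts(["--genomefile", "genome.fa", "--readfile", "lib1.fq.gz", "lib2.fq.gz"]))  # True
print(accepts(["--autotrim_only", "--readfile", "lib1.fq.gz"]))  # True
print(accepts(["--genomefile", "genome.fa"]))  # False: no reads or BAM given
print(accepts(["--genomefile", "genome.fa", "--readfile", "lib1.fq", "--bamfile", "lib1.bam"]))  # False: both given
```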
Recommended
--known_miRNAs KNOWN_MIRNAS: Path to a FASTA-formatted file of known mature miRNAs. The FASTA must be formatted so that each RNA sequence is on a single line. ATCGUatcgu characters are acceptable. These RNAs are typically the sequences of known microRNAs; for instance, a FASTA file of mature miRNAs pulled from https://www.mirbase.org. These known miRNA sequences are aligned to the genome and used to nucleate searches for loci that meet all expression-based and secondary structure-based requirements for MIRNA locus identification. See also option --dn_mirna.
--outdir OUTDIR: Specify the name of the directory that will be created for the results.
- default: ShortStack_[time], where [time] is the Unix time stamp according to the system when the run began.
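The --known_miRNAs file must have each sequence on a single line and may use only ATCGUatcgu characters. A quick pre-flight check can catch wrapped or malformed entries before a run. The sketch below is a hypothetical validator, not part of ShortStack. It assumes that a second consecutive sequence line means the sequence was wrapped, and the example records are made up.

```python
import re

VALID_SEQ = re.compile(r"[ATCGUatcgu]+")


def check_known_mirnas(lines):
    """Return a list of problems found in FASTA lines meant for --known_miRNAs."""
    problems = []
    expect_sequence = False  # True right after a header line
    for n, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if expect_sequence:
                problems.append(f"line {n}: header follows a header with no sequence")
            expect_sequence = True
        elif expect_sequence:
            if not VALID_SEQ.fullmatch(line):
                problems.append(f"line {n}: characters outside ATCGUatcgu")
            expect_sequence = False
        else:
            # A second sequence line in a row: the sequence was probably wrapped.
            problems.append(f"line {n}: extra sequence line (sequence wrapped?)")
    if expect_sequence:
        problems.append("file ends with a header that has no sequence")
    return problems


fasta = """>miR166_example
UCGGACCAGGCUUCAUUCCCC
>wrapped_example
UCGGACCAGGCU
UCAUUCCCC
""".splitlines()
print(check_known_mirnas(fasta))  # ['line 5: extra sequence line (sequence wrapped?)']
```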
--autotrim: This is strongly recommended when supplying untrimmed reads via --readfile. The autotrim method automatically infers the 3' adapter sequence of the untrimmed reads, and then uses that sequence to coordinate read trimming. However, do not use --autotrim if your input reads have already been trimmed!
- Note: autotrim currently assumes your library strategy generated reads where nucleotide 1 of the read is the first biological / sRNA-derived nucleotide, and the 3' adapter starts immediately after the last sRNA nucleotide. It further assumes there are no random nucleotides (Ns) in the 3' adapter sequence. If your data do not meet these assumptions, you cannot use --autotrim. Instead, remove your adapters by other appropriate methods and input the trimmed reads using --readfile without option --autotrim.
- Note: mutually exclusive with --adapter.
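The read layout that --autotrim assumes is simple: the small RNA starts at nucleotide 1, and the 3' adapter begins immediately after its last nucleotide. Under exactly those assumptions, trimming reduces to cutting each read where the adapter starts, including an adapter that runs off the end of the read. The sketch below shows that idea with exact matching only, so it is a simplification and not ShortStack's trimming code. A real trimmer such as cutadapt also tolerates sequencing errors. The function name, min_overlap parameter, adapter, and reads are hypothetical examples.

```python
def trim_3prime_adapter(read, adapter, min_overlap=8):
    """Cut a read where its 3' adapter starts (exact matching only).

    Assumes nucleotide 1 is the first sRNA nucleotide and the adapter begins
    right after the last sRNA nucleotide. Returns the insert (possibly empty
    for adapter dimers), or None if no adapter is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # The adapter may run off the end of the read: find the longest read
    # suffix that equals the start of the adapter.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return None


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example adapter sequence
read1 = "TCGGACCAGGCTTCATTCCCC" + "TGGAATTCTCGG"  # adapter cut off by read end
read2 = "TCGGACCAGGCTTCATTCCC" + ADAPTER  # full adapter present
read3 = "ACGTACGTACGTACGTACGTACGT"  # no adapter at all
for read in (read1, read2, read3):
    print(trim_3prime_adapter(read, ADAPTER))
# TCGGACCAGGCTTCATTCCCC
# TCGGACCAGGCTTCATTCCC
# None
```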
--threads THREADS: Set the number of threads to use. More threads generally mean faster completion.
- default: 1
Other options
-h: Print a help message and then quit.
--version: Print the version and then quit.
--adapter ADAPTER: Manually specify a 3' adapter sequence to use during read trimming. Mutually exclusive with --autotrim. The --adapter option applies the same adapter sequence to trim all given read files.
- Note: Use of --adapter is discouraged. In nearly all cases, --autotrim is a better bet for read trimming.
--autotrim_key AUTOTRIM_KEY: A DNA sequence to use as a known read prefix during the --autotrim procedure. ShortStack's autotrim discovers the 3' adapter by scanning for reads that begin with the sequence given by AUTOTRIM_KEY. This should be the sequence of a small RNA that is known to be highly abundant in all of the libraries. The default sequence is for miR166, a microRNA that is present at high levels in nearly all plants. For non-plant experiments, or if the default is not working well, consider providing an alternative to the default.
- default: TCGGACCAGGCTTCATTCCCC (miR166)
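The --autotrim_key description suggests a simple strategy. Reads that begin with a highly abundant, known small RNA should carry the 3' adapter immediately after that small RNA, so the sequence following the key in those reads reveals the adapter. The sketch below implements that idea as a per-position majority vote. It is a hypothetical illustration, not ShortStack's actual autotrim code. The function name, the min_reads threshold (kept tiny for the example), and the reads are assumptions.

```python
from collections import Counter


def infer_adapter(reads, key="TCGGACCAGGCTTCATTCCCC", min_reads=2):
    """Guess the 3' adapter from reads that start with a known abundant sRNA.

    Hypothetical sketch: collect the sequence after the key in every read
    that starts with it, then take a per-position majority vote.
    """
    tails = [read[len(key):] for read in reads if read.startswith(key)]
    tails = [t for t in tails if t]  # ignore reads with nothing after the key
    if len(tails) < min_reads:
        return None  # too little evidence; try a different key
    consensus = []
    position = 0
    while True:
        column = Counter(t[position] for t in tails if len(t) > position)
        if sum(column.values()) < min_reads:
            break  # too few reads cover this position to call a base
        consensus.append(column.most_common(1)[0][0])
        position += 1
    return "".join(consensus)


KEY = "TCGGACCAGGCTTCATTCCCC"  # default miR166 key
reads = [
    KEY + "TGGAATTCTCGGGTGC",
    KEY + "TGGAATTCTCGG",
    KEY + "TGGAATTCTAGGGTGC",  # contains one sequencing error
    "ACGTACGTACGTACGTACGTACGT",  # does not start with the key; ignored
]
print(infer_adapter(reads, KEY))  # TGGAATTCTCGGGTGC
```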
--mmap {u,f,r}: Sets the mode by which multi-mapped reads are handled. These modes are described in Johnson et al. (2016). The default u mode has the best performance.
- u: (Default) Only uniquely-aligned reads are used as weights for placement of multi-mapped reads.
- f: Fractional weighting scheme for placement of multi-mapped reads.
- r: Multi-mapped read placement is random.
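The sketch below is a minimal illustration of the u-mode idea, where uniquely-aligned reads act as weights for placing a multi-mapped read. It assumes each candidate position's weight is the number of uniquely aligned reads within a fixed window around it. It also assumes the read is then placed at one candidate chosen with probability proportional to those weights. The 50 nt window, the uniform fallback when no candidate has any nearby unique reads, and all names are assumptions; see Johnson et al. (2016) for the actual method.

```python
import bisect
import random
from collections import Counter


def unique_weights(candidates, unique_positions, window=50):
    """Count uniquely aligned reads within +/- window nt of each candidate.

    candidates: (chrom, pos) alignments of one multi-mapped read.
    unique_positions: chrom -> sorted list of unique-read positions.
    """
    weights = []
    for chrom, pos in candidates:
        positions = unique_positions.get(chrom, [])
        lo = bisect.bisect_left(positions, pos - window)
        hi = bisect.bisect_right(positions, pos + window)
        weights.append(hi - lo)
    return weights


def place_read(candidates, unique_positions, rng, window=50):
    """Choose one placement with probability proportional to its weight."""
    weights = unique_weights(candidates, unique_positions, window)
    if sum(weights) == 0:
        # Assumed fallback: no local unique reads anywhere, so pick uniformly.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


unique = {"Chr1": [1000, 1010, 1020, 1030], "Chr2": [5000]}
candidates = [("Chr1", 1015), ("Chr2", 5020), ("Chr3", 200)]
print(unique_weights(candidates, unique))  # [4, 1, 0]

rng = random.Random(42)
tally = Counter(place_read(candidates, unique, rng) for _ in range(10_000))
print(tally)  # roughly 8,000 Chr1 and 2,000 Chr2 placements; Chr3 never chosen
```

In this framing, r mode corresponds to always calling rng.choice and ignoring the weights. The f mode is not sketched here.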
--align_only: This switch causes ShortStack to terminate after the alignment phase; no analysis occurs.
--dicermin DICERMIN: An integer setting the minimum size (in nucleotides) of a valid small RNA. Together with --dicermax, this option sets the bounds to discriminate Dicer-derived small RNA loci from other loci. At least 80% of the reads in a given cluster must be in the range indicated by --dicermin and --dicermax.
- default: 21
--dicermax DICERMAX: An integer setting the maximum size (in nucleotides) of a valid small RNA. Together with --dicermin, this option sets the bounds to discriminate Dicer-derived small RNA loci from other loci. At least 80% of the reads in a given cluster must be in the range indicated by --dicermin and --dicermax.
- default: 24
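The --dicermin/--dicermax rule can be stated concretely: a cluster counts as Dicer-derived when at least 80% of its reads have lengths within [dicermin, dicermax]. The sketch below assumes a cluster's read lengths are available as a length-to-count mapping. The function name is hypothetical, and this is not ShortStack's code.

```python
def is_dicer_derived(length_counts, dicermin=21, dicermax=24, threshold=0.8):
    """Return True if at least `threshold` of reads fall in [dicermin, dicermax].

    length_counts maps read length (nt) -> number of reads in the cluster.
    """
    total = sum(length_counts.values())
    if total == 0:
        return False
    in_range = sum(n for length, n in length_counts.items()
                   if dicermin <= length <= dicermax)
    return in_range / total >= threshold


print(is_dicer_derived({21: 50, 22: 20, 24: 15, 18: 15}))  # True (85 of 100 in range)
print(is_dicer_derived({21: 30, 24: 40, 30: 30}))  # False (70 of 100 in range)
```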
--locifile LOCIFILE: Path to a file of pre-determined loci to analyze. This will prevent de novo discovery of small RNA loci. The file may be in gff3, bed, or simple tab-delimited format (Chr:Start-Stop[tab]Name). Mutually exclusive with --locus.
--locus LOCUS: A single locus to analyze, given as a string in the format Chr:Start-Stop (using one-based, inclusive numbering). This will prevent de novo discovery of small RNA loci. Mutually exclusive with --locifile.
--nohp: Switch that prevents the search for microRNAs. This saves computational time, but MIRNA loci will not be differentiated from other types of small RNA clusters.
--dn_mirna: Switch that activates a de novo search
