StringTie: efficient transcript assembly and quantitation of RNA-Seq data
StringTie employs efficient algorithms for transcript structure recovery and abundance estimation from bulk RNA-Seq reads aligned to a reference genome. It takes as input spliced alignments in coordinate-sorted SAM/BAM/CRAM format and produces GTF output consisting of assembled transcript structures and their estimated expression levels (FPKM/TPM and base coverage values).
For additional StringTie documentation and the latest official source and binary packages please refer to the official website: https://ccb.jhu.edu/software/stringtie
Obtaining and installing StringTie
Source and binary packages for this software can be downloaded directly from the Releases page of this repository. StringTie is compatible with a wide range of Linux and Apple macOS systems. The main program (StringTie) has no library dependencies other than zlib, and compiling it from source requires a C++ compiler that supports the C++11 standard (GCC 4.8 or newer).
Building the latest version from the repository
In order to compile the StringTie source in this GitHub repository the following steps can be taken:
git clone https://github.com/gpertea/stringtie
cd stringtie
make -j4 release
To build with an alternate compiler set the CC and CXX environment variables, for example:
CC=clang CXX=clang++ make -j4 release
CC=icx CXX=icpx make -j4 release
During the first run of the above make command a few library dependencies will be downloaded and compiled, but any subsequent stringtie updates (using git pull) should rebuild much faster.
To complete the installation, the resulting stringtie binary can then be copied to a programs directory of choice (preferably one that is in the current shell's PATH).
Building and installing StringTie this way should take less than a minute on a regular Linux or Apple macOS desktop.
Note that simply running make produces a less optimized executable that is suitable for debugging and runtime checking but is significantly slower than the optimized version built with the make release command as instructed above.
Building and testing the offline source package
For HPC environments that do not have online access during the build, please download the latest .offline.tar.gz package from <a href="https://github.com/gpertea/stringtie/releases">Releases</a>. Unpack it and cd into the unpacked directory, then:
make -j6 release
make test
This should build the stringtie binary and run the included tests, without having to fetch dependencies and test data.
Using pre-compiled (binary) releases
Instead of compiling from source, some users may prefer to download an already compiled, ready-to-run binary for Linux or Apple macOS. These binary releases are compiled on older versions of these operating systems in order to stay compatible with a wide range of OS versions, not just the most recent distributions. The precompiled packages are available on the <a href="https://github.com/gpertea/stringtie/releases">Releases</a> page for this repository. Please note that these binary packages do not include the optional super-reads module, which currently can only be built on Linux machines from the source made available in this repository.
Running StringTie
The generic command line for the default usage has this format:
stringtie [-o <output.gtf>] [other_options] <read_alignments.bam>
The main input of the program (<read_alignments.bam>) must be a SAM, BAM or CRAM file with RNA-Seq read alignments sorted by their genomic location (for example the accepted_hits.bam file produced by TopHat, or HISAT2 output sorted with samtools sort).
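Before running StringTie it can be worth confirming that an alignment file declares coordinate sort order. The sketch below is only an illustration: the helper name is_coord_sorted is hypothetical (not part of StringTie), and it checks the @HD header line for the SO:coordinate tag defined by the SAM specification. The commented usage assumes samtools is on the PATH, and the file names are placeholders.

```shell
# Hypothetical helper (not part of StringTie): reads a SAM header on stdin and
# succeeds only if the @HD line declares coordinate sort order.
is_coord_sorted() {
  grep -q '^@HD.*SO:coordinate'
}

# Demonstration on literal header lines (tab-separated, as in a real SAM header).
printf '@HD\tVN:1.6\tSO:coordinate\n' | is_coord_sorted && echo "sorted"
printf '@HD\tVN:1.6\tSO:unsorted\n' | is_coord_sorted || echo "needs sorting"

# With a real file (assumes samtools is installed; file names are placeholders):
#   samtools view -H read_alignments.bam | is_coord_sorted \
#     || samtools sort -o read_alignments.sorted.bam read_alignments.bam
```

A header without an SO tag does not prove the file is unsorted, so a failed check is best read as a prompt to sort again rather than as an error.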
The main output is a GTF file containing the structural definitions of the transcripts assembled by StringTie from the read alignment data. The name of the output file should be specified with the -o option. If the -o option is not used, the output GTF with the assembled transcripts is printed to the standard output (and can be captured into a file using the > output redirect operator).
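As an illustration of working with the output GTF, the following sketch tabulates the expression estimates of each assembled transcript. It assumes the usual nine-column, tab-separated GTF layout with transcript_id, FPKM and TPM stored as quoted attributes in column 9. The helper name gtf_expression is hypothetical, and the sample record (IDs and values) is invented purely for demonstration.

```shell
# Hypothetical helper: print transcript_id, FPKM and TPM (tab-separated) for
# every "transcript" record of a StringTie-style GTF read from stdin.
gtf_expression() {
  awk -F '\t' '
    $3 == "transcript" {
      id = fpkm = tpm = "NA"
      n = split($9, attrs, ";")              # one attribute per element
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")             # kv[1] = key, kv[2] = quoted value
        gsub(/"/, "", kv[2])                 # drop the surrounding quotes
        if (kv[1] == "transcript_id") id = kv[2]
        else if (kv[1] == "FPKM") fpkm = kv[2]
        else if (kv[1] == "TPM") tpm = kv[2]
      }
      print id "\t" fpkm "\t" tpm
    }'
}

# Invented sample record, for demonstration only.
printf 'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; cov "12.5"; FPKM "3.2"; TPM "7.9";\n' \
  | gtf_expression

# With a real output file (placeholder name):
#   gtf_expression < output.gtf > expression.tsv
```

Exon records are skipped because only lines whose feature column (field 3) equals transcript are processed.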
Note: if the --mix option is used, StringTie expects two alignment files to be given as positional parameters, in a specific order: the short read alignments must be the first file given while the long read alignments must be the second input file. Both alignment files must be sorted by genomic location.
stringtie [-o <output.gtf>] --mix [other_options] <short_read_alns.bam> <long_read_alns.bam>
Note that the command line parser in StringTie allows arbitrary order and mixing of the positional parameters with the other options of the program, so the input alignment files can also precede or be given in between the other options -- the following command line is equivalent to the one above:
stringtie <short_read_alns.bam> <long_read_alns.bam> --mix [other_options] [-o <output.gtf>]
Nascent-aware assembly (new in StringTie 3)
Many rRNA-depleted (“Total RNA”) libraries capture a mixture of mature and nascent (incomplete) RNA. StringTie 3 introduces a nascent-aware algorithm that accounts for this signal and separates mature from nascent RNA in the assembly process.
| Flag | Behaviour |
| :--- | :--- |
| -N | Enables nascent-aware assembly (recommended for any rRNA-depleted or Total RNA library). The primary output GTF contains only mature transcript models. |
| --nasc | Same algorithm as -N, but also retains nascent intermediates in the GTF output. Nascent records carry nascentRNA in the source column (field 2) and include a nascent_parent "<mature_ID>" attribute that links each nascent transcript to its mature parent. |
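Because --nasc marks nascent records with nascentRNA in the source column, a GTF produced with that flag can be split with a simple field test. The sketch below is a minimal illustration: the helper names nascent_only and mature_only and the sample function are hypothetical, and the two sample records (including the transcript IDs) are invented for demonstration.

```shell
# Hypothetical helpers for a GTF produced with --nasc, read from stdin.
# Records whose source column (field 2) is nascentRNA are nascent intermediates.
nascent_only() { awk -F '\t' '$2 == "nascentRNA"'; }
mature_only()  { awk -F '\t' '$2 != "nascentRNA"'; }

# Invented sample: one mature transcript and one nascent record linked to it.
sample() {
  printf 'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";\n'
  printf 'chr1\tnascentRNA\ttranscript\t100\t700\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.n1"; nascent_parent "STRG.1.1";\n'
}

# Show source and feature columns of each subset.
sample | nascent_only | cut -f 2,3
sample | mature_only | cut -f 2,3

# With a real output file (placeholder names):
#   mature_only < output.gtf > mature.gtf
#   nascent_only < output.gtf > nascent.gtf
```

Comment lines at the top of a GTF have no second field, so mature_only passes them through unchanged.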
Full algorithmic details are described in our pre-print:
Shinder I, Pertea G, Hu R, Rudnick Z, Pertea M. StringTie 3 improves total-RNA assembly by resolving nascent and mature transcripts. bioRxiv (2025). doi:10.1101/2025.05.21.655404
Running StringTie on the provided test data
When building from this source repository, after the program was compiled with make release as instructed above, the generated binary can be tested on a small data set with a command like this:
make test
This will run the included run_tests.sh script, which downloads a small test data set and runs a few simple tests to ensure that the program works and generates the expected output.
If a pre-compiled package is used instead of compiling the program from source, the run_tests.sh script is included in the binary package as well and it can be run immediately after unpacking the binary package:
tar -xvzf stringtie-3.0.0.Linux_x86_64.tar.gz
cd stringtie-3.0.0.Linux_x86_64
./run_tests.sh
This small test/demo data set can be downloaded as <a href="https://github.com/gpertea/stringtie/raw/test_data/tests_3.tar.gz">tests_3.tar.gz</a> along with the source package and pre-compiled packages on the <a href="https://github.com/gpertea/stringtie/releases">Releases</a> page of this repository.
The tests can also be run manually as shown below (after changing to the test_data directory, cd test_data):
Test 1: Input consists of only alignments of short reads
stringtie -o short_reads.out.gtf short_reads.bam
Test 2: Input consists of alignments of short reads and superreads
stringtie -o short_reads_and_superreads.out.gtf short_reads_and_superreads.bam
Test 3: Input consists of alignments of long reads
stringtie -L -o long_reads.out.gtf long_reads.bam
Test 4: Input consists of alignments of long reads and reference annotation (guides)
stringtie -L -G human-chr19_P.gff -o long_reads_guided.out.gtf long_reads.bam
Test 5: Input consists of alignments of short reads and alignments of long reads (using --mix option)
stringtie --mix -o mix_reads.out.gtf mix_short.bam mix_long.bam
Test 6: Input consists of alignments of short reads, alignments of long reads and a reference annotation (guides)
stringtie --mix -G mix_guides.gff -o mix_reads_guided.out.gtf mix_short.bam mix_long.bam
For version 3.0.0, three additional tests have been added; please see the run_tests.sh script for details.
For large data sets one can expect up to one hour of processing time. A minimum of 8 GB of RAM is recommended for running StringTie on regular-size RNA-Seq samples, with 16 GB or more being strongly advised for larger data sets.
StringTie options
The following optional parameters can be specified (use -h or --help to get the usage message):
--version : print just the version at stdout and exit
--conservative : conservative transcript assembly, same as -t -c 1.5 -f 0.05
--mix : both short and long read data alignments are provided
(long read alignments must be the 2nd BAM/CRAM input file)
--rf : assume stranded library fr-firststrand
--fr : assume stranded library fr-secondstrand
-G reference annotation to use for guiding the assembly process (GTF/GFF)
--ptf : load point-features from a given 4 column feature file <f_tab>
-o output path/file name for the assembled transcripts GTF (default: stdout)
-l name prefix for output transcripts (default: STRG)
-f minimum isoform fraction (default: 0.01)
-L long reads processing; also enforces -s 1.5 -g 0 (default:false)
-R if long reads are provided, just clean and collapse the reads but
do not assemble
-m minimum assembled transcript length (default: 200)
-a minimum anchor length for junctions (default: 10)
-j minimum junction coverage (default: 1)
-t disable trimming of predicted transcripts based on coverage
