Rnaquast

Quality assessment of de novo transcriptome assemblies from RNA-Seq data


rnaQUAST 2.3 manual

1. About rnaQUAST
2. Installation & requirements
    2.1. General requirements
    2.2. Software for de novo quality assessment
    2.3. Read alignment software
3. Options
    3.1. Input data options
    3.2. Basic options
    3.3. Advanced options
4. Understanding rnaQUAST output
    4.1. Reports
    4.2. Detailed output
    4.3. Plots
5. Citation
6. Feedback and bug reports

1 About rnaQUAST

rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and gene database. In addition, rnaQUAST can estimate gene database coverage by raw reads and perform de novo quality assessment using third-party software.

rnaQUAST version 2.3.2 was released under GPLv2 on November 14th, 2025 and can be downloaded from https://github.com/ablab/rnaquast/releases.

There is also visualizer software developed by one of rnaQUAST's users, @SimonHegele.

For impatient people:

You will need Python, gffutils, matplotlib and joblib. You will also need GMAP (or BLAT) and BLASTN installed on your machine and added to the $PATH variable.
You may also install rnaQUAST via conda:
```
conda install -c bioconda rnaquast
```
To verify your installation, run:
```
python rnaQUAST.py --test
```

To run rnaQUAST on your data, use the following command:

    python rnaQUAST.py \
        --transcripts /PATH/TO/transcripts1.fasta /PATH/TO/ANOTHER/transcripts2.fasta /PATH/TO/MULTIPLE/*.fasta [...] \
        --reference /PATH/TO/reference_genome.fasta --gtf /PATH/TO/gene_coordinates.gtf

2 Installation & requirements

2.1 General requirements

rnaQUAST can be installed via conda:

    conda install -c bioconda rnaquast

If you wish to run rnaQUAST from the release archive you need:

- Python3 or Python2 (2.5+)
- matplotlib python package
- joblib python package
- gffutils python package (needs biopython)
- NCBI BLAST+ (blastn)
- GMAP (or BLAT) aligner

rnaQUAST still works under Python2 (2.5+), but because Python2 is outdated, it has not been supported since version 2.0.

Note that, due to limitations of BLAT, pslSort is also required to work with reference genomes larger than 4 Gb.

Paths to blastn and GMAP (or BLAT) should be added to the $PATH environment variable. To check that everything is installed correctly, we recommend running:

    python rnaQUAST.py --test
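
Before running the built-in test, it can help to confirm that the external tools are visible on $PATH. The following is a minimal sketch, not part of rnaQUAST itself; the executable names `blastn`, `gmap` and `blat` are assumed to be the names the tools are installed under on your system, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names: "blastn" from NCBI BLAST+, and "gmap" / "blat"
# for the two supported aligners. Adjust them if your installation differs.
REQUIRED = ["blastn"]
ALIGNERS = ["gmap", "blat"]  # only one of the two is needed


def missing_tools(which=shutil.which):
    """Return the names of required tools that cannot be found on $PATH."""
    missing = [tool for tool in REQUIRED if which(tool) is None]
    # The manual asks for GMAP *or* BLAT, so finding either one is enough.
    if not any(which(tool) is not None for tool in ALIGNERS):
        missing.append("gmap (or blat)")
    return missing


if __name__ == "__main__":
    problems = missing_tools()
    if problems:
        print("Not found on $PATH: " + ", ".join(problems))
    else:
        print("All required tools were found on $PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` searches the current $PATH.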

Note that gffutils is used to complete gene coordinates when transcript or gene records are missing. For more information, see advanced options.<a name="sec2.2"></a>

2.2 Software for de novo quality assessment

When a reference genome and gene database are unavailable, we recommend running BUSCO and GeneMarkS-T in the rnaQUAST pipeline.

BUSCO requirements

BUSCO detects core genes in the assembled transcripts. To use it, install BUSCO v4+, tblastn, HMMER and transeq, and add these tools to the $PATH variable.

To run BUSCO, provide a lineage-specific database name via the --busco option. You may also download the appropriate database from http://busco.ezlab.org manually and provide its path using the same option (see options for details).

GeneMarkS-T requirements

GeneMarkS-T predicts genes in the assembled transcripts without a reference genome. If you wish to use it in the rnaQUAST pipeline, GeneMarkS-T should be properly installed and added to the $PATH variable.

2.3 Read alignment software

rnaQUAST can also calculate various statistics using raw reads (e.g. database coverage by reads). To obtain them, you need to install the STAR aligner and add it to the $PATH variable. To learn more, see input options.

3 Options

3.1 Input data options

To run rnaQUAST you need to provide either FASTA files with transcripts (recommended), or align transcripts to the reference genome manually and provide the resulting PSL files.

-r <REFERENCE>, --reference <REFERENCE>
Single file with reference genome containing all chromosomes/scaffolds in FASTA format (preferably with *.fasta, *.fa, *.fna, *.ffn or *.frn extension) OR
*.txt file containing the one-per-line list of FASTA files with reference sequences.

--gtf <GENE_COORDINATES>
File with gene coordinates in GTF/GFF format (needs information about parent relations). We recommend using files downloaded from GENCODE or Ensembl.

--gene_db <GENE_DB>
Path to the gene database generated by gffutils. The database is created during the first run. This option is not compatible with the --gtf option. We recommend using this option once the database has been created, in order to speed up subsequent runs.
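
Because --gtf and --gene_db cannot be combined, a wrapper script has to pick one of them per run. A minimal sketch of that choice, assuming you know where the gffutils database from an earlier run is stored; `annotation_args` and both path parameters are hypothetical and not part of rnaQUAST:

```python
import os


def annotation_args(gtf_path, gene_db_path, exists=os.path.isfile):
    """Return the annotation flags for one rnaQUAST run.

    gene_db_path is wherever you keep the gffutils database from an earlier
    run (a hypothetical location; this sketch does not guess where rnaQUAST
    writes it).
    """
    if gene_db_path and exists(gene_db_path):
        # Reuse the database; --gtf is left out because the two options
        # are not compatible.
        return ["--gene_db", gene_db_path]
    # No database yet: pass the GTF so the database is built on this run.
    return ["--gtf", gtf_path]
```

The returned list can be appended to the rest of the command line, so the first run uses the GTF file and later runs switch to the database.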

--gmap_index <INDEX FOLDER>
Folder containing a pre-built GMAP index for the reference genome. Using a previously constructed index decreases running time. Note that you still need to provide the reference genome that was used for index construction when this option is used.

-c <TRANSCRIPTS ...>, --transcripts <TRANSCRIPTS, ...>
File(s) with transcripts in FASTA format separated by space. Wildcards can be used, e.g. --transcripts */*.fasta.
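
When the command is assembled by a script and passed as an argument list (for example to Python's `subprocess` module), no shell is involved, so wildcards may need to be expanded beforehand. A minimal sketch under that assumption; `expand_transcripts` and `transcripts_args` are hypothetical helpers, not rnaQUAST functions:

```python
import glob


def expand_transcripts(patterns, globber=glob.glob):
    """Expand wildcard patterns into a flat list of paths, keeping input order."""
    paths = []
    for pattern in patterns:
        matches = sorted(globber(pattern))
        if not matches:
            # Fail early instead of handing a non-existent file to the tool.
            raise FileNotFoundError("no files match: " + pattern)
        for path in matches:
            if path not in paths:  # skip files matched by more than one pattern
                paths.append(path)
    return paths


def transcripts_args(patterns, globber=glob.glob):
    """Build the --transcripts part of the command line."""
    return ["--transcripts"] + expand_transcripts(patterns, globber)
```

Matches of each pattern are sorted so that the order of assemblies in the reports is reproducible between runs.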

-psl <TRANSCRIPTS_ALIGNMENT ...>, --alignment <TRANSCRIPTS_ALIGNMENT, ...>
File(s) with transcript alignments to the reference genome in PSL format separated by space.

-sam <READS_ALIGNMENT>, --reads_alignment <READS_ALIGNMENT>
File with read alignments to the reference genome in SAM format.

-1 <LEFT_READS>, --left_reads <LEFT_READS>
File with forward paired-end reads in FASTQ or gzip-compressed FASTQ format.

-2 <RIGHT_READS>, --right_reads <RIGHT_READS>
File with reverse paired-end reads in FASTQ or gzip-compressed FASTQ format.

-s <SINGLE_READS>, --single_reads <SINGLE_READS>
File with single reads in FASTQ or gzip-compressed FASTQ format.
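
The read options above can be assembled programmatically. The sketch below is illustrative only: `reads_args` is a hypothetical helper, and the rule that forward and reverse reads must be given together is an assumption of this sketch rather than documented rnaQUAST behaviour.

```python
def reads_args(left=None, right=None, single=None):
    """Build the read-related flags from optional FASTQ paths."""
    # Assumption: paired-end data only makes sense with both mates present.
    if (left is None) != (right is None):
        raise ValueError("--left_reads and --right_reads must be given together")
    args = []
    if left is not None:
        args += ["--left_reads", left, "--right_reads", right]
    if single is not None:
        args += ["--single_reads", single]
    return args
```

For example, `reads_args(left="r_1.fastq.gz", right="r_2.fastq.gz")` yields the two paired-end flags with their files (the file names are placeholders).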

3.2 Basic options

-o <OUTPUT_DIR>, --output_dir <OUTPUT_DIR>
Directory to store all results. Default is rnaQUAST_results/results_<datetime>.

--test
Run rnaQUAST on the test data from the test_data folder. The output directory is rnaQUAST_test_output.

-d, --debug
Report detailed information, typically used only for detecting problems.

-h, --help
Show help message and exit.

3.3 Advanced options

-t <INT>, --threads <INT>
Maximum number of threads. Default is min(number of CPUs / 2, 16).
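
As a worked example of the default, the formula can be written out as below. This is a sketch of the documented rule, not rnaQUAST's own code; rounding down and never returning fewer than one thread are assumptions of this sketch.

```python
import os


def default_threads(n_cpus=None):
    """Compute min(number of CPUs / 2, 16) as described for --threads."""
    if n_cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        n_cpus = os.cpu_count() or 1
    # Assumptions: integer division, and at least one thread on tiny machines.
    return max(1, min(n_cpus // 2, 16))
```

Under these assumptions an 8-CPU machine gets 4 threads, and any machine with 32 or more CPUs is capped at 16.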

-l <LABELS ...>, --labels <LABELS ...>
Name(s) of assemblies that will be used in the reports separated by space and given in the same order as files with transcripts / alignments.
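
Since labels are matched to assemblies purely by position, keeping each label next to its file avoids mismatches. A minimal sketch; `transcripts_with_labels` is a hypothetical helper and the assembler names and file names in the usage note are placeholders:

```python
def transcripts_with_labels(assemblies):
    """Turn (label, fasta_path) pairs into matching --transcripts / --labels flags.

    Both flags are generated from the same list, so the label order always
    follows the order of the transcript files.
    """
    if not assemblies:
        raise ValueError("at least one assembly is required")
    labels = [label for label, _ in assemblies]
    paths = [path for _, path in assemblies]
    return ["--transcripts"] + paths + ["--labels"] + labels
```

Calling it with `[("trinity", "trinity.fasta"), ("spades", "spades.fasta")]` lists the two FASTA files after --transcripts and the two names, in the same order, after --labels.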

--prokaryote
Use this option if the genome is prokaryotic.

-ss, --strand_specific
Set if transcripts were assembled using strand-specific RNA-Seq data in order to benefit from knowing whether the transcript originated from the + or - strand.

--min_alignment <MIN_ALIGNMENT>
Minimal alignment length to be used, default value is 50.

--no_plots
Do not draw plots (makes rnaQUAST run a bit faster).

--blat
Run with BLAT alignment tool instead of GMAP.

<a name="busco"></a> --busco
Run the BUSCO tool, which detects core genes in the assembly (see Installation & requirements). Use this option to provide the name of the BUSCO database to use or the path to a local database. You can also set auto-lineage for automated lineage selection.

--gene_mark
Run with GeneMarkS-T gene prediction tool. Use --prokaryote option if the genome is prokaryotic.
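
For a run without a reference genome, the --busco, --gene_mark and --prokaryote options can be combined into a single command line. The sketch below only builds the argument list; `de_novo_command` is a hypothetical helper, and passing `auto-lineage` as the value of --busco is this sketch's reading of the option description.

```python
def de_novo_command(transcripts, busco="auto-lineage", gene_mark=True, prokaryote=False):
    """Build an rnaQUAST command for reference-free quality assessment."""
    cmd = ["python", "rnaQUAST.py", "--transcripts"] + list(transcripts)
    if busco:
        # Either a lineage database name, a path to a local database,
        # or "auto-lineage" (assumed to be passed as the option value).
        cmd += ["--busco", busco]
    if gene_mark:
        cmd.append("--gene_mark")
    if prokaryote:
        cmd.append("--prokaryote")
    return cmd
```

The resulting list can be handed to `subprocess.run` once BUSCO and GeneMarkS-T are installed and available on $PATH.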

--disable_infer_genes
Use this option if your GTF file already contains gene records; otherwise gffutils will fix it. Note that gffutils may take quite a long time.

--disable_infer_transcripts
Use this option if your GTF file already contains transcript records, otherwise
