Viral Annotation DefineR: classification and annotation of viral sequences based on RefSeq annotation


VADR - Viral Annotation DefineR <a name="top"></a>

Version 1.7; September 2025

https://github.com/ncbi/vadr.git

VADR is a suite of tools for classifying and analyzing sequences homologous to a set of reference models of viral genomes or gene families. It includes models that can be used to validate and annotate Norovirus, Dengue virus, and SARS-CoV-2 sequences, as well as other flaviviruses, caliciviruses, and coronaviruses, plus influenza virus, mpox virus, and respiratory syncytial virus (RSV). Additional models are available to download or can be created using the v-build.pl program.


Quick-start: install VADR and classify and annotate viral sequences using v-scan.pl<a name="quickstart"></a>

Install VADR:

Download this file:

https://raw.githubusercontent.com/ncbi/vadr/master/vadr-install.sh

possibly with a command like:

curl -o vadr-install.sh https://raw.githubusercontent.com/ncbi/vadr/master/vadr-install.sh

Then run it with one of the following commands, depending on your system type:

sh ./vadr-install.sh linux

OR

sh ./vadr-install.sh macosx-silicon

OR

sh ./vadr-install.sh macosx-intel

Then follow the instructions output at the end of the installation for updating your .bashrc or .cshrc file and defining important environment variables that VADR relies on.
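The choice among the three system types above can be scripted. The following is a minimal sketch, assuming that `uname -s` prints `Linux` or `Darwin` and that `uname -m` prints `arm64` on Apple silicon Macs. The function name `vadr_install_target` is hypothetical and not part of VADR.

```shell
#!/bin/sh
# Hypothetical helper (not part of VADR): map the OS and CPU names printed
# by `uname -s` and `uname -m` to a vadr-install.sh system type.
vadr_install_target() {
    os="$1"
    arch="$2"
    case "$os" in
        Linux)
            echo "linux" ;;
        Darwin)
            # Assumption: Apple silicon reports arm64; anything else is Intel.
            case "$arch" in
                arm64) echo "macosx-silicon" ;;
                *)     echo "macosx-intel" ;;
            esac ;;
        *)
            echo "unsupported system: $os" >&2
            return 1 ;;
    esac
}

# Print the install command for this machine instead of running it.
if target=$(vadr_install_target "$(uname -s)" "$(uname -m)"); then
    echo "sh ./vadr-install.sh $target"
fi
```

The sketch only prints the command so that it can be reviewed first. A shell running under Rosetta on Apple silicon may report `x86_64`, so the detected type is worth checking by hand on such machines.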

Run v-scan.pl to annotate viral sequences

Given a fasta sequence file called in.fa with any combination of flavivirus, calicivirus, coronavirus, influenza, RSV, or Mpox sequences, run:

v-scan.pl -m in.fa out

This will list each stage of the processing and ultimately create an output directory called out and fill it with output files. Short descriptions of the output files will be printed to the screen. More detailed explanation of output file types can be found here. For a more detailed walk-through example of v-scan.pl see this page.
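Before starting a scan it can help to confirm that the input file exists and contains FASTA records. The following is a minimal sketch of such a check. The helper names `count_fasta_records` and `check_fasta` are hypothetical and not part of VADR, and the check only counts header lines that begin with `>`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VADR) for checking a FASTA file
# before passing it to v-scan.pl.

# Count FASTA records by counting header lines, which start with '>'.
count_fasta_records() {
    # grep exits non-zero when nothing matches; '|| true' keeps that
    # from being treated as a failure while grep still prints 0.
    grep -c '^>' "$1" || true
}

# Return 0 and report the record count if the file looks usable,
# otherwise print a message to stderr and return 1.
check_fasta() {
    if [ ! -s "$1" ]; then
        echo "error: $1 is missing or empty" >&2
        return 1
    fi
    n=$(count_fasta_records "$1")
    if [ "$n" -eq 0 ]; then
        echo "error: $1 has no FASTA header lines" >&2
        return 1
    fi
    echo "$1: $n sequence(s)"
}
```

With these functions defined, `check_fasta in.fa && v-scan.pl -m in.fa out` runs the scan only when the input passes the check.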


VADR programs

The VADR v-scan.pl script classifies and annotates sequences that match any of your VADR model libraries. Once v-scan.pl determines the library to use for a given set of sequences, it runs a different VADR program, v-annotate.pl, which identifies the appropriate model in the library for each sequence and defines the annotation based on that most similar model. v-scan.pl automatically runs v-annotate.pl using the recommended settings (v-annotate.pl command-line options) for each library. Alternatively, users can run v-annotate.pl separately. Example usage of v-annotate.pl can be found here.

Another VADR script, v-build.pl, is used to create the models from individual sequences from GenBank or from input multiple sequence alignments, potentially with secondary structure annotation. v-build.pl stores the GenBank feature annotation in the model, and v-annotate.pl maps that annotation (e.g. CDS coordinates) onto the sequences it annotates. Example usage of v-build.pl can be found here. An advanced tutorial on building VADR models using RSV as an example can be found here.

v-annotate.pl identifies unexpected or divergent attributes of the sequences it annotates (e.g. invalid or early stop codons in CDS features) and reports them to the user in the form of alerts. A subset of alerts are fatal and cause a sequence to fail. A sequence passes if zero fatal alerts are reported for it. VADR is used by GenBank staff to evaluate incoming sequence submissions of some viruses (currently Norovirus, Dengue virus, and SARS-CoV-2). Submitted Norovirus, Dengue virus and SARS-CoV-2 sequences that pass v-annotate.pl are accepted into GenBank.
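The pass/fail rule described above (a sequence passes only if it has zero fatal alerts) can be illustrated with a short sketch. The input format here is invented for illustration and is not VADR's actual output format: each tab-separated line holds a sequence name, an alert code, and `yes` or `no` for whether the alert is fatal, and a sequence with no alerts appears once with `-` as its alert code. The function name `summarize_pass_fail` and the alert codes are likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not VADR code, not VADR's file format): read
# "sequence<TAB>alert<TAB>fatal" lines from stdin and print PASS for each
# sequence with zero fatal alerts, FAIL otherwise, in first-seen order.
summarize_pass_fail() {
    awk -F '\t' '
        # First time a sequence is seen: remember its order, zero its count.
        !($1 in fatal) { order[++n] = $1; fatal[$1] = 0 }
        # Count only the alerts marked as fatal.
        $3 == "yes"    { fatal[$1]++ }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                printf "%s\t%s\n", name, (fatal[name] == 0 ? "PASS" : "FAIL")
            }
        }
    '
}

# Example: seq1 has one non-fatal alert, seq2 has one fatal alert.
printf 'seq1\talertA\tno\nseq2\talertB\tyes\n' | summarize_pass_fail
```

A single fatal alert is enough to fail a sequence regardless of how many non-fatal alerts it also has, which is why the sketch counts only lines marked `yes`.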

The homology search and alignment components of VADR scripts, the most computationally expensive steps, are performed by the Infernal, HMMER, FASTA, Minimap2 and BLAST software packages, which are downloaded and installed as part of VADR installation.


VADR model libraries <a name="models"></a>

VADR installation includes the following model libraries:

| library | model key (short name) | rigorously tested? | number of models | notes |
|--------------|------------------------|--------------------|------------------|-------|
| Caliciviridae | calici | norovirus models only | 49 | norovirus models used by GenBank |
| Flaviviridae | flavi | dengue and HCV models only | 156 | dengue models used by GenBank |
| Coronaviridae | corona | SARS-CoV-2 only | 55 | SARS-CoV-2 models used by GenBank |
| influenza | flu | yes | 70 | described in Database article |
| Mpox | mpxv | yes | 1 | |
| respiratory syncytial virus (RSV) | rsv | yes | 2 | |

Additional models are available. See this page for a list of all available models and additional information.


VADR documentation <a name="documentation"></a>


Contributors <a name="contributors"></a>

  • VADR includes contributions and input from current and former colleagues at NCBI, including:

    Rodney Brister

    Vince Calhoun

    Sergiy Gotvyanskyy

    Eneida Hatcher

    Sophia Hu

    Ilene Karsch-Mizrachi

    Rich McVeigh

    Susan Schafer

    Alejandro Schäffer

    Lara Shonkwiler

    Beverly Underwood

    Yuri Wolf

    Linda Yankie


Reference <a name="reference"></a>

  • The recommended citation for influenza analysis using VADR is: Vincent C Calhoun, Eneida L Hatcher, Linda Yankie, Eric P Nawrocki; Influenza sequence validation and annotation using VADR. Database, baae091 (2024). https://doi.org/10.1093/database/baae091

  • The recommended citation for SARS-CoV-2 analysis using VADR is: Eric P Nawrocki; Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR. NAR Genom Bioinform 5(1), lqad002 (2023). https://doi.org/10.1093/nargab/lqad002

  • The recommended citation for all other uses of VADR is: Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3


Questions, comments or feature requests? Send an email to eric.nawrocki@nih.gov.
