LABEL, Lineage Assignment by Extended Learning

LABEL’s purpose is to quickly (relative to building an MSA and tree), automatically, and correctly assign clades or lineages to nucleotide sequences. Automated lineage assignment has applications in surveillance, research, and high-throughput database annotation. Additional information is available on the LABEL website and in the [manuscript].

USAGE

LABEL v0.7.0, updated 2025
Samuel S. Shepard (vfn4@cdc.gov), Centers for Disease Control & Prevention
Usage:
        LABEL [-E C_OPT] [-W WRK_PATH|-O OUT_PATH] [-TRD|-S] [-L LIN_PATH] <nts.fasta> <project> <Module:H5,H9,etc.>
                -T      Do TRAINING again instead of using classifier files.
                -E      SGE clustering option. Use 1 or 2 for SGE with array jobs, else local.
                -R      No RECURSIVE prediction. Limits scope, useful with -L option.
                -D      No DELETION of extra intermediary files.
                -S      Show available protein modules.
                -W      Web-server mode: requires ABSOLUTE path to WRITABLE working directory.
                -O      Output directory path, do not use with web mode.
Example: ./LABEL -C gisaid_H5N1.fa Bird_Flu H5
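
When LABEL is driven from a script, it can help to assemble the argument list in one place. The following is a minimal Python sketch, not part of LABEL: the function name `build_label_command` and its parameters are hypothetical, and it only uses flags described in the usage text above (`-T`, `-R`, `-D`, `-W`, `-O`). It enforces the two stated constraints: `-W` needs an absolute path, and `-O` should not be combined with web mode.

```python
import os
from typing import List, Optional


def build_label_command(
    fasta: str,
    project: str,
    module: str,
    out_path: Optional[str] = None,
    work_path: Optional[str] = None,
    retrain: bool = False,
    no_recursion: bool = False,
    keep_intermediates: bool = False,
    label_exe: str = "./LABEL",  # assumed location of the LABEL script
) -> List[str]:
    """Assemble a LABEL argument list from the options in the usage text."""
    if out_path and work_path:
        # The usage text says -O must not be used with web mode (-W).
        raise ValueError("use either -O or -W, not both")
    if work_path and not os.path.isabs(work_path):
        # The usage text says -W requires an ABSOLUTE path.
        raise ValueError("-W requires an absolute path")

    cmd = [label_exe]
    if retrain:
        cmd.append("-T")  # train again instead of using classifier files
    if no_recursion:
        cmd.append("-R")  # no recursive prediction
    if keep_intermediates:
        cmd.append("-D")  # keep extra intermediary files
    if work_path:
        cmd += ["-W", work_path]
    if out_path:
        cmd += ["-O", out_path]
    # Positional arguments: <nts.fasta> <project> <Module>
    cmd += [fasta, project, module]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.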

DATA

  • LABEL takes FASTA-formatted nucleotide sequences. The FASTA may be single- or multi-line and may contain any number of sequences. Sequences with duplicate headers are removed: only the first one read is kept. Commas and apostrophes are removed from headers, and internal spaces are replaced with underscores.

  • LABEL generates re-annotated FASTA sequences, scoring data, tab-delimited files, and miscellaneous text files. All output is text. It is written only to the specified output directory (or to a default working directory within the package) and to the current working directory of the calling user.
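
The header handling described above can be previewed before a run with a short Python sketch. This is an illustration under stated assumptions, not LABEL's own code: the function names are hypothetical, and it assumes duplicates are detected on the cleaned header (the text does not say whether LABEL compares raw or cleaned headers).

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from single- or multi-line FASTA."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def clean_header(header: str) -> str:
    """Drop commas and apostrophes, then turn internal spaces into underscores."""
    header = header.replace(",", "").replace("'", "")
    return header.strip().replace(" ", "_")


def dedupe_first_kept(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the first sequence seen for each cleaned header.

    Assumption: duplicates are compared after cleaning the header.
    """
    seen: Dict[str, str] = {}
    for header, seq in records:
        key = clean_header(header)
        if key not in seen:
            seen[key] = seq
    return seen
```

Calling `dedupe_first_kept(read_fasta(open("nts.fasta")))` returns a dictionary in first-seen order, which makes it easy to spot headers that would collide after cleaning.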

FILES GENERATED

| File | Type | Description |
| :------------------------- | :-------- | :-------------------------------------------------------------------------- |
| PROJ_final.tab | Standard. | Tab-delimited headers & predicted clades. |
| PROJ_final.txt | Standard. | A prettier output of the above. |
| LEVEL_trace.tab | Standard. | Table of HMM scores at each level, suitable for visualization in R. |
| LEVEL_result.tab | Standard. | For the current prediction level, tab-delimited headers & predicted clades. |
| LEVEL_result.txt | Standard. | For the current prediction level, a prettier output of the above. |
| FASTA/ | Standard. | Folder containing fasta files and newick trees. |
| FASTA/PROJ_predictions.fas | Standard. | Query sequence file with predictions added like: _{PRED:CLAD} |
| FASTA/PROJ_reannotated.fas | Default. | Query file with annotations replaced with predicted ones, ordered by clade. |
| FASTA/PROJ_clade_CLAD.fas | Standard. | The re-annotated file partitioned into separate clade files. |
| c-*/ | Standard. | Clade/lineage subfolder for the hierarchical predictions. |

The project name is denoted "PROJ", the lineage or clade "CLAD", and the module of interest "MOD".
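
For downstream analysis, the tab-delimited results and the annotated FASTA headers can be read with a few lines of Python. The sketch below is an assumption-laden illustration: the function names are hypothetical, it assumes the first two tab-separated columns of `PROJ_final.tab` are the header and the predicted clade, and it assumes the `_{PRED:CLAD}` suffix sits at the very end of each header in `FASTA/PROJ_predictions.fas`. If the file starts with a column-title row, pass `skip_first_row=True`.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Matches a trailing suffix such as "_{PRED:2.3.4.4}" (assumed to end the header).
PRED_SUFFIX = re.compile(r"_\{PRED:([^}]+)\}$")


def read_predictions(lines: Iterable[str], skip_first_row: bool = False) -> Dict[str, str]:
    """Map each sequence header to its predicted clade from tab-delimited rows."""
    predictions: Dict[str, str] = {}
    for index, line in enumerate(lines):
        if skip_first_row and index == 0:
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            predictions[fields[0]] = fields[1]
    return predictions


def clade_counts(predictions: Dict[str, str]) -> Counter:
    """Count how many sequences were assigned to each clade."""
    return Counter(predictions.values())


def clade_from_header(header: str) -> Optional[str]:
    """Extract CLAD from a header ending in _{PRED:CLAD}, or None if absent."""
    match = PRED_SUFFIX.search(header)
    return match.group(1) if match else None
```

A typical use would be `clade_counts(read_predictions(open("PROJ_final.tab")))`, with `PROJ` replaced by the project name given on the command line.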

MODULES

LABEL modules are simply directories within the LABEL_RES/training_data folder; each contains all associated pHMMs as well as SVM training data. Extension files such as x-filter.txt guard against inappropriate input data.
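
Because modules are plain directories, they can also be enumerated from a script. The Python sketch below is a hypothetical helper, not a replacement for `./LABEL -S`: it lists the subdirectories of `LABEL_RES/training_data` under a given package folder and assumes every subdirectory there is a module.

```python
from pathlib import Path
from typing import List


def list_modules(label_root: str) -> List[str]:
    """Return the sorted names of module directories under LABEL_RES/training_data."""
    training = Path(label_root) / "LABEL_RES" / "training_data"
    if not training.is_dir():
        return []
    # Assumption: each subdirectory is one module; plain files are ignored.
    return sorted(p.name for p in training.iterdir() if p.is_dir())
```

The function returns an empty list when the folder is missing, so a caller can distinguish "no modules found" from a valid installation without handling exceptions.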

Available Modules

Most of these modules were trained by Sam Shepard and/or Ujwal Bagal.

| Module | Description |
| ------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| B_HAv2019, B_HAv2017 | Influenza B hemagglutinin clade modules, trained in 2019 and 2017 |
| B_NAv2016 | Influenza B neuraminidase clade module, trained in 2016 |
| B_PB2v2016, B_PB1v2016, B_PAv2016, B_NPv2016, B_MPv2016, B_NSv2016 | Influenza B internal gene segment lineage modules, trained in 2016 |
| †H5v2023 | A provisional module for a proposed update to the H5 nomenclature, trained in 2023 |
| H5v2015, H5v2013, H5v2011 | Influenza A hemagglutinin modules for H5N1, for nomenclatures from 2015, c. 2013, & c. 2011 |
| †H7v2013 | Influenza A hemagglutinin module for H7 subtype, trained in 2013 |
| H9v2011 | Influenza A hemagglutinin module for H9N2 described in the LABEL [manuscript] |
| H1pdm09v2019, H1pdm09v2018 | Influenza A H1N1pdm09 classification modules, trained in 2019 and 2018 |
| H3v2019, H3v2016b, H3v2016a, H3v2016 | Influenza A hemagglutinin modules for H3 subtype classification, trained in 2016 and 2019 |
| irma-FLU, irma-FLU-v2 | [IRMA] modules for influenza virus classification |
| irma-FLU-HA, irma-FLU-NA, irma-FLU-OG, irma-FLU-OG-v2 | [IRMA] modules for influenza hemagglutinin, neuraminidase, and other influenza genes. Note: HA, NA and OG are part of IRMA's secondary two-stage LABEL modules. |
| irma-FLU-HE | [IRMA] module for hemagglutinin-esterase (flu C,D). |

Modules may contain a release.txt file with additional information. For up-to-date module availability, use: ./LABEL -S

† Provisional or experimental

INSTALLATION & REQUIREMENTS

We recommend a single multi-core machine with at least 2 cores (8 or more threads work best) and at least 2 GB of RAM. LABEL's runtime depends on the number of cores available on the machine. The software requirements are:

  • Linux (RHEL8 or later GLIBC), MacOS 10.14 (intel) or MacOS 11 (arm64)
    • BASH version 3+
    • Standard utilities: sleep, cut, paste, jobs, zip, env, cat, cp, getopts.
  • Perl version 5.16 or later
    • Standard includes: Getopt::Long, File::Basename
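
Before installing, the external utilities in the list above can be checked with a small Python sketch. This is only an illustration: the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical, and the check covers presence on `PATH` only, not the BASH or Perl version. `jobs` and `getopts` are BASH builtins, so they are deliberately left out of the `PATH` lookup.

```python
import shutil
from typing import Iterable, List

# External programs from the requirements list. `jobs` and `getopts` are
# shell builtins in BASH, so they are not looked up on PATH here.
REQUIRED_TOOLS = ("bash", "perl", "sleep", "cut", "paste", "zip", "env", "cat", "cp")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means every listed program was found; version requirements (BASH 3+, Perl 5.16 or later) still need to be confirmed separately.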

Via Archive

Download the latest archive via our releases page. Use of wget or curl for downloads is recommended for MacOS to preserve functionality.

  1. Unzip the archive containing LABEL.

  2. Move the package to your desired location and add the folder to your PATH

    • Note: LABEL_RES and LABEL must be in the same folder.
  3. LABEL is now installed. To test it from the package folder, execute:

    ./LABEL LABEL_RES/training_data/H9v2011/H9v2011_downsample.fa test_project H9v2011
    

Via Docker

Simply run:

docker run --rm -itv $(pwd):/data ghcr.io/cdcgov/label:latest LABEL # label args

Third Party Software

We aggregate and provide builds of third-party software that LABEL executes at runtime. You may install or obtain your own copies and LABEL will detect them, but you will then need to test them for compatibility yourself.

  • [GNU Parallel]
    • Artifacts: parallel
    • Requires: system Perl
    • Purpose: parallelization
    • License: [GPL v3]
  • [SHOGUN] version 1.1.0 (2.1+ is not compatible)
    • Artifacts: shogun (cmdline_static)
    • Provided architectures: linux/x86_64, linux/aarch64, apple/universal (*[arm64][arm64-mac-build] + intel)
    • Purpose: executes the SVM decision phase.
    • License: [GPL v3]
  • [SAM] version 3.5
    • Artifacts: align2model, hmmscore, modelfromalign
    • Provided architectures: linux/x86_64, linux/aarch64, apple/universal (arm64 + intel)
    • Purpose: build HMM profiles, score sequences for evaluation
    • License: [Custom][sam-license] academic/government, not-for-profit, redistributed [with permission]

[!WARNING] Note that [SAM] is redistributed with permission for LABEL, but its terms exclude commercial use without a license. If you are a commercial entity, you may need to contact the authors to obtain their [custom][sam-license] license.

\* Minor modifications to allow compilation of the legacy software.
