TaxTriage
TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.
About
TaxTriage is a flexible, containerized bioinformatics pipeline designed to identify pathogens within complex samples/specimens (e.g., respiratory swabs, lesion swabs, whole blood) using untargeted DNA or RNA sequencing data. It is designed for short- (Illumina) or long-read (ONT, PacBio) platforms, and incorporates numerous software packages to perform quality control, organism classification, and read mapping. Additionally, TaxTriage incorporates intermediate data into a unified confidence metric for all organisms identified. The final analysis output is incorporated into an Organism Discovery Report, represented as a single PDF, with summaries of the intermediate data supporting pathogen identification. TaxTriage is designed for broad deployment and early-stage outbreak investigations and is not intended for use as a standalone diagnostic capability.

Description
The TaxTriage pipeline aims to democratize metagenomic sequence analysis for early warning and outbreak investigations, both in public health and potentially clinical settings. To enable this capability, TaxTriage was developed to ingest short- or long-read metagenomic sequencing data generated from tissues (human or animal). The intent is to provide non-bioinformaticians a tool capable of generating species-level identifications of pathogens from raw metagenomic or targeted sequencing data. Specific modules are developed to consider sequencing chemistry and sample types (e.g. blood vs. saliva, etc.). Strain, variant, or clade-level distinction may be possible with specialized datasets, but it is anticipated that the level of granularity would require subsequent, specialized analyses.
- Quality control steps
- In-silico host depletion
- Classification of reads
- Mapping of reads to reference genomes found to be "top hits"
- Confidence metric generation (e.g., depth/breadth of coverage, %nt ID)
- Threshold mechanisms
- De-novo assembly
- Detailed MultiQC reports
- Concise final report (intended to have all data fields required for use in clinical settings)
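As a rough illustration of the depth and breadth of coverage figures named in the list above, the sketch below summarizes per-base depth records for each reference. It assumes tab-separated input in the three-column layout that samtools depth -a emits (reference, position, depth). The function name coverage_summary is hypothetical, and this is not TaxTriage's actual confidence calculation.

```shell
# Summarize per-base depth records (reference <TAB> position <TAB> depth).
# Prints: reference <TAB> breadth of coverage (%) <TAB> mean depth.
# Reads the files given as arguments, or stdin when none are given.
coverage_summary() {
  awk -F'\t' '
    {
      n[$1]++                       # positions seen for this reference
      sum[$1] += $3                 # running total of depth
      if ($3 > 0) covered[$1]++     # positions with at least one read
    }
    END {
      for (c in n)
        printf "%s\t%.2f\t%.2f\n", c, 100 * covered[c] / n[c], sum[c] / n[c]
    }' "$@" | sort
}
```

For example, samtools depth -a sample.bam | coverage_summary would print one line per reference, where sample.bam stands in for any sorted alignment file.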
TaxTriage gives an initial triage of taxonomic classifications, using Kraken2 database(s), that can then be ingested into a CLIA-style report format. This component is under active development, but in its current state it can run a set number of samples end-to-end using a user-created samplesheet in .csv format. The output formats include PDF and HTML, which are highly interactive and distributable.
See Important output locations for information on where to get the most important output files from the pipeline.
See here for information on how the "top hits" are located
See collaborative efforts with groups outside JHU/APL here
Alerts
:warning: If you change code inside a repo that Nextflow pulled for you, those local changes can conflict with updates to the already cloned copy when you run the test profile or use -latest -r main/stable. In that case you must run nextflow drop https://github.com/jhuapl-bio/taxtriage first. This only applies to pipelines run by calling the remote repo with the parameters mentioned above. If you expect to make local changes frequently, git clone the repo, git pull manually, and run the pipeline from the main.nf file. See here for more info
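A minimal sketch of that recovery sequence, assuming the cached copy is no longer needed: drop the pulled repo, then run again so Nextflow fetches a clean copy. The function name refresh_taxtriage and its default profile are hypothetical; the two nextflow commands are the ones given in this README.

```shell
# Drop the locally cached pipeline, then re-run it from the remote repo.
# $1: value for -profile (defaults to test,docker).
refresh_taxtriage() {
  local repo="https://github.com/jhuapl-bio/taxtriage"
  local profile="${1:-test,docker}"
  # The run only starts if the drop succeeded.
  nextflow drop "$repo" &&
    nextflow run "$repo" -r main -latest -profile "$profile" -resume
}
```

Calling refresh_taxtriage test,singularity would do the same with the Singularity profile.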
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.
TaxTriage is designed as a pipeline that gives an initial triage of taxonomic classifications, using Kraken2 database(s), that can then be ingested into a CLIA-style report format. It is under active development, but in its current state it can run a set number of samples end-to-end using a user-created samplesheet in .csv format. The output is an HTML report that is highly interactive and distributable.
Efforts are underway to provide full support of this pipeline on nf-core to provide a seamless deployment methodology. The pipeline also requires installation of Docker or Singularity (CE ONLY v4+) for the individual modules within it. Because these modules are separate from the source code of TaxTriage, we recommend following the examples outlined in the usage details first to automatically run the pipeline and install all dependencies while also giving you some example outputs and a better feel for how the pipeline operates.
See Here for full usage details
See Here for troubleshooting & FAQ
Installation
TaxTriage requires two primary installs to work:
- Nextflow
- Singularity or Docker (recommended)
1. Nextflow
Follow the instructions here, or run these commands in your WSL2, native Linux, or Mac environment:
# Make sure that Java v11+ is installed:
java -version
# Install Nextflow
curl -fsSL get.nextflow.io | bash
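Note: the installer writes a nextflow executable into the current directory. Moving it into your home directory does not need elevated rights, but moving it to a system-wide location requires sudo. If you are on an HPC, make sure that nextflow is on your $PATH if it is not available globally.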
Place it in your $PATH
# Add the Nextflow binary to your user's PATH (assumes ~/bin is on your PATH):
mkdir -p ~/bin
mv nextflow ~/bin/
If installing globally, requiring sudo, type:
sudo mv nextflow /usr/local/bin
When complete, verify installation with nextflow -v to see the version
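The Java requirement can also be checked in a script rather than by eye. The sketch below parses the version line printed by java -version and compares the major version against 11. The function names java_major_version and check_java are hypothetical, and the parsing assumes the usual quoted version string (for example "11.0.20" or the legacy "1.8.0_381").

```shell
# Extract the major Java version from a `java -version` line.
# Legacy scheme: "1.8.0_381" -> 8. Modern scheme: "11.0.20" -> 11.
java_major_version() {
  local line="$1" ver
  # Take the text between the first pair of double quotes.
  ver=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p')
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*) ver=${ver#1.} ;;          # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${ver%%[!0-9]*}"  # keep only the leading digits
}

# Succeed only if the java on PATH reports major version 11 or newer.
check_java() {
  local major
  major=$(java_major_version "$(java -version 2>&1 | grep -m 1 'version')") || return 1
  [ "$major" -ge 11 ]
}
```

A typical use would be check_java && nextflow -v, which prints the Nextflow version only when a suitable Java is present.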
2. Containerization Approach Install
Choose A (Docker, recommended) or B (Singularity). If you are on an HPC, ask your IT team to set up option B. You do NOT need to install both tools.
A. Docker
Follow these steps for your OS here. If you are on WSL2 (Windows), choose Docker Desktop for Windows and it should be available automatically in your WSL environment.
B. Singularity
Quick Start
Make sure you have either Docker or Singularity installed, as well as Nextflow
Test Run
This will pull the test data and run the pipeline. It should take ~10-15 minutes.
nextflow run https://github.com/jhuapl-bio/taxtriage -r main -latest -profile test,docker -resume
❗If you want Singularity instead, specify it in the profile in place of docker, like: test,singularity
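The choice between the two profiles can be scripted. This sketch looks for docker and then singularity on the PATH and prints the matching test-run command without executing it. The helper names detect_engine, build_profile, and test_run_command are hypothetical; the printed command is the test run shown above with only the profile swapped.

```shell
# Print the first container engine found on PATH (Docker preferred).
detect_engine() {
  local engine
  for engine in docker singularity; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  return 1  # neither engine is installed
}

# Join a base profile (e.g. "test") and an engine into a -profile value.
build_profile() {
  printf '%s,%s\n' "$1" "$2"
}

# Print (do not execute) the test-run command for the given engine.
test_run_command() {
  printf 'nextflow run https://github.com/jhuapl-bio/taxtriage -r main -latest -profile %s -resume\n' \
    "$(build_profile test "$1")"
}
```

For instance, engine=$(detect_engine) && test_run_command "$engine" prints the command to copy and run, and fails quietly when no engine is installed.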
Cloud
Follow the steps here
Offline Local Mode
In some cases, you may not want to pull the latest update(s) each time you run the pipeline. In that case, you have 2 primary options:
A. Reference remote url, don't specify latest
nextflow run https://github.com/jhuapl-bio/taxtriage -r main -profile test,docker -resume
Here, we remove -latest, so Nextflow will not attempt to pull updates. This only works if you have already run the pipeline in online mode (thus pulling the code locally), as in the initial test run example.
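Before relying on option A, it can help to confirm that a pulled copy really exists. The sketch below assumes Nextflow's default layout, in which pulled pipelines live under $NXF_ASSETS, or under $NXF_HOME/assets (by default ~/.nextflow/assets) when that variable is unset. The function names nxf_assets_dir and pipeline_cached are hypothetical.

```shell
# Resolve the directory where Nextflow keeps pulled pipelines.
nxf_assets_dir() {
  printf '%s\n' "${NXF_ASSETS:-${NXF_HOME:-$HOME/.nextflow}/assets}"
}

# Succeed if the given owner/repo has already been pulled.
# $1: project name, e.g. jhuapl-bio/taxtriage
pipeline_cached() {
  [ -d "$(nxf_assets_dir)/$1" ]
}
```

A guard such as pipeline_cached jhuapl-bio/taxtriage || echo "run once with -latest first" makes the dependency on a prior online run explicit.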
B. Clone the repo first, reference local main file
Here, we instead clone the repo and then reference the launch file, main.nf, that is stored locally on our system. We need to make sure we are in the repo's directory each time we do this.
First we clone
git clone https://github.com/jhuapl-bio/taxtri
