AMRplusplus

Overview

AMR++ bioinformatic pipeline

(https://megares.meglab.org/)

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome. AMR++ was developed to work in conjunction with the MEGARes database, which contains sequence data for approximately 9,000 hand-curated antimicrobial resistance genes accompanied by an annotation structure that is optimized for use with high-throughput sequencing and metagenomic analysis. The acyclical annotation graph of MEGARes allows for accurate, count-based, hierarchical statistical analysis of resistance at the population level, much like microbiome analysis, and is also designed to be used as a training database for the creation of statistical classifiers.

The goal of many metagenomics studies is to characterize the content and relative abundance of sequences of interest from the DNA of a given sample or set of samples. You may want to know what is contained within your sample or how abundant a given sequence is relative to another.

Often, metagenomics is performed when the answer to these questions must be obtained for a large number of targets where techniques like multiplex PCR and other targeted methods would be too cumbersome to perform. AMR++ can process the raw data from the sequencer, identify the fragments of DNA, and count them. It also provides a count of the polymorphisms that occur in each DNA fragment with respect to the reference database.

Additionally, you may want to know if the depth of your sequencing (how many reads you obtain that are on target) is high enough to identify rare organisms (organisms with low abundance relative to others) in your population. This is referred to as rarefaction and is calculated by randomly subsampling your sequence data at intervals between 0% and 100% in order to determine how many targets are found at each depth.
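The subsampling idea can be sketched in a few lines of shell and awk. This is not AMR++'s rarefaction analyzer: it assumes a hypothetical input file with one aligned read per line holding only the ID of the gene that read hit, keeps each read with probability equal to the sampling fraction, and counts how many distinct genes survive at each 10% step. The function name rarefy is our own.

```shell
# Minimal rarefaction sketch (NOT AMR++'s own rarefaction analyzer).
# Assumed, hypothetical input: one aligned read per line, containing only
# the ID of the target (gene) that the read hit.
rarefy() {
  local hits_file=$1 seed=${2:-42}
  awk -v seed="$seed" '
    { target[NR] = $1 }                     # keep every read in memory
    END {
      srand(seed)                           # fixed seed for repeatable subsamples
      for (step = 0; step <= 10; step++) {
        frac = step / 10                    # sampling depth: 0.0, 0.1, ..., 1.0
        delete seen
        found = 0
        for (i = 1; i <= NR; i++) {
          # keep each read with probability frac, count targets seen once
          if (rand() < frac && !(target[i] in seen)) {
            seen[target[i]] = 1
            found++
          }
        }
        printf "%d%%\t%d\n", step * 10, found
      }
    }' "$hits_file"
}
```

For example, rarefy hits.txt prints 11 lines from "0%" to "100%" with the number of distinct targets found at each depth. A curve that is still rising at 100% suggests that deeper sequencing would find more targets.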

With AMR++, you will obtain alignment count files for each sample that are combined into a count matrix that can be analyzed using any statistical and mathematical techniques that can operate on a matrix of observations.
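Combining per-sample counts into a matrix is essentially an outer join on the gene ID, with genes missing from a sample filled in as zero. The sketch below assumes a hypothetical layout of one headerless "gene,count" CSV per sample named <sample>.counts.csv; AMR++ builds its own matrices, so treat build_matrix as an illustration of the idea only.

```shell
# Sketch: merge per-sample count files into one gene-by-sample matrix.
# Assumed (hypothetical) input: one CSV per sample named <sample>.counts.csv,
# each line "gene,count" with no header. AMR++'s real files may differ.
build_matrix() {
  awk -F, '
    FNR == 1 {                              # first line of a new file = new sample
      sample = FILENAME
      sub(/^.*\//, "", sample)              # strip the directory part
      sub(/\.counts\.csv$/, "", sample)     # strip the file suffix
      samples[++ns] = sample
    }
    {
      if (!($1 in known)) { known[$1] = 1; genes[++ng] = $1 }
      count[$1, ns] += $2
    }
    END {
      header = "gene"
      for (s = 1; s <= ns; s++) header = header "," samples[s]
      print header
      for (g = 1; g <= ng; g++) {
        row = genes[g]
        # genes absent from a sample are reported as 0
        for (s = 1; s <= ns; s++) row = row "," (count[genes[g], s] + 0)
        print row
      }
    }' "$@"
}
```

Running build_matrix s1.counts.csv s2.counts.csv prints a header row of sample names followed by one row per gene, in the order genes are first seen. A sample whose file is empty is left out of the matrix.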

Important changes to AMR++

Detailed changes here.

Brief overview:

Switched to only counting primary resistome alignments.
Changed default AMR gene fraction --threshold to 0. We recommend running statistical analysis of count matrices after aggregating to the "Group" level to account for possible false-positive calls of individual gene accessions.
Added single-end and merged-read analysis.
Changed defaults to skip rarefaction analysis, but default to running the SNP confirmation and deduplication of resistome counts.
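Aggregating to the "Group" level, as recommended above, amounts to summing all rows that share a Group annotation. This sketch assumes the gene-level matrix is comma-separated, has a header row, stores a MEGARes-style ID of the form accession|Type|Class|Mechanism|Group|... in the first column, and holds per-sample counts in the remaining columns; check your own AMR_analytic_matrix.csv before relying on that layout. aggregate_to_group is a hypothetical helper name.

```shell
# Sketch: collapse a gene-level count matrix to the MEGARes "Group" level.
# Assumptions: comma-separated; row 1 is a header; column 1 holds an ID like
# "accession|Type|Class|Mechanism|Group|..."; other columns are counts.
aggregate_to_group() {
  awk -F, -v OFS=, '
    NR == 1 { $1 = "group"; print; ncol = NF; next }   # keep sample names
    {
      split($1, level, "|")                 # level[5] is the Group field
      grp = level[5]
      if (!(grp in seen)) { seen[grp] = 1; order[++n] = grp }
      for (c = 2; c <= NF; c++) sum[grp, c] += $c
    }
    END {
      for (i = 1; i <= n; i++) {
        row = order[i]
        for (c = 2; c <= ncol; c++) row = row OFS (sum[order[i], c] + 0)
        print row
      }
    }' "$1"
}
```

For example, aggregate_to_group AMR_analytic_matrix.csv > group_matrix.csv writes one row per Group. Using the fourth or third "|" field instead of the fifth would give Mechanism- or Class-level totals in the same way.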

Additional analysis tips here.

More Information

AMR++ demonstration

If Anaconda is already installed, we'll just need to download the AMR++ GitHub repository and create the AMR++ conda environment. Please review the installation document for alternative methods to install AMR++ in your computing environment.

# Confirm conda works
conda -h

Clone the AMR++ repository.

git clone https://github.com/Microbial-Ecology-Group/AMRplusplus.git

Navigate into the AMR++ repository, create the environment, and run the test command.

cd AMRplusplus

# Now we can use the included recipe to make the AMR++ environment
conda env create -f envs/AMR++_env.yaml
# This can take 5-10 mins (or more) depending on your internet speed, computing resources, etc. 

# Once it's completed, activate the environment by its name (not the .yaml file)
conda activate AMR++_env

# You now have access to all the AMR++ software dependencies (locally)
samtools --help

# Run the demonstration pipeline using the software in the activated conda environment.
nextflow run main_AMR++.nf

Now, you can check out the results in the newly created "test_results" directory.

Using AMR++ to analyze your data

AMR++ is customizable to suit your computing needs and analyze your data. Primarily, the -profile parameter allows you to choose between running AMR++ using a singularity container, docker container, anaconda packages, or a local installation of your software. All parameters used to control how AMR++ analyzes your data can also be changed as needed in a variety of ways. For full information, review this configuration document.
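If you are unsure which profile fits your machine, a small helper can pick one based on which tools are on your PATH. The profile names singularity, docker, and local are our assumption (only conda appears in the commands here), so confirm them against the profiles defined in the repository's configuration before use; pick_profile is a hypothetical helper.

```shell
# Sketch: choose a -profile value from the tools available on PATH.
# ASSUMPTION: profile names "singularity", "docker", "local" exist in
# the AMR++ config; verify them before relying on this.
pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo singularity
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo local                              # fall back to software already installed
  fi
}
```

You could then launch with nextflow run main_AMR++.nf -profile "$(pick_profile)".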

As a brief example, the default parameters were run using this command (with the conda environment, AMR++_env, already activated):

nextflow run main_AMR++.nf

To change the reads that are analyzed, specify the --reads parameter. Here, we use a glob pattern (not a regular expression) to point to your samples in a different directory. Keep the quotes so that Nextflow, rather than your shell, expands the pattern.

nextflow run main_AMR++.nf --reads "path/to/your/reads/*_R{1,2}.fastq.gz"
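Because a pattern that matches nothing, or matches only one file of a pair, is a common source of confusion, it can help to preview what "*_R{1,2}.fastq.gz" picks up before launching the pipeline. The helper below only imitates the pairing with a shell glob and assumes files end exactly in _R1.fastq.gz and _R2.fastq.gz; Nextflow performs the real matching itself and may differ in edge cases. preview_pairs is a hypothetical name.

```shell
# Sketch: preview which read pairs "*_R{1,2}.fastq.gz" would pick up.
# Only mimics the pairing with a shell glob; Nextflow does the real matching.
preview_pairs() {
  local dir=${1:-.} r1 r2 found=0
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                # glob matched nothing
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz       # expected mate file
    if [ -e "$r2" ]; then
      echo "pair: $(basename "${r1%_R1.fastq.gz}")"
      found=$((found + 1))
    else
      echo "missing mate for: $r1" >&2
    fi
  done
  echo "$found pair(s) found"
}
```

preview_pairs path/to/your/reads prints one line per complete pair, warns on stderr about any R1 file without its R2 mate, and ends with the number of pairs found.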

Choosing the right pipeline

AMR++ analyzes data by combining workflows that take a set of sequencing reads through various bioinformatic software. We recommend our standard AMR++ pipeline as a comprehensive way to go from raw sequencing reads through QC assessment, host DNA removal, and resistome analysis with MEGARes. However, users might only want to replicate portions of the pipeline and have more control over their computing needs. Using the --pipeline parameter, users can now change how AMR++ runs.

Check out this document for more details and guidance on picking the right --pipeline parameter.

Running analyses in steps

Realistically, running the entire pipeline in one go can be challenging due to storage limitations, because Nextflow keeps the intermediate files from every process in the "work" directory until you delete it. Instead, we recommend running the pipeline in steps, which allows you to erase the "work" directory between analytic steps. Remember, the work directory is only needed in case the pipeline run fails and you want to use -resume to pick up where you left off.
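The step-by-step approach can be scripted so that the "work" directory is deleted only after a step succeeds and is kept for -resume when a step fails. This is a sketch: the step names you pass to the hypothetical run_in_steps helper are placeholders for real --pipeline values from the AMR++ documentation, and later steps may need --reads pointed at an earlier step's output, which this helper does not handle.

```shell
# Sketch: run AMR++ one --pipeline step at a time, clearing "work" only
# after a step succeeds. Step names are placeholders (see AMR++ docs).
run_in_steps() {
  local step
  for step in "$@"; do
    echo "== running step: $step"
    if ! nextflow run main_AMR++.nf -profile conda --pipeline "$step"; then
      echo "step '$step' failed; keeping work/ so you can rerun with -resume" >&2
      return 1
    fi
    rm -rf work                             # intermediates of this step no longer needed
  done
}
```

For example, run_in_steps <first_step> <second_step> runs the two steps in order and stops at the first failure.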

Here are some tutorials to run each analysis step by step:

Optional flags

SNP verification

AMR++ now works in conjunction with a custom SNP verification software to evaluate alignments to gene accessions requiring SNP confirmation to confer resistance. To include this workflow, include the --snp Y flag in your command like this:

nextflow run main_AMR++.nf -profile conda --snp Y

This will create the standard count table (AMR_analytic_matrix.csv) in addition to a count matrix with SNP-confirmed counts (SNPconfirmed_AMR_analytic_matrix.csv).
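One quick way to see how much the SNP confirmation step removes is to compare the total counts in the two matrices. The sketch assumes both files are comma-separated with a header row and gene IDs in the first column, so every other cell is a count; total_counts is a hypothetical helper.

```shell
# Sketch: sum every count cell in a matrix (skip header row and ID column).
# Assumes comma-separated values with gene IDs in column 1.
total_counts() {
  awk -F, 'NR > 1 { for (c = 2; c <= NF; c++) t += $c } END { print t + 0 }' "$1"
}
```

Running total_counts on AMR_analytic_matrix.csv and on SNPconfirmed_AMR_analytic_matrix.csv gives two totals; assuming the SNP-confirmed matrix is a filtered version of the standard one, their difference is a rough measure of the counts that failed SNP confirmation.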

Deduplicated counts

Another option is to include results for deduplicated counts by using the --deduped Y flag in your command.

nextflow run main_AMR++.nf -profile conda --snp Y --deduped Y

With this flag, AMR++ will extract the deduplicated alignments to MEGARes and also output a count matrix with deduplicated counts. Since we also included the --snp Y flag, we will end up with 4 total output count matrices.
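To check which matrices a run actually produced, you can search the output directory instead of relying on exact file names. The hypothetical list_matrices helper assumes every matrix name ends in analytic_matrix.csv, as the two names given for the SNP option do, and defaults to the "test_results" directory used by the demonstration run.

```shell
# Sketch: list every count matrix under a results directory.
# ASSUMPTION: all matrix file names end in "analytic_matrix.csv".
list_matrices() {
  find "${1:-test_results}" -type f -name '*analytic_matrix.csv' | sort
}
```

If that naming assumption holds, list_matrices test_results | wc -l should report 4 after a run with both --snp Y and --deduped Y.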

Rarefaction analyzer

The final optional analysis is to perform rarefaction on resistome counts to evaluate sequencing depth at all resistome annotation levels (i.e., Type, Class, Mechanism, Group, Gene). You can run this analysis by adding --rarefaction Y to your command or by modifying the params.config file.

nextflow run main_AMR++.nf -profile conda --snp Y --deduped Y --rarefaction Y

With rarefaction analysis, we'll create various figures summarizing sequencing depth and save them in the "ResistomeAnalysis/Rarefaction/Figures" directory.
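Instead of repeating these flags on every run or editing params.config, you can also keep them in a small YAML file and hand it to Nextflow with its -params-file option; Nextflow treats each key like the matching --<key> command-line parameter. The file name amr_params.yaml is arbitrary, and the reads line reuses the example pattern from the --reads section.

```shell
# Sketch: store AMR++ parameters in a Nextflow params file.
# The quoted 'EOF' keeps the shell from expanding * and {1,2}.
cat > amr_params.yaml <<'EOF'
reads: "path/to/your/reads/*_R{1,2}.fastq.gz"
snp: "Y"
deduped: "Y"
rarefaction: "Y"
EOF
```

Then run nextflow run main_AMR++.nf -profile conda -params-file amr_params.yaml. Parameters given directly on the command line take precedence over values in the params file, so you can still override a single setting for one run.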