mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes
mTAGs is a tool for the taxonomic profiling of metagenomes. It detects sequencing reads belonging to the small subunit of the ribosomal RNA (SSU-rRNA) gene and annotates them by aligning them to full-length degenerate consensus SSU-rRNA reference sequences. The tool can process single-end and paired-end metagenomic reads, takes advantage of the information contained in any region of the SSU-rRNA gene and provides relative abundance profiles at multiple taxonomic ranks (Domain, Phylum, Class, Order, Family, Genus and OTUs defined at a 97% sequence identity cutoff).
The tool is developed by Hans-Joachim Ruscheweyh and Guillem Salazar and distributed under the license included with the source repository.
If you use mTAGs, please cite:
Salazar G*, Ruscheweyh H-J*, Hildebrand F, Acinas S and Sunagawa S. mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA gene. Bioinformatics, 2021.
Analyses in the publication were executed using version 1.0.0.
Questions or comments? Open a GitHub issue.
Installation
mTAGs is written in Python and has the following dependencies: Python, HMMER and vsearch.
Installation using conda
The easiest way to install mTAGs is with the conda package manager, which automatically creates an environment with the correct versions of the dependencies installed.
$ conda create -n mtags python=3.7 hmmer vsearch
$ source activate mtags
# or
$ conda activate mtags
$ pip install mTAGs
# Download mTAGs database
$ mtags download
2021-06-21 12:02:40,294 INFO: Starting mTAGs
2021-06-21 12:02:40,294 INFO: Start downloading the mTAGs database. ~600MB
2021-06-21 12:05:17,883 INFO: Finished downloading the mTAGs database.
2021-06-21 12:05:17,883 INFO: Finishing mTAGs
$ mtags
<details><summary>mTAGs output</summary>
<p>
Program: mTAGs - taxonomic profiling using degenerate consensus reference
sequences of ribosomal RNA gene
Version: 1.0.4
Reference: Salazar, Ruscheweyh, et al. mTAGs: taxonomic profiling using
degenerate consensus reference sequences of ribosomal RNA
gene. Bioinformatics (2021)
Usage: mtags <command> [options]
Command:
-- General
profile Extract and taxonomically annotate rRNA reads in metagenomic samples
merge Merge profiles
-- Expert
extract Extract rRNA reads in metagenomic samples
annotate Annotate and quantify rRNA reads
-- Installation
download Download the mTAGs database - Once after download of the tool
</p>
</details>
The database must be downloaded as the last step of the installation. This is done once (with mtags download, as shown above), before the first metagenomic samples are processed.
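Before profiling, it can help to confirm that the external programs are reachable from the active environment. The following is a minimal sketch, not part of mTAGs itself. The executable names are assumptions taken from the commands shown in this README (`mtags`, `hmmsearch` from HMMER, and `vsearch`), and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Executable names are assumptions based on the commands shown in this README:
# `mtags` (the tool itself), `hmmsearch` (from HMMER) and `vsearch`.
REQUIRED_TOOLS = ("mtags", "hmmsearch", "vsearch")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

Run it inside the activated `mtags` environment. It only checks that the programs can be found and does not verify their versions.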
Manual installation
Manual installation is possible but not recommended. Install via pip after installation of dependencies:
$ pip install mTAGs
# Download mTAGs database
$ mtags download
Usage
The tool is split into two steps: profiling and merging. The first step (mtags profile [options]) uses HMM models to extract potential rRNA sequences from metagenomic data and annotates them taxonomically by aligning these sequences against a modified SILVA database. The second step (mtags merge [options]) merges taxonomic profiles from different metagenomic samples. The extraction and annotation of rRNA sequences are grouped into a single command but can also be run independently (mtags extract [options] and mtags annotate [options]).
PROFILE
This step uses precomputed HMM models to extract rRNA sequences from a metagenomic sample. The rRNA sequences are then aligned against a clustered rRNA database to annotate sequences and profile samples. mTAGs takes as input FASTA/FASTQ files with quality-controlled sequencing data.
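Because the input is expected to be quality-controlled, a quick sanity check on the read files can catch truncated or mismatched pairs before a run. The sketch below is an illustration only and not something mTAGs provides. It assumes plain four-line FASTQ records with no trailing blank lines, and the helper names `count_fastq_reads` and `paired_counts_match` are hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Transparently handle gzipped input, selected by file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def paired_counts_match(forward_path, reverse_path):
    """Return True when both files of a pair hold the same number of reads."""
    return count_fastq_reads(forward_path) == count_fastq_reads(reverse_path)
```

For a paired-end sample, `paired_counts_match("sample.1.fq.gz", "sample.2.fq.gz")` would be expected to return `True` when the pair is intact.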
$ mtags profile
Program: mTAGs - taxonomic profiling using degenerate consensus reference
sequences of ribosomal RNA gene
Version: 1.0.4
Reference: Salazar, Ruscheweyh, et al. mTAGs: taxonomic profiling using
degenerate consensus reference sequences of ribosomal RNA
gene. Bioinformatics (2021)
Usage: mtags profile [options]
Input options:
-f FILE [FILE ...] Forward reads file. Can be fasta/fastq and gzipped.
-r FILE [FILE ...] Reverse reads file. Can be fasta/fastq and gzipped.
-s FILE [FILE ...] Single/merge reads file. Can be fasta/fastq and gzipped.
Output options:
-o DIR Output folder [Required]
Other options:
-n STR Samplename [Required]
-t INT Number of threads. [4]
-ma INT Maxaccepts, vsearch parameter. Larger
numbers increase sensitivity and runtime. [1000]
-mr INT Maxrejects, vsearch parameter. Larger
numbers increase sensitivity and runtime. [1000]
# Example usage of the mTAGs profile routine
$ mtags profile -f sample.1.fq.gz -r sample.2.fq.gz -s sample.s.fq.gz sample.m.fq.gz -o output -t 4 -n sample -ma 1000 -mr 1000
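When many samples need to be profiled, the same command line can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in the `mtags profile` help above. The function names `build_profile_command` and `run_profile` are hypothetical, and the requirement that at least one input file is given is an assumption rather than documented behaviour.

```python
import subprocess


def build_profile_command(sample, out_dir, forward=(), reverse=(), singles=(),
                          threads=4, maxaccepts=1000, maxrejects=1000):
    """Assemble an `mtags profile` argument list from the options listed above."""
    # Assumption: a run without any input file is not meaningful.
    if not (forward or reverse or singles):
        raise ValueError("at least one input file is needed")
    cmd = ["mtags", "profile"]
    # Each input flag accepts one or more files (FILE [FILE ...]).
    for flag, files in (("-f", forward), ("-r", reverse), ("-s", singles)):
        if files:
            cmd += [flag, *files]
    cmd += ["-o", str(out_dir), "-t", str(threads), "-n", sample,
            "-ma", str(maxaccepts), "-mr", str(maxrejects)]
    return cmd


def run_profile(**kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_profile_command(**kwargs), check=True)
```

Calling `build_profile_command("sample", "output", forward=["sample.1.fq.gz"], reverse=["sample.2.fq.gz"], singles=["sample.s.fq.gz", "sample.m.fq.gz"])` yields the same arguments as the example command above. Passing a list to `subprocess.run` avoids shell quoting problems with unusual file names.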
<details><summary>mTAGs log</summary>
<p>
2021-06-21 09:04:48,644 INFO: Starting mTAGs
2021-06-21 09:04:48,646 INFO: Extracting FastA and revcomp FastA from input/sample.1.fq.gz
2021-06-21 09:04:59,536 INFO: Processed reads: 824523
2021-06-21 09:04:59,536 INFO: Finished extracting. Found 824523 sequences.
2021-06-21 09:04:59,536 INFO: Start detecting rRNA sequences in FastA files
2021-06-21 09:04:59,536 INFO: Start detecting rRNA sequences for molecule=ssu
2021-06-21 09:04:59,536 INFO: Executing: hmmsearch --cpu 4 -o sample/sample.1.fq.gz_fw.fasta_ssu.hmmer --domtblout sample/sample.1.fq.gz_fw.fasta_ssu.dom -E 0.01 mTAGs/data/ssu.hmm sample/sample.1.fq.gz_fw.fasta
2021-06-21 09:05:06,982 INFO: Finished hmmsearch
2021-06-21 09:05:06,988 INFO: Executing: hmmsearch --cpu 4 -o sample/sample.1.fq.gz_rev.fasta_ssu.hmmer --domtblout sample/sample.1.fq.gz_rev.fasta_ssu.dom -E 0.01 mTAGs/data/ssu.hmm sample/sample.1.fq.gz_rev.fasta
2021-06-21 09:05:14,442 INFO: Finished hmmsearch
2021-06-21 09:05:14,449 INFO: Finished detecting rRNA sequences for molecule=ssu
2021-06-21 09:05:14,449 INFO: Start detecting rRNA sequences for molecule=lsu
2021-06-21 09:05:14,450 INFO: Executing: hmmsearch --cpu 4 -o sample/sample.1.fq.gz_fw.fasta_lsu.hmmer --domtblout sample/sample.1.fq.gz_fw.fasta_lsu.dom -E 0.01 mTAGs/data/lsu.hmm sample/sample.1.fq.gz_fw.fasta
2021-06-21 09:05:35,255 INFO: Finished hmmsearch
2021-06-21 09:05:35,266 INFO: Executing: hmmsearch --cpu 4 -o sample/sample.1.fq.gz_rev.fasta_lsu.hmmer --domtblout sample/sample.1.fq.gz_rev.fasta_lsu.dom -E 0.01 mTAGs/data/lsu.hmm sample/sample.1.fq.gz_rev.fasta
2021-06-21 09:05:55,845 INFO: Finished hmmsearch
2021-06-21 09:05:55,859 INFO: Finished detecting rRNA sequences for molecule=lsu
2021-06-21 09:05:55,859 INFO: Found 4143 potential rRNA sequences.
2021-06-21 09:05:55,859 INFO: Finished detecting rRNA sequences from FastA files.
2021-06-21 09:05:55,859 INFO: Finding best molecule for each read
2021-06-21 09:05:55,867 INFO: Finished finding best molecule for each read
2021-06-21 09:05:55,867 INFO: Start extracting reads/writing output
2021-06-21 09:05:58,911 INFO: Processed reads: 824523
2021-06-21 09:05:58,912 INFO: Finished extracting reads/writing output
2021-06-21 09:05:58,912 INFO: euk_lsu 1571
2021-06-21 09:05:58,912 INFO: bac_lsu 1114
2021-06-21 09:05:58,912 INFO: euk_ssu 865
2021-06-21 09:05:58,912 INFO: bac_ssu 583
2021-06-21 09:05:58,912 INFO: arc_lsu 9
2021-06-21 09:05:58,912 INFO: arc_ssu 1
2021-06-21 09:05:58,927 INFO: Extracting FastA and revcomp FastA from input/sample.2.fq.gz
2021-06-21 09:06:10,001 INFO: Processed reads: 824523
2021-06-21 09:06:10,002 INFO: Finished extracting. Found 824523 sequences.
2021-06-21 09:06:10,002 INFO: Start detecting rRNA sequences in FastA files
2021-06-21 09:06:10,002 INFO: Start detecting rRNA sequences for molecule=ssu
2021-06-21 09:06:10,002 INFO: Executing: hmmse
