<!-- README.md is generated from README.Rmd. Please edit that file -->

fantaxtic

fantaxtic contains a set of functions to identify and visualize the most abundant taxa in phyloseq objects. It allows users to identify top taxa using any metric and any grouping, and plot the (relative) abundances of the top taxa using a nested bar plot visualisation. In the nested bar plot, colours or fills signify a top taxonomic rank (e.g. Phylum), and a gradient of shades and tints signifies levels at a nested taxonomic rank (e.g. Species). It is particularly useful to present an overview of microbiome sequencing, amplicon sequencing or metabarcoding data.

Note that fantaxtic is essentially a wrapper around ggnested, with some accessory functions to identify top taxa and to ensure that the plot is useful. Thus, the output is a ggplot2 object and can be manipulated as such.

Keywords: nested bar plot, phyloseq, taxonomy, most abundant taxa, multiple levels, shades, tints, gradient, 16S, ITS, 18S, microbiome, amplicon sequencing, metabarcoding

Installation

if(!"devtools" %in% installed.packages()){
  install.packages("devtools")
}
devtools::install_github("gmteunisse/fantaxtic")

Basic usage

The workflow consists of two parts:

  1. Identify top taxa using either top_taxa or nested_top_taxa
  2. Visualise the top taxa using plot_nested_bar

For basic usage, only a few lines of R code are required. To identify and plot the top 10 most abundant ASVs by their mean relative abundance, using Phylum as the top rank and Species as the nested rank, run:

require("fantaxtic")
require("phyloseq")
require("tidyverse")
require("magrittr")
require("ggnested")
require("knitr")
require("gridExtra")
data(GlobalPatterns)
top_asv <- top_taxa(GlobalPatterns, n_taxa = 10)
plot_nested_bar(ps_obj = top_asv$ps_obj,
                top_level = "Phylum",
                nested_level = "Species")


To identify and plot the top 3 most abundant Phyla, and the top 3 most abundant species within those Phyla, run:

top_nested <- nested_top_taxa(GlobalPatterns,
                              top_tax_level = "Phylum",
                              nested_tax_level = "Species",
                              n_top_taxa = 3, 
                              n_nested_taxa = 3)
plot_nested_bar(ps_obj = top_nested$ps_obj,
                top_level = "Phylum",
                nested_level = "Species")

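Because the plot is a ggplot2 object, it can be extended with regular ggplot2 layers. The following is a minimal sketch rather than part of the package documentation: the title text and legend position are illustrative choices, and it assumes fantaxtic, ggnested, phyloseq and ggplot2 are installed.

```r
library(fantaxtic)
library(ggnested)
library(phyloseq)
library(ggplot2)

data(GlobalPatterns)
top_asv <- top_taxa(GlobalPatterns, n_taxa = 10)

# plot_nested_bar() returns a ggplot2 object, so layers can be added with `+`.
p <- plot_nested_bar(ps_obj = top_asv$ps_obj,
                     top_level = "Phylum",
                     nested_level = "Species")

p_custom <- p +
  labs(title = "Top 10 ASVs in GlobalPatterns") +  # illustrative title
  theme(legend.position = "bottom")                # illustrative legend placement
p_custom
```

Any other ggplot2 component (themes, facets, axis labels) can be added in the same way.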

top_taxa

This function identifies the top n taxa by some metric (e.g. mean, median, variance, etc.) in a phyloseq object. It outputs a table with the top taxa, as well as a phyloseq object in which all other taxa have been merged into a single taxon.
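To make the idea concrete, here is a base R sketch of "rank taxa by a metric, keep the top n, merge the rest" on a made-up count matrix. It is not fantaxtic's implementation: `collapse_to_top` is a hypothetical helper, and the matrix stands in for the abundance table of a phyloseq object.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers).
counts <- matrix(
  c(50, 10,  5,
    35, 55, 15,
    10, 20, 70,
     5, 15, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# Convert counts to relative abundances within each sample.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Hypothetical helper: score each taxon with FUN, keep the n best,
# and sum all remaining taxa into a single "Other" row.
collapse_to_top <- function(rel, n, FUN = mean) {
  score <- apply(rel, 1, FUN)  # one number per taxon
  top   <- names(sort(score, decreasing = TRUE))[seq_len(n)]
  other <- colSums(rel[!rownames(rel) %in% top, , drop = FALSE])
  list(top_taxa = score[top],
       merged   = rbind(rel[top, , drop = FALSE], Other = other))
}

res <- collapse_to_top(rel_abund, n = 2)
res$top_taxa  # scores of the two top taxa
res$merged    # top taxa plus one "Other" row; columns still sum to 1
```

Swapping the metric, for example `collapse_to_top(rel_abund, n = 2, FUN = max)`, can change the order of the top taxa, which is why the choice of metric matters.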

Taxonomic rank

By default, top_taxa runs the analysis at the ASV level; however, if a tax_level is specified (e.g. Species), it first agglomerates the taxa in the phyloseq object at that rank and then runs the analysis. Note that taxonomic agglomeration assumes that taxa with the same name at all ranks are identical. This also applies to taxa with missing annotations (NA). By default, top_taxa does not consider taxa with an NA annotation at tax_level, but they can be included by setting include_na_taxa = TRUE.

top_species <- top_taxa(GlobalPatterns,
                        n_taxa = 10, 
                        tax_level = "Species")
top_species$top_taxa %>%
  mutate(abundance = round(abundance, 3)) %>%
  kable(format = "markdown")

| tax_rank | taxid | abundance | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---------:|:-------|----------:|:---------|:---------------|:--------------------|:------------------|:-------------------|:-----------------|:----------------------------|
| 4 | 326977 | 0.010 | Bacteria | Actinobacteria | Actinobacteria | Bifidobacteriales | Bifidobacteriaceae | Bifidobacterium | Bifidobacteriumadolescentis |
| 9 | 9514 | 0.005 | Bacteria | Proteobacteria | Gammaproteobacteria | Pasteurellales | Pasteurellaceae | Actinobacillus | Actinobacillusporcinus |
| 1 | 94166 | 0.014 | Bacteria | Proteobacteria | Gammaproteobacteria | Pasteurellales | Pasteurellaceae | Haemophilus | Haemophilusparainfluenzae |
| 8 | 469778 | 0.005 | Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales | Bacteroidaceae | Bacteroides | Bacteroidescoprophilus |
| 6 | 471122 | 0.006 | Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales | Prevotellaceae | Prevotella | Prevotellamelaninogenica |
| 10 | 248140 | 0.005 | Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales | Bacteroidaceae | Bacteroides | Bacteroidescaccae |
| 7 | 470973 | 0.005 | Bacteria | Firmicutes | Clostridia | Clostridiales | Lachnospiraceae | Ruminococcus | Ruminococcustorques |
| 3 | 171551 | 0.011 | Bacteria | Firmicutes | Clostridia | Clostridiales | Ruminococcaceae | Faecalibacterium | Faecalibacteriumprausnitzii |
| 2 | 98605 | 0.013 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Streptococcaceae | Streptococcus | Streptococcussanguinis |
| 5 | 114821 | 0.009 | Bacteria | Firmicutes | Clostridia | Clostridiales | Veillonellaceae | Veillonella | Veillonellaparvula |

Grouping

Furthermore, if one or more grouping factors are specified in grouping, top_taxa calculates the top n taxa using the samples in each group, rather than using all samples in the phyloseq object. This makes it possible, for example, to identify the top taxa in each sample or in each treatment group.

top_grouped <- top_taxa(GlobalPatterns,
                        n_taxa = 1,
                        grouping = "SampleType")
top_grouped$top_taxa %>%
  mutate(abundance = round(abundance, 3)) %>%
  kable(format = "markdown")

| SampleType | tax_rank | taxid | abundance | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|:-------------------|---------:|:-------|----------:|:---------|:---------------|:----------------------|:------------------|:-------------------|:---------------------|:-----------------------|
| Freshwater (creek) | 1 | 549656 | 0.464 | Bacteria | Cyanobacteria | Chloroplast | Stramenopiles | NA | NA | NA |
| Freshwater | 1 | 279599 | 0.216 | Bacteria | Cyanobacteria | Nostocophycideae | Nostocales | Nostocaceae | Dolichospermum | NA |
| Ocean | 1 | 557211 | 0.071 | Bacteria | Cyanobacteria | Synechococcophycideae | Synechococcales | Synechococcaceae | Prochlorococcus | NA |
| Tongue | 1 | 360229 | 0.145 | Bacteria | Proteobacteria | Betaproteobacteria | Neisseriales | Neisseriaceae | Neisseria | NA |
| Mock | 1 | 550960 | 0.117 | Bacteria | Proteobacteria | Gammaproteobacteria | Enterobacteriales | Enterobacteriaceae | Providencia | NA |
| Sediment (estuary) | 1 | 319044 | 0.080 | Bacteria | Proteobacteria | Deltaproteobacteria | Desulfobacterales | Desulfobulbaceae | NA | NA |
| Feces | 1 | 331820 | 0.137 | Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales | Bacteroidaceae | Bacteroides | NA |
| Soil | 1 | 36155 | 0.013 | Bacteria | Acidobacteria | Solibacteres | Solibacterales | Solibacteraceae | CandidatusSolibacter | NA |
| Skin | 1 | 98605 | 0.103 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Streptococcaceae | Streptococcus | Streptococcussanguinis |
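The effect of grouping can be illustrated with a small base R sketch on made-up relative abundances. This is only a conceptual illustration under assumed data, not how fantaxtic is implemented: the matrix, the `sample_type` vector and all names in it are hypothetical.

```r
# Toy relative abundances: rows are taxa, columns are samples (made-up numbers).
rel_abund <- matrix(
  c(0.6, 0.5, 0.1, 0.2,
    0.3, 0.3, 0.2, 0.1,
    0.1, 0.2, 0.7, 0.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("s", 1:4))
)

# Hypothetical grouping factor: one label per sample.
sample_type <- c(s1 = "Soil", s2 = "Soil", s3 = "Ocean", s4 = "Ocean")

# Ranking over all samples at once yields a single top taxon.
top_overall <- names(which.max(rowMeans(rel_abund)))

# Ranking within each group uses only that group's samples.
top_per_group <- sapply(split(colnames(rel_abund), sample_type), function(samples) {
  score <- rowMeans(rel_abund[, samples, drop = FALSE])
  names(which.max(score))
})

top_overall
top_per_group
```

In this toy data the pooled ranking picks taxonC, while the grouped ranking shows that taxonA dominates the Soil samples, a pattern the pooled mean hides.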

Ranking metric

Lastly, any metric can be used to rank taxa by specifying a function through FUN. The mean is used by default, but depending on your analysis, you might want to use the median, variance, maximum or any other function that takes as input a numeric vector and outputs a single number.

top_max <- top_taxa(GlobalPatterns,
                    n_taxa = 10,
                    FUN = max)
top_max$top_taxa %>%
  mutate(abundance = round(abundance, 3)) %>%
  kable(format = "markdown")

| tax_rank | taxid | abundance | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---------:|:-------|----------:|:---------|:---------------|:--------------------|:----------------|:-----------------|:---------------|:--------------------------|
| 4 | 329744 | 0.266 | Bacteria | Actinobacteria | Actinobacteria | Actinomycetales | ACK-M1 | NA | NA |
| 1 | 549656 | 0.500 | Bacteria | Cyanobacteria | Chloroplast | Stramenopiles | NA | NA | NA |
| 2 | 279599 | 0.432 | Bacteria | Cyanobacteria | Nostocophycideae | Nostocales | Nostocaceae | Dolichospermum | NA |
| 3 | 360229 | 0.270 | Bacteria | Proteobacteria | Betaproteobacteria | Neisseriales | Neisseriaceae | Neisseria | NA |
| 8 | 94166
