Qiime2R

Import qiime2 artifacts to R

Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R (v0.99.6)

Background

The QIIME2 artifact is a method for storing the inputs and outputs of QIIME2 along with associated metadata and provenance information about how the object was formed. This method of storing objects has a number of obvious advantages; however, on the surface it does not lend itself to easy import into R for the R-minded data scientist. In reality, the .qza file is a compressed (zip) directory with an intuitive structure.

While it is possible to export data one by one from QIIME2 artifacts using QIIME2's built-in suite of export features, this is problematic and runs counter to the purpose of the artifact format for the following reasons:

  • Export of data from the artifact using QIIME2 requires an installation which may not be available on the user's computer and may not be trivial to install for a novice user
  • Export of the data will lose the associated provenance information: the origin of the data can no longer be traced, and the parameters that led to its generation are lost.
  • Export of the data on a one-by-one basis is tedious and creates multiple copies of intermediate files
  • R has many options for advanced data analysis and/or visualization that may not be natively supported in QIIME2 or Python environments

This package aims to simplify the process of getting the artifact into R without discarding any of the associated data through a simple read_qza function. The artifact is unpacked into a temporary directory, and the raw data and associated metadata are read into a named list (see below). Data are typically returned as a data.frame, a phylo object (trees), or a DNAStringSet (nucleic acid sequences).
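Because a .qza file is an ordinary zip archive, its layout can be inspected with base R alone. The following is a minimal sketch of that idea, not the package's implementation; `inspect_qza` is a hypothetical helper name and `table.qza` is an assumed file path.

```r
# Hypothetical helper: list and unpack a .qza archive using base R only.
inspect_qza <- function(path, exdir = tempfile("qza_")) {
  stopifnot(file.exists(path))
  # Table of archived files with columns Name, Length, Date
  listing <- utils::unzip(path, list = TRUE)
  # Decompress everything into a temporary directory
  utils::unzip(path, exdir = exdir)
  list(listing = listing, dir = exdir)
}

qza <- "table.qza"  # assumed path; replace with your own artifact
if (file.exists(qza)) {
  res <- inspect_qza(qza)
  print(head(res$listing))
  unlink(res$dir, recursive = TRUE)  # clean up the temporary copy
}
```

In practice read_qza() does the unpacking and parsing for you; the sketch only shows that nothing beyond a zip reader is needed to look inside an artifact.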

Functions

  • read_qza() - Function for reading artifacts (.qza).
  • qza_to_phyloseq() - Imports multiple artifacts to produce a phyloseq object.
  • read_q2metadata() - Reads qiime2 metadata file (containing q2-types definition line)
  • write_q2manifest() - Writes a read manifest file to import data into qiime2
  • theme_q2r() - A ggplot2 theme for clean figures.
  • print_provenance() - A function to display provenance information.
  • is_q2metadata() - A function to check if a file is a qiime2 metadata file.
  • parse_taxonomy() - A function to parse taxonomy strings and return a table where each column is a taxonomic class.
  • parse_ordination() - A function to parse the internal ordination format.
  • read_q2biom() - A function for reading QIIME2 biom files in format v2.1
  • make_clr() - Transform feature table using centered log2 ratio.
  • make_proportion() - Transform feature table to proportion (sum to 1).
  • make_percent() - Transform feature table to percent (sum to 100).
  • interactive_table() - Create an interactive table in Rstudio viewer or rmarkdown html.
  • summarize_taxa() - Create a list of tables with abundances summed to each taxonomic level.
  • taxa_barplot() - Create a stacked barplot using ggplot2.
  • taxa_heatmap() - Create a heatmap of taxonomic abundances using ggplot2.
  • corner() - Show top corner of a large table-like object.
  • min_nonzero() - Find the smallest non-zero, non-NA value in a numeric vector.
  • mean_sd() - Return mean and standard deviation for plotting.
  • subsample_table() - Subsample a table with or without replacement.
  • filter_features() - Remove low abundance features by number of counts and number of samples they appear in.
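To make the three table transformations in the list above concrete, here is a base R sketch of what proportion, percent, and centered log-ratio transforms compute on a features-by-samples table. These are illustrative stand-ins, not the package's own code: the function names `to_proportion`, `to_percent`, and `to_clr`, the toy counts, and the pseudocount of 0.5 are all assumptions.

```r
# Toy feature table: rows are features, columns are samples (made-up counts)
counts <- matrix(
  c(10, 0, 30,
     5, 5, 10),
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("S1", "S2"))
)

# Divide each column by its total so every sample sums to 1
to_proportion <- function(x) sweep(x, 2, colSums(x), "/")

# Same idea, scaled so every sample sums to 100
to_percent <- function(x) 100 * to_proportion(x)

# Centered log2 ratio: log2-transform, then subtract each sample's mean log value.
# The pseudocount (an assumed value) avoids taking log2(0).
to_clr <- function(x, pseudocount = 0.5) {
  logged <- log2(x + pseudocount)
  sweep(logged, 2, colMeans(logged), "-")
}

print(to_proportion(counts))
print(to_clr(counts))
```

After the centered log-ratio transform, each sample's values average to zero, which is why a pseudocount or other zero-handling step is needed before taking logs.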

Installing qiime2R

qiime2R is currently available via GitHub and can be installed in R with the following commands:

if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R")

Reading Artifacts (.qza)

This example uses data derived from the QIIME2 "Moving Pictures" tutorial. The main function we will need is read_qza():

?read_qza
read qiime2 artifacts (.qza)

Description

  extracts embedded data and object metadata into an R session

Usage

  read_qza(file, tmp, rm)

Arguments

  file - path to the input file, ex: file="~/data/moving_pictures/table.qza"
  tmp - a temporary directory that the object will be decompressed to (default="tempdir()")
  rm - should the decompressed object be removed at completion of function (T/F default=TRUE)

Value
  
  a named list of the following objects:
    artifact$data - the raw data, e.g. OTU table as matrix or tree in phylo format
    artifact$uuid - the unique identifier of the artifact
    artifact$type - the semantic type of the object (e.g. FeatureData[Sequence])
    artifact$format - the format of the qiime artifact
    artifact$provenance - information tracking how the object was created
    artifact$contents - a table of all the files contained within the artifact and their file size
    artifact$version - the reported version for the artifact; a warning may be thrown if a new version is seen

We will start by reading in a table of sequence variants (SVs):

library(qiime2R)
SVs <- read_qza("table.qza")

When the artifact is imported, there are a number of pieces of information included. To see them we can use the names command:

names(SVs)
[1] "uuid"       "type"       "format"     "contents"   "version"   
[6] "data"       "provenance"

The actual data stored within the object is in the data element:

SVs$data[1:5,1:5] # show first 5 features (rows) and first 5 samples (columns)
#                                 L1S105 L1S140 L1S208 L1S257 L1S281
#4b5eeb300368260019c1fbc7a3c718fc   2183      0      0      0      0
#fe30ff0f71a38a39cf1717ec2be3a2fc      5      0      0      0      0
#d29fe3c70564fc0f69f2c03e0d1e5561      0      0      0      0      0
#868528ca947bc57b69ffdf83e6b73bae      0   2249   2117   1191   1737
#154709e160e8cada6bfb21115acc80f5    802   1174    694    406    242

In the above table, each row denotes a sequence variant, and the row name is the hash of the complete sequence. See here for an example tool for generating hashes.
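The 32-character hexadecimal row names are consistent with MD5 digests. As a sketch under that assumption (MD5 of the exact sequence string, with no trailing newline), a hash can be computed in base R as follows; `hash_sequence` is a hypothetical helper and the example sequence is made up.

```r
# Hypothetical helper: MD5 digest of a sequence string using base R.
# tools::md5sum() works on files, so the sequence is written to a
# temporary file as raw bytes (no trailing newline) and hashed there.
hash_sequence <- function(seq) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(seq), tmp)
  unname(tools::md5sum(tmp))
}

# Made-up example sequence; real feature IDs come from the full variant sequence
print(hash_sequence("ACGTACGTACGT"))
```

Whether a given table uses hashed identifiers at all depends on how it was generated, so treat a match against your own sequences as the real confirmation.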

We can also look at the unique identifier for this object:

SVs$uuid
[1] "706b6bce-8f19-4ae9-b8f5-21b14a814a1b"

We can see the type of artifact:

SVs$type
[1] "FeatureTable[Frequency]"

We can also get a complete list of the files within the artifact and their sizes:

SVs$contents
#                                                                                                           files.Name files.Length          files.Date size
#1                                                                  706b6bce-8f19-4ae9-b8f5-21b14a814a1b/metadata.yaml           96 2020-02-28 10:21:00  224
#2                                                                  706b6bce-8f19-4ae9-b8f5-21b14a814a1b/checksums.md5         1341 2020-02-28 10:21:00  224
#3                                                                        706b6bce-8f19-4ae9-b8f5-21b14a814a1b/VERSION           39 2020-02-28 10:21:00  224
#4                                                       706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/metadata.yaml           96 2020-02-28 10:21:00  224
#5                                                       706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/citations.bib         4305 2020-02-28 10:21:00  224
#6                                                             706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/VERSION           39 2020-02-28 10:21:00  224
#7        706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/metadata.yaml          130 2020-02-28 10:20:00  224
#8        706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/citations.bib         3488 2020-02-28 10:20:00  224
#9              706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/VERSION           39 2020-02-28 10:20:00  224
#10  706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/action.yaml         5473 2020-02-28 10:20:00  224
#11 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/barcodes.tsv          757 2020-02-28 10:20:00  224
#12       706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/metadata.yaml           98 2020-02-28 10:20:00  224
#13       706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/citations.bib         2774 2020-02-28 10:20:00  224
#14             706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/VERSION           39 2020-02-28 10:20:00  224
#15  706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/action/action.yaml         4893 2020-02-28 10:20:00  224
#16                                                 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/action/action.yaml         5617 2020-02-28 10:21:00  224
#17                                                       706b6bce-8f19-4ae9-b8f5-21b14a814a1b/data/feature-table.biom       114900 2020-02-28 10:21:00  224
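Since the contents element is an ordinary table, it can be filtered like any data.frame, for example to separate the payload under data/ from the provenance records. The sketch below uses a small stand-in table with two of the column names shown above and three rows copied from that output; `payload_files` is a hypothetical helper, not a package function.

```r
# Stand-in for SVs$contents, reduced to two columns for brevity
contents <- data.frame(
  files.Name = c(
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/metadata.yaml",
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/citations.bib",
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/data/feature-table.biom"
  ),
  files.Length = c(96, 4305, 114900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only files stored under <uuid>/data/
payload_files <- function(contents) {
  contents[grepl("^[^/]+/data/", contents$files.Name), , drop = FALSE]
}

print(payload_files(contents))
```

With a real artifact, the same call would be payload_files(SVs$contents).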

We can also print the provenance; however, it is probably easier to use the q2view tool for a graphical aid in its interpretation.

print_provenance(SVs)
artifact$provenance = list 3 (118072 bytes)
.  706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/action.yaml = list 4
. .  execution = list 2
. . .  uuid = character 1= 8614aa95-3638-4e49-88f 
. . .  runtime = list 3
. . . .  start = character 1= 2020-02-28T10:19:02.86 
. . . .  end = character 1= 2020-02-28T10:19:24.46 
. . . .  duration = character 1= 21 seconds, and 605755 
...

Reading Metadata

If
