Qiime2R
Import qiime2 artifacts to R
Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R (v0.99.6)
Background
The qiime artifact is a method for storing the inputs and outputs of QIIME2 along with associated metadata and provenance information about how the object was formed. This method of storing objects has a number of obvious advantages; however, on the surface it does not lend itself to easy import into R for the R-minded data scientist. In reality, the .qza file is a compressed directory with an intuitive structure.
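Because the artifact is a compressed directory, its layout can be inspected with base R alone. The sketch below is only an illustration and not part of qiime2R: it assumes the .qza is a standard zip archive, `list_qza` is a hypothetical helper, and `table.qza` is an assumed file name.

```r
# Hypothetical helper (not part of qiime2R): list the files inside an artifact
# without extracting it, assuming the .qza is a standard zip archive.
list_qza <- function(path) {
  stopifnot(file.exists(path))
  utils::unzip(path, list = TRUE)  # data.frame with columns Name, Length, Date
}

qza_path <- "table.qza"  # assumed file name; replace with your own artifact
if (file.exists(qza_path)) {
  print(head(list_qza(qza_path)))
}
```

The guard around the call means the snippet does nothing when the file is absent, so it can be pasted into a session before any data has been downloaded.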
While it is possible to export data one by one from qiime artifacts using qiime's built-in suite of export features, this is problematic and runs counter to the purpose of the artifact format for the following reasons:
- Export of data from the artifact using QIIME2 requires an installation which may not be available on the user's computer and may not be trivial to install for a novice user
- Export of the data will lose the associated provenance information. The origin of the data can then no longer be traced, and the parameters that led to its generation are lost.
- Export of the data on a one-by-one basis is tedious and creates multiple copies of intermediate files
- R has many options for advanced data analysis and/or visualization that may not be natively supported in QIIME or Python environments
This package simplifies the process of getting the artifact into R without discarding any of the associated data, through a single read_qza function. The artifact is unpacked into a temporary directory, and the raw data and associated metadata are read into a named list (see below). Data are typically returned as a data.frame, a phylo object (trees), or DNAStringSets (nucleic acid sequences).
Functions
- read_qza() - Function for reading artifacts (.qza).
- qza_to_phyloseq() - Imports multiple artifacts to produce a phyloseq object.
- read_q2metadata() - Reads a qiime2 metadata file (containing the q2-types definition line).
- write_q2manifest() - Writes a read manifest file to import data into qiime2.
- theme_q2r() - A ggplot2 theme for clean figures.
- print_provenance() - A function to display provenance information.
- is_q2metadata() - A function to check if a file is a qiime2 metadata file.
- parse_taxonomy() - A function to parse taxonomy strings and return a table where each column is a taxonomic class.
- parse_ordination() - A function to parse the internal ordination format.
- read_q2biom() - A function for reading QIIME2 biom files in format v2.1.
- make_clr() - Transform feature table using centered log2 ratio.
- make_proportion() - Transform feature table to proportion (sum to 1).
- make_percent() - Transform feature table to percent (sum to 100).
- interactive_table() - Create an interactive table in the RStudio viewer or rmarkdown html.
- summarize_taxa() - Create a list of tables with abundances summed to each taxonomic level.
- taxa_barplot() - Create a stacked barplot using ggplot2.
- taxa_heatmap() - Create a heatmap of taxonomic abundances using ggplot2.
- corner() - Show the top corner of a large table-like object.
- min_nonzero() - Find the smallest non-zero, non-NA value in a numeric vector.
- mean_sd() - Return mean and standard deviation for plotting.
- subsample_table() - Subsample a table with or without replacement.
- filter_features() - Remove low abundance features by number of counts and number of samples they appear in.
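To make the transformation helpers listed above more concrete, the following base-R sketch shows what a proportion transform and a centered log2 ratio transform compute on a small made-up count matrix. It is an illustration under stated assumptions, not the package's implementation: the names `to_proportion` and `to_clr` are hypothetical, the pseudocount of 0.5 is an arbitrary way of handling zeros, and features are assumed to be rows with samples as columns.

```r
# Toy feature table with made-up counts: rows are features, columns are samples.
counts <- matrix(
  c(10, 0, 30,   # sample S1
    20, 5, 15),  # sample S2
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("S1", "S2"))
)

# Hypothetical helper: divide each column by its total so samples sum to 1.
to_proportion <- function(x) sweep(x, 2, colSums(x), "/")

# Hypothetical helper: log2-transform, then subtract each sample's mean log value.
# The pseudocount is an assumption used here to avoid log2(0).
to_clr <- function(x, pseudocount = 0.5) {
  logged <- log2(x + pseudocount)
  sweep(logged, 2, colMeans(logged), "-")
}

props <- to_proportion(counts)
clr <- to_clr(counts)

print(colSums(props))          # each sample sums to 1
print(round(colSums(clr), 10)) # CLR values are centered, so each sample sums to 0
```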
Installing qiime2R
qiime2R is currently available via GitHub and can be installed in R with the following commands:
if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R")
Reading Artifacts (.qza)
This example is using data derived from the moving pictures tutorial. The main function we will need is read_qza():
?read_qza
read qiime2 artifacts (.qza)
Description
extracts embedded data and object metadata into an R session
Usage
read_qza(file, tmp, rm)
Arguments
file - path to the input file, ex: file="~/data/moving_pictures/table.qza"
tmp - a temporary directory that the object will be decompressed to (default="tempdir()")
rm - should the decompressed object be removed at completion of function (T/F default=TRUE)
Value
a named list of the following objects:
artifact$data - the raw data ex OTU table as matrix or tree in phylo format
artifact$uuid - the unique identifier of the artifact
artifact$type - the semantic type of the object (ex FeatureData[Sequence])
artifact$format - the format of the qiime artifact
artifact$provenance - information tracking how the object was created
artifact$contents - a table of all the files contained within the artifact and their file size
artifact$version - the reported version for the artifact; a warning may be raised if a new version is seen
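The documented arguments can also be passed by name. The sketch below assumes that qiime2R is installed and that a file named `table.qza` (a hypothetical path) is in the working directory; the guard lets the snippet run harmlessly when either is missing.

```r
artifact_path <- "table.qza"  # hypothetical path; point this at your own artifact

if (requireNamespace("qiime2R", quietly = TRUE) && file.exists(artifact_path)) {
  artifact <- qiime2R::read_qza(
    file = artifact_path,
    tmp  = tempdir(),  # where the artifact is decompressed
    rm   = TRUE        # remove the decompressed copy when done
  )
  print(artifact$type)
} else {
  message("qiime2R or ", artifact_path, " not available; skipping example")
}
```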
We will start by reading in a table of sequence variants (SVs):
SVs<-read_qza("table.qza")
When the artifact is imported, there are a number of pieces of information included. To see them we can use the names command:
names(SVs)
[1] "uuid" "type" "format" "contents" "version"
[6] "data" "provenance"
To access the actual data stored within the object, access the data as below:
SVs$data[1:5,1:5] #show first 5 features (rows) and first 5 samples (columns)
#                                 L1S105 L1S140 L1S208 L1S257 L1S281
#4b5eeb300368260019c1fbc7a3c718fc   2183      0      0      0      0
#fe30ff0f71a38a39cf1717ec2be3a2fc      5      0      0      0      0
#d29fe3c70564fc0f69f2c03e0d1e5561      0      0      0      0      0
#868528ca947bc57b69ffdf83e6b73bae      0   2249   2117   1191   1737
#154709e160e8cada6bfb21115acc80f5    802   1174    694    406    242
In the table above, each row denotes a sequence variant, and the row name is a hash of the complete sequence. See here for an example tool for generating hashes.
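The feature identifiers shown above are 32 hexadecimal characters long, which is consistent with an MD5 digest. Assuming the identifier is the MD5 of the raw sequence string (an assumption, not something this document states), it could be reproduced in base R roughly as follows. `hash_sequence` is a hypothetical helper and the example sequence is made up.

```r
# Hypothetical helper: MD5 digest of a sequence string using only base R.
# tools::md5sum() works on files, so the string is written to a temporary file.
hash_sequence <- function(sequence) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  # writeBin() avoids the trailing newline that writeLines() would add
  writeBin(charToRaw(sequence), tmp)
  unname(tools::md5sum(tmp))
}

id <- hash_sequence("ACGTACGT")  # made-up sequence, not from the tutorial data
print(id)
print(nchar(id))  # 32
```

Whether a given table's identifiers match depends on how they were generated upstream, so treat a mismatch as a sign that a different hashing scheme or sequence orientation was used.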
We can also look at the unique identifier for this object:
SVs$uuid
[1] "706b6bce-8f19-4ae9-b8f5-21b14a814a1b"
We can see the type of artifact:
SVs$type
[1] "FeatureTable[Frequency]"
We can also get a complete list of the files within the artifact and their sizes:
SVs$contents
# files.Name files.Length files.Date size
#1 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/metadata.yaml 96 2020-02-28 10:21:00 224
#2 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/checksums.md5 1341 2020-02-28 10:21:00 224
#3 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/VERSION 39 2020-02-28 10:21:00 224
#4 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/metadata.yaml 96 2020-02-28 10:21:00 224
#5 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/citations.bib 4305 2020-02-28 10:21:00 224
#6 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/VERSION 39 2020-02-28 10:21:00 224
#7 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/metadata.yaml 130 2020-02-28 10:20:00 224
#8 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/citations.bib 3488 2020-02-28 10:20:00 224
#9 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/VERSION 39 2020-02-28 10:20:00 224
#10 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/action.yaml 5473 2020-02-28 10:20:00 224
#11 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/barcodes.tsv 757 2020-02-28 10:20:00 224
#12 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/metadata.yaml 98 2020-02-28 10:20:00 224
#13 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/citations.bib 2774 2020-02-28 10:20:00 224
#14 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/VERSION 39 2020-02-28 10:20:00 224
#15 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/f5d67104-9506-4373-96e2-97df9199a719/action/action.yaml 4893 2020-02-28 10:20:00 224
#16 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/action/action.yaml 5617 2020-02-28 10:21:00 224
#17 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/data/feature-table.biom 114900 2020-02-28 10:21:00 224
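The contents table is a regular data.frame, so it can be filtered with base R to locate the payload files. The sketch below rebuilds three of the rows printed above by hand, so it runs without the artifact, and keeps only the entries under `data/`. With a real artifact, `SVs$contents` and `SVs$uuid` would take the place of the hand-built `contents` and `uuid` objects.

```r
# Hand-copied subset of the contents listing shown above (three of 17 rows).
contents <- data.frame(
  files.Name = c(
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/metadata.yaml",
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/action/action.yaml",
    "706b6bce-8f19-4ae9-b8f5-21b14a814a1b/data/feature-table.biom"
  ),
  files.Length = c(96, 5617, 114900),
  stringsAsFactors = FALSE
)
uuid <- "706b6bce-8f19-4ae9-b8f5-21b14a814a1b"

# Keep only files stored under the artifact's data/ directory.
payload <- contents[startsWith(contents$files.Name, paste0(uuid, "/data/")), ]
print(basename(payload$files.Name))
```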
We can also print the provenance; however, it is probably easier to use the q2view tool for a graphical aid in its interpretation.
print_provenance(SVs)
artifact$provenance = list 3 (118072 bytes)
. 706b6bce-8f19-4ae9-b8f5-21b14a814a1b/provenance/artifacts/4de0fc23-6462-43d3-8497-f55fc49f5db6/action/action.yaml = list 4
. . execution = list 2
. . . uuid = character 1= 8614aa95-3638-4e49-88f
. . . runtime = list 3
. . . . start = character 1= 2020-02-28T10:19:02.86
. . . . end = character 1= 2020-02-28T10:19:24.46
. . . . duration = character 1= 21 seconds, and 605755
...
Reading Metadata
If
