DspikeIn
The importance of converting relative to absolute abundance in the context of microbial ecology: Introducing the user-friendly DspikeIn R package
Welcome to the DspikeIn R package repository!
Author: Mitra Ghotbi
Version: 0.99.10
Date: March 8, 2025
📚 Table of Contents
- Getting Started DspikeIn Package
- Data Preparation
- Processing
  - Preprocessing One Species Scaling Factor
  - Preprocessing List of Species Scaling Factor
  - Calculate Spiked Species Retrieval % for One Species
  - Calculate Spiked Species Retrieval % for List of Species
  - Scaling Factors for One Spiked Species
  - Scaling Factors for a List of Spiked Species
  - System-Specific Spiked Species Retrieval
  - Conclusion
- Bias Correction
- Visualization
- Credits
DspikeIn Package
DspikeIn is designed for microbiome data analysis, seamlessly integrating with phyloseq (for marker-gene microbiome data) and TreeSummarizedExperiment (TSE) (for hierarchical biological data, including microbiomes). These objects must include seven taxonomic ranks.
For absolute abundance estimation, the metadata must contain spiked.volume.
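The two input requirements above can be checked before any DspikeIn function is called. The following is a minimal base R sketch, not part of the package: `check_dspikein_inputs()` and the toy tables are hypothetical, and treating `spiked.volume` as a numeric column is an assumption about what "correctly formatted" means here.

```r
# Hypothetical helper (not part of DspikeIn): checks the two input
# requirements described above using base R only.
required_ranks <- c("Kingdom", "Phylum", "Class", "Order",
                    "Family", "Genus", "Species")

check_dspikein_inputs <- function(tax, meta) {
  # Ranks that are expected but absent from the taxonomy table
  missing_ranks <- setdiff(required_ranks, colnames(tax))
  # Metadata must carry a 'spiked.volume' column (assumed numeric)
  has_volume <- "spiked.volume" %in% colnames(meta)
  volume_numeric <- has_volume && is.numeric(meta[["spiked.volume"]])
  list(
    missing_ranks = missing_ranks,
    has_spiked_volume = has_volume,
    spiked_volume_is_numeric = volume_numeric,
    ok = length(missing_ranks) == 0 && volume_numeric
  )
}

# Toy data, invented for illustration
tax <- matrix("unclassified", nrow = 2, ncol = 7,
              dimnames = list(c("ASV1", "ASV2"), required_ranks))
meta <- data.frame(spiked.volume = c(2, 1), row.names = c("S1", "S2"))

res <- check_dspikein_inputs(tax, meta)
res$ok
```

For a phyloseq object, the same check could be run on `tax_table()` and `sample_data()` after converting them to a matrix and a data frame.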
DspikeIn accommodates either a single spike-in taxon or synthetic community taxa with variable or equal spike-in volumes and copy numbers. The package offers a comprehensive suite of tools for absolute abundance (AA) quantification, organized around ten core functions: 1) validation of spiked species, 2) data preprocessing, 3) system-specific spiked species retrieval, 4) scaling factor calculation, 5) conversion to absolute abundance, 6) bias correction and normalization, 7) performance assessment, 8) taxa exploration and filtering, 9) network topology assessment, and 10) further analyses and visualization.
Features of DspikeIn
The DspikeIn package provides functions for:
- Verifying the phylogenetic distances of ASVs/OTUs derived from spiked species.
- Preprocessing microbiome data.
- Calculating spike-in scaling factors.
- Converting relative abundance to absolute abundance.
- Estimating acceptable retrieval percentages of spiked species.
- Performing data transformation, differential abundance analysis, and visualization.
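The idea behind the scaling-factor and conversion steps listed above can be illustrated with a few lines of base R. This is a generic sketch of spike-in scaling, not DspikeIn's implementation: the count matrix, the taxon names, and the number of spiked cells are all invented for illustration.

```r
# Toy count matrix (taxa x samples); all numbers are invented for illustration.
# matrix() fills column by column, so each group of three values is one sample.
counts <- matrix(
  c(500, 300, 200,   # sample S1
    100, 800, 100),  # sample S2
  nrow = 3,
  dimnames = list(c("Spike", "TaxonA", "TaxonB"), c("S1", "S2"))
)

# Assumed number of spike-in cells added to each sample
spiked_cells <- c(S1 = 1000, S2 = 1000)

# Scaling factor per sample: expected spike-in cells / observed spike-in reads
scaling_factor <- spiked_cells / counts["Spike", names(spiked_cells)]

# Multiply every sample (column) by its scaling factor
absolute <- sweep(counts, 2, scaling_factor, `*`)

# Drop the spike-in itself to keep only the native community
absolute_native <- absolute[rownames(absolute) != "Spike", , drop = FALSE]
absolute_native
```

In this toy case the spike-in takes up half of the reads in S1 but only a tenth in S2, so S2 receives a five-fold larger scaling factor. That difference in total microbial load is invisible in relative abundances alone.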
Vignettes
📘 Official Vignettes & Documentation
DspikeIn comes with detailed guides and examples to help you get started quickly with both phyloseq and TreeSummarizedExperiment (TSE) formats.
Explore Online
- Interactive Guide: DspikeIn with Phyloseq. Step-by-step usage of DspikeIn with phyloseq objects.
- Interactive Guide: DspikeIn with TSE. Full walkthrough using TreeSummarizedExperiment format.
- Documentation Homepage. One-click access to all tutorials, stylesheets, and embedded visuals.
Data availability
The DspikeIn package provides example datasets located in the data/ folder and inst/extdata/ folder. You can list the available datasets using the following commands:
# List datasets available in the DspikeIn package
data(package = "DspikeIn")
# List files in the extdata folder
list.files(system.file("extdata", package = "DspikeIn"))
Building your own phyloseq and TSE
# =====================================================================
# Build phyloseq
# =====================================================================
library(phyloseq)
library(ape)         # read.tree()
library(Biostrings)  # readDNAStringSet()
library(DspikeIn)    # tidy_phyloseq_tse()

otu <- read.csv("otu.csv", header = TRUE, sep = ",", row.names = 1)
# Taxonomic rank names must be capitalized (first letter only), e.g. "Kingdom"
tax <- read.csv("tax.csv", header = TRUE, sep = ",", row.names = 1)
# Ensure the 'spiked.volume' column is present and correctly formatted in the metadata
meta <- read.csv("metadata.csv", header = TRUE, sep = ",")

# Convert data to appropriate formats
meta <- as.data.frame(meta)
taxmat <- as.matrix(tax)
otumat <- as.matrix(otu)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
OTU <- otu_table(otumat, taxa_are_rows = TRUE)
TAX <- phyloseq::tax_table(taxmat)

# Use the OTU table's sample names as metadata row names
# (this assumes the metadata rows are in the same order as the OTU table columns)
row.names(meta) <- sample_names(OTU)
metadata <- sample_data(meta)

# Build the phyloseq object
physeq <- phyloseq(OTU, TAX, metadata)

# Follow the next steps if tree and reference files are included
MyTree <- read.tree("tree.nwk")
reference_seqs <- readDNAStringSet("dna-sequences.fasta", format = "fasta")
physeq_16SOTU <- merge_phyloseq(physeq, reference_seqs, MyTree)
physeq_16SOTU <- tidy_phyloseq_tse(physeq_16SOTU)

# Save the object, then reload it in a later session
saveRDS(physeq_16SOTU, file = "physeq_16SOTU.rds")
physeq_16SOTU <- readRDS("physeq_16SOTU.rds")
# =====================================================================
# Build TSE
# =====================================================================
library(TreeSummarizedExperiment)
library(ape)         # read.tree()
library(Biostrings)  # readDNAStringSet()

otu <- read.csv("otu.csv", header = TRUE, sep = ",", row.names = 1)
otu_mat <- as.matrix(otu) # Convert to matrix
tax <- read.csv("tax.csv", header = TRUE, sep = ",", row.names = 1)
colnames(tax) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
tax_mat <- as.matrix(tax) # Convert to matrix
meta <- read.csv("metadata.csv", header = TRUE, sep = ",", row.names = 1)
MyTree <- read.tree("tree.nwk")
reference_seqs <- readDNAStringSet("dna-sequences.fasta", format = "fasta")

tse <- TreeSummarizedExperiment(
  assays = list(counts = otu_mat),  # OTU table
  rowData = tax_mat,                # Taxonomy information
  colData = meta,                   # Sample metadata
  rowTree = MyTree,                 # Phylogenetic tree
  referenceSeq = reference_seqs     # Reference sequences (constructor argument is 'referenceSeq')
)
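Both constructors above rely on taxa and sample identifiers lining up across the three input tables. The following base R sketch shows one way to verify and enforce that before building either object. The toy tables stand in for `otu.csv`, `tax.csv`, and `metadata.csv`, and their values are invented; matching metadata by sample name, instead of assuming row order, is a defensive choice added here and is not taken from the package.

```r
# Toy tables standing in for otu.csv, tax.csv and metadata.csv (invented values)
otu <- matrix(1:6, nrow = 3,
              dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))
tax <- matrix("unclassified", nrow = 3, ncol = 7,
              dimnames = list(c("ASV3", "ASV1", "ASV2"),
                              c("Kingdom", "Phylum", "Class", "Order",
                                "Family", "Genus", "Species")))
# Metadata rows are deliberately in a different order than the OTU columns
meta <- data.frame(spiked.volume = c(1, 2), row.names = c("S2", "S1"))

# Taxa must match between the count table and the taxonomy table
stopifnot(setequal(rownames(otu), rownames(tax)))
# Samples must match between the count table and the metadata
stopifnot(setequal(colnames(otu), rownames(meta)))

# Reorder by name so rows and columns line up exactly
tax  <- tax[rownames(otu), , drop = FALSE]
meta <- meta[colnames(otu), , drop = FALSE]
```

After reordering, each sample keeps its own `spiked.volume`, which matters because that column feeds the absolute abundance estimation.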
Whole-Cell Spike-In Protocol
Tetragenococcus halophilus and Dekkera bruxellensis were selected as taxa to spike into gut microbiome samples, based on our previous studies (WalkerLab).
GCN Normalization with QIIME2 Plugin
Opinions on gene copy number (GCN) correction for the 16S rRNA marker vary, with proponents citing improved accuracy and critics noting limitations. While GCN correction is not included in the DspikeIn package, it can be applied to relative abundance counts using tools like the q2-gcn-norm plugin in QIIME 2 (rrnDB v5.7) or the methods outlined by Louca et al., 2018, including PICRUSt, CopyRighter, and PAPRICA. Due to variability in rDNA gene copy numbers (Lavrinienko et al., 2021), GCN corrections were not applied. However, targeted adjustments can be made to prevent overestimating specific fungal taxa.
Command Example
qiime gcn-norm copy-num-normalize \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza
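To make the effect of GCN correction concrete, here is a small base R sketch of the underlying arithmetic: each taxon's reads are divided by its 16S copy number and the result is re-normalized per sample. This is not the q2-gcn-norm implementation, and the counts and copy numbers are hypothetical values chosen only for illustration.

```r
# Toy counts (taxa x samples) and 16S copy numbers; all values are invented
counts <- matrix(c(60, 40, 30, 20), nrow = 2,
                 dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2")))
copy_number <- c(TaxonA = 6, TaxonB = 2)  # hypothetical copies per genome

# Divide each taxon's reads (row) by its copy number
gcn_corrected <- sweep(counts, 1, copy_number[rownames(counts)], `/`)

# Re-normalize to relative abundance within each sample (column)
gcn_relative <- sweep(gcn_corrected, 2, colSums(gcn_corrected), `/`)
gcn_relative
```

TaxonA accounts for 60% of the raw reads in both samples but only one third after correction, which shows how strongly the outcome depends on the copy numbers assumed for each taxon.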
Install DspikeIn Package
# Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Use Bioconductor's repositories for dependencies
options(repos = BiocManager::repositories())

# Install vignette dependencies
install.packages(c("knitr", "rmarkdown"))
BiocManager::install("BiocStyle", update = FALSE)

# remotes is needed for Options 2 and 3
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# ---- Option 1: Install from Bioconductor
BiocManager::install("DspikeIn")

# ---- Option 2: Install the development version from the Bioconductor Git server
# BiocManager::install() takes package names rather than Git URLs,
# so remotes::install_git() is used here
remotes::install_git("https://git.bioconductor.org/packages/DspikeIn")

# Or over SSH (requires an SSH key registered with Bioconductor)
remotes::install_git("git@git.bioconductor.org:packages/DspikeIn.git",
  build_vignettes = TRUE)

# ---- Option 3: Install the development version from GitHub (latest updates)
remotes::install_github(
  "mghotbi/DspikeIn",
  build_vignettes = FALSE,  # set to TRUE to also build the vignettes
  dependencies = TRUE
)

# ---- Load and verify installation
library(DspikeIn)
packageVersion("DspikeIn")

# ---- Access vignettes
browseVignettes("DspikeIn")
# or
vignette(package = "DspikeIn")
Acknowledgement
The development of the DspikeIn packag
