Welcome to the DspikeIn R package repository!

Author: Mitra Ghotbi
Version: 0.99.10 Date: March 8, 2025

📚 Table of Contents

Getting Started DspikeIn Package
Data Preparation
Processing
Bias Correction
Visualization
- Visualization
- Detect common ASVs/OTUs
Credits
- Acknowledgements
- Citing DspikeIn

DspikeIn Package

DspikeIn is designed for microbiome data analysis, seamlessly integrating with phyloseq (for marker-gene microbiome data) and TreeSummarizedExperiment (TSE) (for hierarchical biological data, including microbiomes). These objects must include seven taxonomic ranks.
For absolute abundance estimation, the metadata must contain spiked.volume.
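Both requirements can be checked before any DspikeIn function is called. The following is a minimal base-R sketch on toy data; `check_inputs()` is a hypothetical helper written for this illustration and is not part of DspikeIn, and the numeric test on `spiked.volume` is an assumption about how the column is stored.

```r
# Toy taxonomy table and metadata (hypothetical values, for illustration only)
required_ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

tax <- data.frame(
  Kingdom = "Bacteria", Phylum = "Firmicutes", Class = "Bacilli",
  Order = "Lactobacillales", Family = "Enterococcaceae",
  Genus = "Tetragenococcus", Species = "halophilus",
  row.names = "ASV1"
)
meta <- data.frame(spiked.volume = c(2, 1), row.names = c("S1", "S2"))

# Hypothetical helper: returns a character vector of problems (empty if none)
check_inputs <- function(tax, meta) {
  problems <- character(0)
  missing_ranks <- setdiff(required_ranks, colnames(tax))
  if (length(missing_ranks) > 0) {
    problems <- c(problems,
                  paste("missing ranks:", paste(missing_ranks, collapse = ", ")))
  }
  if (!"spiked.volume" %in% colnames(meta)) {
    problems <- c(problems, "metadata lacks a 'spiked.volume' column")
  } else if (!is.numeric(meta$spiked.volume)) {
    # Assumption: spike-in volumes are stored as numbers
    problems <- c(problems, "'spiked.volume' is not numeric")
  }
  problems
}

check_inputs(tax, meta)
```

An empty result means both requirements are met; otherwise each element names one problem to fix in the input files.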

DspikeIn accommodates either a single spike-in taxon or a synthetic community of taxa, with variable or equal spike-in volumes and copy numbers. The package offers a suite of tools for absolute abundance (AA) quantification, organized around ten core functions: 1) validation of spiked species, 2) data preprocessing, 3) system-specific spiked species retrieval, 4) scaling factor calculation, 5) conversion to absolute abundance, 6) bias correction and normalization, 7) performance assessment, 8) taxa exploration and filtering, 9) network topology assessment, and 10) further analyses and visualization.
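The arithmetic behind steps 4 and 5 can be illustrated with a small base-R sketch. It follows the conventional spike-in logic (expected spike-in cells divided by observed spike-in reads) and is not the package's own implementation; the counts and the `spiked_cells` vector are made-up values.

```r
# Toy count table: rows are taxa, columns are samples (made-up numbers)
counts <- matrix(
  c(500, 250,    # spike-in taxon
    1500, 750,   # taxon A
    3000, 4000), # taxon B
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Spike", "TaxonA", "TaxonB"), c("S1", "S2"))
)

# Assumed number of spike-in cells added to each sample (hypothetical)
spiked_cells <- c(S1 = 1000, S2 = 1000)

# Scaling factor per sample: expected spike-in cells / observed spike-in reads
scaling_factor <- spiked_cells / counts["Spike", ]

# Multiply each sample (column) by its own scaling factor
absolute <- sweep(counts, 2, scaling_factor, `*`)
absolute
```

After scaling, the spike-in row equals the number of cells added to each sample, which makes a convenient sanity check.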

Features of DspikeIn

The DspikeIn package provides functions for:

Verifying the phylogenetic distances of ASVs/OTUs derived from spiked species.
Preprocessing microbiome data.
Calculating spike-in scaling factors.
Converting relative abundance to absolute abundance.
Estimating acceptable retrieval percentages of spiked species.
Performing data transformation, differential abundance analysis, and visualization.
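As an illustration of the retrieval-percentage idea listed above, the sketch below computes the share of each sample's reads assigned to the spike-in taxon and flags samples outside an acceptance window. It uses base R only; the counts and the `lower` and `upper` limits are hypothetical and should be chosen for your own system.

```r
# Toy counts: rows are taxa, columns are samples (made-up numbers)
counts <- matrix(
  c(100,   5, 900,
    900, 995, 100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Spike", "Other"), c("S1", "S2", "S3"))
)

# Percentage of each sample's reads assigned to the spike-in taxon
spike_pct <- 100 * counts["Spike", ] / colSums(counts)

# Hypothetical acceptance window (percent)
lower <- 1
upper <- 50
status <- ifelse(spike_pct >= lower & spike_pct <= upper, "passed", "failed")

data.frame(sample = names(spike_pct), spike_pct = spike_pct, status = status)
```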

Vignettes

📘 Official Vignettes & Documentation

DspikeIn comes with detailed guides and examples to help you get started quickly with both phyloseq and TreeSummarizedExperiment (TSE) formats.

Explore Online

Interactive Guide: DspikeIn with Phyloseq
Step-by-step usage of DspikeIn with phyloseq objects.
Interactive Guide: DspikeIn with TSE
Full walkthrough using TreeSummarizedExperiment format.
Documentation Homepage
One-click access to all tutorials, stylesheets, and embedded visuals.

Download for Offline Use

Data availability

The DspikeIn package provides example datasets located in the data/ folder and inst/extdata/ folder. You can list the available datasets using the following commands:


# List datasets available in the DspikeIn package
data(package = "DspikeIn")

# List files in the extdata folder
list.files(system.file("extdata", package = "DspikeIn"))

Building your own phyloseq and TSE


# =====================================================================
#                     Build phyloseq 
# =====================================================================
library(phyloseq)    # otu_table(), tax_table(), sample_data(), phyloseq()
library(ape)         # read.tree()
library(Biostrings)  # readDNAStringSet()
library(DspikeIn)    # tidy_phyloseq_tse()

otu <- read.csv("otu.csv", header = TRUE, sep = ",", row.names = 1)
# Taxonomic rank names must be capitalized (first letter only), e.g. "Kingdom"
tax <- read.csv("tax.csv", header = TRUE, sep = ",", row.names = 1)
# Ensure 'spiked.volume' column is present and correctly formatted in metadata
meta <- read.csv("metadata.csv", header = TRUE, sep = ",")

# Convert data to appropriate formats
meta <- as.data.frame(meta)
taxmat <- as.matrix(tax)
otumat <- as.matrix(otu)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
OTU <- otu_table(otumat, taxa_are_rows = TRUE)
TAX <- phyloseq::tax_table(taxmat)

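# Assign sample names to the metadata rows
# (assumes the metadata rows are in the same order as the OTU table columns)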
row.names(meta) <- sample_names(OTU)
metadata <- sample_data(meta)
# Build phyloseq obj
physeq <- phyloseq(OTU, TAX, metadata)

# Follow the next steps if tree and reference files are included
MyTree <- read.tree("tree.nwk")
reference_seqs <- readDNAStringSet(file = "dna-sequences.fasta", format = "fasta")

physeq_16SOTU <- merge_phyloseq(physeq, reference_seqs, MyTree)
physeq_16SOTU <- tidy_phyloseq_tse(physeq_16SOTU)

saveRDS(physeq_16SOTU, file = "physeq_16SOTU.rds")
physeq_16SOTU <- readRDS("physeq_16SOTU.rds")


# =====================================================================
#                       Build TSE 
# =====================================================================
library(TreeSummarizedExperiment)
library(ape)         # read.tree()
library(Biostrings)  # readDNAStringSet()

otu <- read.csv("otu.csv", header = TRUE, sep = ",", row.names = 1)
otu_mat <- as.matrix(otu)  # Convert to matrix
tax <- read.csv("tax.csv", header = TRUE, sep = ",", row.names = 1)
colnames(tax) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")  
tax_mat <- as.matrix(tax)  # Convert to matrix
# Row names of the metadata must match the column names of the OTU table
meta <- read.csv("metadata.csv", header = TRUE, sep = ",", row.names = 1)
MyTree <- read.tree("tree.nwk")
reference_seqs <- readDNAStringSet("dna-sequences.fasta", format = "fasta")
tse <- TreeSummarizedExperiment(
  assays = list(counts = otu_mat),   # OTU table 
  rowData = tax_mat,                 # Taxonomy information
  colData = meta,                    # Sample metadata
  rowTree = MyTree,                  # Phylogenetic tree
  referenceSeq = reference_seqs      # Reference sequences
)
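
Both objects pair the count table with the metadata by sample name, so it is worth confirming that the two files describe the same samples before building either object. The following is a minimal base-R sketch on toy data (the sample names and values are hypothetical):

```r
# Toy OTU table and metadata; note that the metadata rows are out of order
otu <- matrix(1:6, nrow = 2,
              dimnames = list(c("ASV1", "ASV2"), c("S1", "S2", "S3")))
meta <- data.frame(spiked.volume = c(1, 2, 2),
                   row.names = c("S3", "S1", "S2"))

# Both objects must describe the same set of samples
stopifnot(setequal(colnames(otu), rownames(meta)))

# Reorder the metadata rows to match the OTU table columns
meta <- meta[colnames(otu), , drop = FALSE]

identical(rownames(meta), colnames(otu))
```

Reordering by name, rather than overwriting the row names, keeps each `spiked.volume` value attached to the sample it was recorded for.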

Whole-Cell Spike-In Protocol

Tetragenococcus halophilus and Dekkera bruxellensis were selected as the taxa to spike into gut microbiome samples, based on our previous studies (WalkerLab).

GCN Normalization with QIIME2 Plugin

Opinions on gene copy number (GCN) correction for the 16S rRNA marker vary, with proponents citing improved accuracy and critics noting limitations. GCN correction is not included in the DspikeIn package, but it can be applied to relative abundance counts using tools such as the q2-gcn-norm plugin in QIIME 2 (rrnDB v5.7) or the methods outlined by Louca et al., 2018, including PICRUSt, CopyRighter, and PAPRICA. Due to variability in rDNA gene copy numbers (Lavrinienko et al., 2021), GCN corrections were not applied. However, targeted adjustments can be made to prevent overestimating specific fungal taxa.

Command Example

qiime gcn-norm copy-num-normalize \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza
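
For intuition, the effect of a GCN correction on a count table can be sketched in base R: each taxon's counts are divided by its gene copy number, then the table is re-normalized per sample. This is a generic illustration, not a description of what q2-gcn-norm does internally, and the copy numbers below are hypothetical rather than taken from rrnDB.

```r
# Toy read counts per taxon in two samples (made-up numbers)
counts <- matrix(
  c(400, 100,
    200, 300),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2"))
)

# Hypothetical 16S rRNA gene copy numbers per taxon
copy_number <- c(TaxonA = 4, TaxonB = 1)

# Divide each taxon's (row's) counts by its copy number
gcn_corrected <- sweep(counts, 1, copy_number[rownames(counts)], `/`)

# Re-normalize to proportions within each sample
gcn_rel <- sweep(gcn_corrected, 2, colSums(gcn_corrected), `/`)
round(gcn_rel, 3)
```

A taxon with many gene copies (here TaxonA) loses relative weight after correction, which is the overestimation the text refers to.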

Install DspikeIn Package


# Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Use Bioconductor’s repositories for dependencies
options(repos = BiocManager::repositories())

# Install vignette dependencies
install.packages(c("knitr", "rmarkdown"))
BiocManager::install("BiocStyle", update = FALSE)

# ---- **Option 1**: Install from Bioconductor 
BiocManager::install("DspikeIn")

# ---- **Option 2**: Install development version directly from Bioconductor Git server
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_git("https://git.bioconductor.org/packages/DspikeIn")

# ---- **Option 3**: Install development version from GitHub (latest updates)
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

remotes::install_github(
  "mghotbi/DspikeIn",
  build_vignettes = FALSE,
  dependencies = TRUE
)


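# OR, to also build the vignettes during installation (slower):

remotes::install_github(
  "mghotbi/DspikeIn",
  build_vignettes = TRUE,
  dependencies = TRUE
)

# The Bioconductor Git server can also be reached over SSH (requires a registered SSH key)
remotes::install_git("git@git.bioconductor.org:packages/DspikeIn.git",
                     build_vignettes = TRUE)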


# ---- Load and verify installation
library(DspikeIn)
packageVersion("DspikeIn")

# ---- Access vignettes
browseVignettes("DspikeIn")
# or
vignette(package = "DspikeIn")
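
If loading fails, it can help to see which of the packages used in this README are actually available. The following sketch uses base R only; the package list is an assumption drawn from the functions called in the examples above, not an official dependency list.

```r
# Packages used by the examples in this README (assumed list)
pkgs <- c("phyloseq", "TreeSummarizedExperiment", "Biostrings", "ape", "DspikeIn")

# TRUE where the package namespace can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of anything still missing
names(installed)[!installed]
```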

Acknowledgement

The development of the DspikeIn packag
