SkillAgentSearch skills...

MALDIvs16S

Whole-cell MALDI-TOF MS versus 16S rRNA gene analysis in dereplication and identification of recurrent bacterial isolates

Install / Use

/learn @strejcem/MALDIvs16S
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Whole-cell MALDI-TOF MS versus 16S rRNA gene analysis for identification and dereplication of recurrent bacterial isolates

This is an R code used for MS data manipulation and analyses done in Strejcek et al., Front. Microbiol., 2018.

Data description

49 bacterial isolates obtained from various environmental samples were analysed by whole-cell MALDI-TOF MS and 16S rRNA gene sequencing.

  • Each bacterial culture was measured by MALDI-TOF MS in triplicates and the whole process was repeated 4x times on differently inoculated cultures. Mass spectra were subjected to Bruker BioTyper microbial identification.
  • 16S rRNA gene sequences were uploaded to EzBioCloud server (https://www.ezbiocloud.net/). Using Identify service, each culture was taxonomically classified and the closest type strain was identified.
  • Several R functions were created for this project and are accessible from 'MALDIbacteria.R' file.

Files:

Required packages

CRAN packages

install.packages( c("coop", "XML", "here", "sda", "crossval", "tidyverse", "reshape2" "cowplot" ) )

Bioconductor packages

source("https://bioconductor.org/biocLite.R")

biocLite( c("MALDIquant", "MALDIquantForeign", "DECIPHER", "BiocParallel" ) )

Import MS data and metadata

The data can be downloaded here.

library(here)
## here() starts at D:/R projects/MALDIvs16S
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.2.1 --

## <U+221A> ggplot2 3.0.0     <U+221A> purrr   0.2.5
## <U+221A> tibble  1.4.2     <U+221A> dplyr   0.7.6
## <U+221A> tidyr   0.8.1     <U+221A> stringr 1.3.1
## <U+221A> readr   1.1.1     <U+221A> forcats 0.3.0

## -- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(cowplot)
## 
## Attaching package: 'cowplot'

## The following object is masked from 'package:ggplot2':
## 
##     ggsave
# load convenience R functions for MS data manipulations
source(here("MALDIbacteria.R"))
## Loading required package: MALDIquant

## 
## This is MALDIquant version 1.18
## Quantitative Analysis of Mass Spectrometry Data
##  See '?MALDIquant' for more information about this package.

## Loading required package: MALDIquantForeign

## Loading required package: XML

## Loading required package: coop

## v0.986 (2018-21-Mar)
## import MS and Bruker Biotyper data
# Ms spectra from MzMl format

suppressMessages(
  s <- importMzMl(path = here("mzML"))
)

# assign name to spectra from file names
fullNames <- metaParam(s, "file")
fullNames <- str_replace(fullNames, pattern = "^.*\\\\(.*)\\.mzML$", "\\1")
s <- chngMetaParam(s, "fullName", fullNames)

# sort spectra by their full names
s <- s[order(metaParam(s, "fullName"))]

# read MS spectra metadata with BioTyper identification data
metadata <- read_tsv(here("metadata.tsv"))
## Parsed with column specification:
## cols(
##   fullName = col_character(),
##   culture = col_character(),
##   bioRep = col_integer(),
##   techRep = col_integer(),
##   taxonomy = col_character(),
##   taxGenus = col_character(),
##   taxSpecies = col_character(),
##   scoreValue = col_double(),
##   replicate = col_integer(),
##   preparation = col_character(),
##   day = col_character()
## )
# incorporate metadata into 'MassSpectrum' objetcs
identical(metadata$fullName, metaParam(s, "fullName"))
## [1] TRUE
for (param in names(metadata)) {
  s <- chngMetaParam(s, param, pull(metadata, param))
}

screen spectra for outliers

! For novel approach to identify spectra outliers look at MALDIrppa R package

Process MS data with convenience function 'classicMaldi()' that wraps the suggested workflow in MALDIquant package (See the manuscript for more details). The default parameters of 'classicMaldi()' are set to the optimized values, e.g. MS range is trimmed to 4000-10000. In the manuscript, the first processing step was to look for significantly different technical replicates. Therefore, we change here the mass range back to full range of 2000-20000.

The 'classicMaldi()' computes several things at once and results are saved in individual sublist of the output object:

  • 'fm' - Feature matrix. Rows are samples/spectra and columns are detected protein signals.
  • 'd' - Cosine distance matrix computed from 'fm'.
  • 'hc' - Hierarchical average-linkage clustering (UPGMA) based on 'd' calculated by 'hclust()'.
  • 'ACSreps' - Average cosine similarity of each spectrum to its replicates. Replicates, here technical, are defined by 'replicate' metadata attribute stored in each mass spectrum object, where all replicates have the same identifier. See MALDIquant documentation for more details on metadata attributes.

Note that the negative intensity warnings are from baseline correction step and can be ignored.

# process MS data with convenience function 'classicMaldi()' that wraps the whole
MS <- classicMALDI(s, range = c(2000, 20000))
## Warning in .replaceNegativeIntensityValues(object): Negative intensity
## values are replaced by zeros.
# to print mass dedrogram of all mass spectra (Supplementary Figure 1) you can use PDFplot() function.
# You can use plot() arguments to modify the dendrogram properties, e.g. use cex=0.5 for smaller labels.
PDFplot(MS$hc, file = "dendrogram.pdf", cutoffs = c(1-0.79, 1-0.92))

# plot distribution of average cosine similarity of each spectrum to its technical replicates
qplot(MS$ACSreps, geom = "histogram", bins = 100)

We can see that the majority of the average cosine similarity of single spectra to its technical replicates have values > 0.9 with 3 notable outliers with values <0.85. We discard the outliers.

s <- s[-(which(MS$ACSreps < 0.85))]

Identify phylotype predicting protein signals by Shrinkage Discriminant Analysis

The 16S rRNA gene sequences of individual cultures were used for taxonomy classification and finding the closest type strain using EzTaxon Identify service (https://www.ezbiocloud.net/). All MS spectra were grouped by the closest type strain identification.

EzT <- read_tsv(here("EzTaxon.tsv"))
## Parsed with column specification:
## cols(
##   culture = col_character(),
##   phylum = col_character(),
##   order = col_character(),
##   class = col_character(),
##   family = col_character(),
##   genus = col_character(),
##   species = col_character()
## )
species <- plyr::mapvalues(metaParam(s, "culture"),
                           from = EzT$culture,
                           to = EzT$species
                           )

To identify species-level phylotype-predicting mass signals, shrinkage discriminant analysis with correlation-adjusted t-score variable selection (Ahdesmaki and Strimmer, 2010) as implemented in the sda R package (Ahdesmaki et al., 2015) was carried out. The signals were detected in the whole 2 to 20 kDa mass range. All peaks were ranked on a mutual information entropy basis, and selection was controlled by the false non-discovery rate. All peaks with a local false discovery rate of less than 0.2 were selected as phylotype predictors. Prediction accuracy was estimated using 10 × 10-fold cross validation of all MS data with the aid of the crossval R package (Strimmer, 2015) as described in sda documentation.

library(sda)
## Loading required package: entropy

## Loading required package: corpcor

## Loading required package: fdrtool
MS <- classicMALDI(s, range = c(2000, 20000))
ldar <- sda.ranking(Xtrain=MS$fm, L = as.factor(species),
                    ranking.score = "entropy", fdr = TRUE, diagonal = F)
## Computing cat scores (centroid vs. pooled mean) for feature ranking
## 
## Number of variables: 1101 
## Number of observations: 585 
## Number of classes: 43 
## 
## Estimating optimal shrinkage intensity lambda.freq (frequencies): 0.5593 
## Estimating variances (pooled across classes)
## Estimating optimal shrinkage intensity lambda.var (variance vector): 0.0561 
## 
## Computing the square root of the inverse pooled correlation matrix
## Estimating optimal shrinkage intensity lambda (correlation matrix): 0.1037 
## 
## Computing false discovery rates and higher cricitism scores for each feature
# number of p

Related Skills

View on GitHub
GitHub Stars5
CategoryDevelopment
Updated3mo ago
Forks0

Languages

R

Security Score

82/100

Audited on Dec 20, 2025

No findings