MetaSpectraST

metaSpectraST is an unsupervised, database-independent analysis tool for metaproteomic MS/MS data based on spectrum clustering. It clusters all experimentally observed MS/MS spectra by spectral similarity and creates a representative consensus spectrum for each cluster, using the spectrum clustering algorithm implemented in the spectral library search engine SpectraST.

Spectrally similar MS/MS spectra grouped into one spectral cluster are presumed to originate from the same peptide sequence. metaSpectraST therefore treats them as replicate spectra and quantitatively profiles samples by counting the number (spectral count, SC) or intensity (spectral index, SIN) of replicate spectra in each spectral cluster.
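The two quantities can be illustrated with a minimal Python sketch. The input layout (one `(cluster_id, sample, intensity)` tuple per raw spectrum) and the function name are assumptions made for illustration, not metaSpectraST's internal data structures. The spectral index is shown here simply as a summed intensity, without the normalization that metaSpectraST applies.

```python
from collections import defaultdict

def profile_clusters(members):
    """Return (spectral_count, spectral_index), each keyed by (cluster_id, sample).

    members: iterable of (cluster_id, sample, intensity) tuples, one per raw
    spectrum. This layout is hypothetical and chosen only for illustration.
    """
    spectral_count = defaultdict(int)
    spectral_index = defaultdict(float)
    for cluster_id, sample, intensity in members:
        key = (cluster_id, sample)
        spectral_count[key] += 1          # SC: number of replicate spectra
        spectral_index[key] += intensity  # SI: summed intensity (unnormalized)
    return dict(spectral_count), dict(spectral_index)

# Toy data: four raw spectra assigned to two clusters across two samples.
members = [
    ("c1", "sampleA", 100.0),
    ("c1", "sampleA", 50.0),
    ("c1", "sampleB", 25.0),
    ("c2", "sampleB", 10.0),
]
sc, si = profile_clusters(members)
print(sc[("c1", "sampleA")], si[("c1", "sampleA")])
```

Each `(cluster, sample)` pair becomes one cell of the sample profile matrix.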

The metaSpectraST spectral clusters also offer a portal to integrate and reconcile multiple peptide identification approaches, including database search, open modification search, and de novo sequencing. For each spectral cluster, the sequences assigned by the different identification methods to the raw spectra and to their consensus spectrum vote for the consensus peptide sequence of the cluster through a heuristic reconciliation scheme and the majority rule.
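The voting step can be sketched as a plain majority vote. This is only an illustration of the majority rule: the heuristic reconciliation scheme that metaSpectraST applies is not reproduced here, and the function name and input format (a flat list of assigned sequences, with `None` for unidentified spectra) are assumptions.

```python
from collections import Counter

def majority_sequence(assignments):
    """Return the sequence holding more than half of the votes, else None.

    assignments: peptide sequences assigned to the spectra of one cluster by
    any identification method; None or "" marks an unidentified spectrum.
    """
    votes = Counter(seq for seq in assignments if seq)
    if not votes:
        return None
    top_sequence, top_votes = votes.most_common(1)[0]
    # Strict majority: the winner needs more than half of all cast votes.
    if top_votes * 2 > sum(votes.values()):
        return top_sequence
    return None

cluster_votes = ["PEPTIDEK", "PEPTIDEK", "PEPTLDEK", None]
print(majority_sequence(cluster_votes))
```

Returning `None` when no sequence reaches a majority keeps ambiguous clusters visibly unresolved instead of picking an arbitrary winner.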

With metaSpectraST you can:

Rapidly profile and compare the microbial communities of your samples;
Classify your metaproteomic (or proteomic) samples;
Validate biological/technical replicates;
Integrate and reconcile multiple peptide/protein identification approaches for further taxonomic or functional studies.

Installation

Dependencies

Python version >= 3.7, R version 4.1.3
SpectraST (v5.0)

SpectraST is an integral component of the Trans-Proteomic Pipeline (TPP) software suite. A compiled executable is included here and can be used on its own, without the other TPP components.

We encourage users to download and install the entire TPP suite, which provides other useful functionality such as raw data import, automatic validation of search results, protein inference, quantification, and visualization. Please refer to the TPP guides for Linux installation and to the official download site for the Windows installer.

edgeR (v3.34.0)

metaSpectraST normalizes the data using the trimmed mean of M-values (TMM) normalization method implemented in the edgeR package. edgeR is not required if you normalize the data with another method. Please refer to Bioconductor-edgeR for further information.

To install the edgeR package, start R and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("edgeR")

You may also need to install the limma (v3.48.1) package:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("limma")

Installing metaSpectraST

Download or clone the repository:

git clone https://github.com/bravokid47/metaSpectraST.git

Make metaSpectraST available on the command line by adding the directory yourpath/metaspectrast/ to the $PATH environment variable, or simply copy the following line into your ~/.bashrc or ~/.bash_profile file and source the file.

export PATH="$PATH:yourpath/metaspectrast";

Quick start

Data format

metaSpectraST can perform spectral clustering from the following data formats:

mzML format
mzXML format
mgf format

Please note that the mgf format is required for computing the normalized spectral index (SIN). File formats can be converted with msconvert or ThermoRawFileParser.

Modules of metaSpectraST

metaSpectraST has 6 individual modules. Run the following command for an explanation of each module.

metaspectrast -h

Output

>>>
_________ metaSpectraST by Hao, Chunlin _________

metaSpectraST v=0.0
Usage: metaspectrast [module]

Module:
1  cluster          Clustering MS/MS spectra and create consensus spectra
2  computesc        Spectral count-based (SC) sample profiling
3  computesin       Normalized spectral index (SIn) based sample profiling
4  normalize        Normlizing the data matrix of sample profiles (SC or SIn)
5  classify         Hierarchically clustering and classifying samples
6  reconcile        Reconciliation scheme

Each module is run separately. For example, to run the computesin module,

metaspectrast computesin -h

Output

>>>
usage: metaSpectraST_SIn.py [-h] [-s [SPTXT]] -m MGF [MGF ...]

metaSpectraST (v0.0) by Hao, Chunlin.
Compute normalized spectral index (SIn) of consensus spectra.

optional arguments:
  -h, --help        show this help message and exit
  -s [SPTXT]        consensus spectra .sptxt file, grandConsensus.sptxt by default.
  -m MGF [MGF ...]  raw spectra data sets in MGF format

Step 1: performing spectral clustering

Run the following command to perform spectral clustering:

metaspectrast cluster <path/*mzML>

The fragmentation type of the spectra (ETD, HCD, CID-QTOF) can be specified with the -i option. The option is off by default, in which case the fragmentation type is determined from the data files.

metaspectrast cluster -i HCD <path/*mzML>
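When clustering is driven from a script, the same command can be assembled in Python. The sketch below is a thin wrapper that uses only the `cluster` module and the `-i` option shown above. The helper names are hypothetical, and it assumes `metaspectrast` is already on `$PATH`.

```python
import glob
import subprocess

def build_cluster_command(files, fragmentation=None):
    """Build the argument list for `metaspectrast cluster`.

    files: paths of the input spectra files (e.g. .mzML).
    fragmentation: optional value for -i (ETD, HCD or CID-QTOF).
    """
    if not files:
        raise ValueError("no input files given")
    command = ["metaspectrast", "cluster"]
    if fragmentation is not None:
        command += ["-i", fragmentation]
    return command + list(files)

def run_cluster(pattern, fragmentation=None):
    """Expand a glob pattern and run the clustering step (hypothetical helper)."""
    files = sorted(glob.glob(pattern))
    command = build_cluster_command(files, fragmentation)
    # check=True raises CalledProcessError if metaspectrast exits with an error.
    subprocess.run(command, check=True)

print(build_cluster_command(["a.mzML", "b.mzML"], "HCD"))
```

Calling `run_cluster("path/*.mzML", "HCD")` would then correspond to the shell command above. Passing the arguments as a list avoids shell quoting problems with unusual file names.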

When this step is done, it produces three types of output files in the working directory. The file bar.splib is the spectral library in a binary format, and bar.sptxt is a human-readable version of bar.splib. The files bar.spidx and bar.pepidx are indices on the precursor m/z value and the peptide, respectively. The file grandConsensus.sptxt is the library of consensus spectra, which is used in the subsequent steps. A library of consensus spectra in .mgf format, named grandConsensus.mgf, is also produced.
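To inspect the consensus library outside of metaSpectraST, grandConsensus.mgf can be read with a few lines of Python. The sketch below is a minimal reader for the generic MGF layout (`BEGIN IONS`, `KEY=VALUE` headers, peak lines, `END IONS`). Which headers metaSpectraST actually writes is not stated here, so the example spectrum and its `TITLE` are made up.

```python
def read_mgf(lines):
    """Parse MGF text lines into a list of {"params": dict, "peaks": list} records."""
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z, intensity, optionally further columns.
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra

# Made-up example spectrum, for illustration only.
example = """BEGIN IONS
TITLE=example_spectrum
PEPMASS=500.25
CHARGE=2+
100.5 20.0
250.1 80.0
END IONS
"""
spectra = read_mgf(example.splitlines())
print(len(spectra), spectra[0]["params"]["TITLE"], spectra[0]["peaks"])
```

For a file on disk, pass the open file handle instead: `with open("grandConsensus.mgf") as handle: spectra = read_mgf(handle)`.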

Step 2: profiling samples

Each consensus spectrum created in Step 1 can be quantified by the number (spectral count, SC) or intensity (spectral index, SIN) of the replicate spectra (raw spectra) that belong to the corresponding spectral cluster in a given sample. The quantified consensus spectra can then be used to profile the samples.

Spectral count-based (SC) profiling

metaspectrast computesc -s <path/grandConsensus.sptxt>

When it is done, it produces two CSV files, unnorm_consensusPep_SC.csv and consensusSpec_RawSpectra_idx.csv. The file unnorm_consensusPep_SC.csv contains the unnormalized spectral counts of the consensus spectra in each sample, which can be normalized with the normalize module (see Step 3) or simply divided by the sum of the spectral counts in each data set. The file consensusSpec_RawSpectra_idx.csv is an index that maps each raw spectrum to its consensus spectrum.
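The simple alternative, dividing by the total spectral count of each data set, can be sketched as follows. The nested-dictionary layout and the function name are assumptions for illustration. The column layout of unnorm_consensusPep_SC.csv is not described here, so loading the file is left out.

```python
def normalize_by_sample_total(counts):
    """Divide each count by the total count of its sample.

    counts: {sample: {consensus_id: spectral_count}} (hypothetical layout).
    Returns the same structure with fractions that sum to 1 per sample.
    """
    normalized = {}
    for sample, per_consensus in counts.items():
        total = sum(per_consensus.values())
        normalized[sample] = {
            consensus_id: (value / total if total else 0.0)
            for consensus_id, value in per_consensus.items()
        }
    return normalized

counts = {
    "sampleA": {"c1": 3, "c2": 1},
    "sampleB": {"c1": 10, "c2": 30},
}
normalized = normalize_by_sample_total(counts)
print(normalized["sampleA"])
```

After this step, samples with different numbers of acquired spectra are on the same scale.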

Normalized spectral index-based (SIN) profiling

metaspectrast computesin -s <path/grandConsensus.sptxt> -m <path/*mgf>

Note that each .mgf file must have the same name as the corresponding input file in Step 1.
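A quick check before running computesin can catch naming mismatches. The sketch below assumes that "the same name" means the same base name with a different extension (for example sample1.mzML and sample1.mgf). The function name is hypothetical.

```python
from pathlib import Path

def unmatched_inputs(cluster_inputs, mgf_files):
    """Return the Step 1 input files that have no .mgf file with the same base name."""
    mgf_stems = {Path(path).stem for path in mgf_files}
    return [path for path in cluster_inputs if Path(path).stem not in mgf_stems]

missing = unmatched_inputs(
    ["data/sample1.mzML", "data/sample2.mzML"],
    ["mgf/sample1.mgf"],
)
print(missing)
```

An empty list means every Step 1 input has a matching .mgf file.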

When it is done, it produces three CSV files: unnorm_consensusPep_SI.csv, consensusPep_SIn.csv and consensusSpec_RawSpectra_idx.csv. As in SC profiling, the file unnorm_consensusPep_SI.csv contains the unnormalized spectral index of the consensus spectra in each sample, which can be normalized with the normalize module (see Step 3). The file consensusPep_SIn.csv contains the same data as unnorm_consensusPep_SI.csv, normalized by the sum of the spectral index in each data set. The file consensusSpec_RawSpectra_idx.csv is an index that maps each raw spectrum to its consensus spectrum.

Step 3: classifying samples and visualization

Hierarchical clustering of samples can be performed based on their SIN or SC profiles. The profiles must be normalized first.
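The idea behind hierarchical clustering of sample profiles can be sketched in plain Python. The distance metric and linkage used by the classify module are not stated here, so Euclidean distance and average linkage are illustrative choices, and the profile values are made up.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: {sample: [normalized values]} with unique sample names.
    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            dists = [euclidean(profiles[x], profiles[y]) for x in a for y in b]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Made-up normalized profiles: two similar samples and one distinct sample.
profiles = {
    "A1": [0.5, 0.5, 0.0],
    "A2": [0.6, 0.4, 0.0],
    "B1": [0.0, 0.1, 0.9],
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The two similar samples merge first, and the distinct sample joins last at a much larger distance. A dendrogram is a drawing of this merge order.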

Normalization

Normalization of the SI<


