
Table of Contents
- Introduction
- Installation
- Execution
- Sibling Projects
CoreMS
CoreMS is a comprehensive mass spectrometry framework for software development and data analysis of small molecules.
Data handling and software development for modern mass spectrometry (MS) is an interdisciplinary endeavor requiring skills in computational science and a deep understanding of MS. To enable scientific software development to keep pace with fast improvements in MS technology, we have developed a Python software framework named CoreMS. The goal of the framework is to provide a fundamental, high-level basis for working with all mass spectrometry data types, allowing custom workflows for data signal processing, annotation, and curation. The data structures were designed with an intuitive, mass spectrometric hierarchical structure, thus allowing organized and easy access to the data and calculations. Moreover, CoreMS supports direct access for almost all vendors’ data formats, allowing for the centralization and automation of all data processing workflows from the raw signal to data annotation and curation.
CoreMS aims to provide
- logical mass spectrometric data structure
- self-contained data and metadata storage
- modern molecular formulae assignment algorithms
- dynamic molecular search space database generation and search
Current Version
3.10.0
Main Developers/Contact
Documentation
API documentation can be found here.
Overview slides can be found here.
Contributing
As an open source project, CoreMS welcomes contributions of all forms. Before contributing, please see our Dev Guide
Data formats
Data input formats
- Bruker Solarix (CompassXtract)
- Bruker Solarix transients, ser and fid (FT magnitude mode only)
- ThermoFisher (.raw)
- Spectroswiss signal booster data-acquisition station (.hdf5)
- MagLab ICR data-acquisition station (FT and magnitude mode) (.dat)
- ANDI NetCDF for GC-MS (.cdf)
- mzml for LC-MS (.mzml)
- Generic mass list in profile and centroid mode (includes all delimiter types and Excel formats)
- CoreMS exported processed mass list files (.xlsx, .csv, .txt, pandas DataFrame as .pkl)
- CoreMS self-contained Hierarchical Data Format (.hdf5)
- Pandas Dataframe
- Support for cloud Storage using s3path.S3path
Data output formats
- Pandas data frame (can be saved using pickle, h5, etc)
- Text Files (.csv, tab separated .txt, etc)
- Microsoft Excel (.xlsx)
- Automatic JSON for metadata storage and reuse
- Self-contained Hierarchical Data Format (.hdf5) including raw data and time-series data points for processed datasets, with all associated metadata stored as JSON attributes
Data structure types
- LC-MS
- GC-MS
- Transient
- Mass Spectra
- Mass Spectrum
- Mass Spectral Peak
- Molecular Formula
Available features
FT-MS Signal Processing, Calibration, and Molecular Formula Search and Assignment
- Apodization, Zerofilling, and Magnitude mode FT
- Manual and automatic noise threshold calculation
- Peak picking using apex quadratic fitting
- Experimental resolving power calculation
- Frequency and m/z domain calibration functions:
- LedFord equation
- Linear equation
- Quadratic equation
- Automatic search for the most abundant Ox homologue series
- Automatic local (SQLite) or external (PostgreSQL) database check, generation, and search
- Automatic molecular formulae assignments algorithm for ESI(-) MS for natural organic matter analysis
- Automatic fine isotopic structure calculation and search for all isotopes
- Flexible Kendrick normalization base
- Kendrick filter using density-based clustering
- Kendrick classification
- Heteroatoms classification and visualization
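Two of the items above have compact closed forms. The sketch below is not CoreMS's internal implementation, just the textbook two-term Ledford calibration and a CH2-based Kendrick mass defect; the function names and the sign convention for the defect are illustrative choices:

```python
def ledford_mz(freq_hz: float, A: float, B: float) -> float:
    """Two-term Ledford calibration: m/z = A/f + B/f**2.

    A and B are instrument-specific constants obtained by fitting known
    calibrant peaks; pass your own fitted values.
    """
    return A / freq_hz + B / freq_hz ** 2


def kendrick_mass_defect(mz: float, base_mass: float = 14.01565,
                         base_nominal: int = 14) -> float:
    """Kendrick mass defect with a configurable base unit (CH2 by default).

    Uses one common sign convention (nominal minus Kendrick mass).
    Members of a homologous series separated by the base unit share the
    same defect, which is what makes Kendrick plots and filters useful.
    """
    kendrick_mass = mz * (base_nominal / base_mass)
    return round(kendrick_mass) - kendrick_mass
```

For example, two peaks exactly one CH2 (14.01565 Da) apart produce identical Kendrick mass defects, so they fall on the same horizontal line in a Kendrick plot.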
GC-MS Signal Processing, Calibration, and Compound Identification
- Baseline detection, subtraction, smoothing
- m/z based Chromatogram Peak Deconvolution
- Manual and automatic noise threshold calculation
- First and second derivatives peak picking methods
- Peak Area Calculation
- Retention Index Calibration
- Automatic local (SQLite) or external (MongoDB or PostgreSQL) database check, generation, and search
- Automatic molecular match algorithm with all spectral similarity methods
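The retention index calibration listed above is commonly computed with the linear (van den Dool and Kratz) formula. A minimal sketch, not CoreMS's code and with illustrative names, that interpolates between bracketing n-alkane standards:

```python
import bisect

def linear_retention_index(rt: float, alkane_rts: list[float],
                           alkane_carbons: list[int]) -> float:
    """Linear (van den Dool & Kratz) retention index.

    alkane_rts: retention times of n-alkane standards, ascending.
    alkane_carbons: carbon numbers of those standards.
    RI = 100 * (n + (rt - rt_n) / (rt_{n+1} - rt_n))
    Values outside the standard range are linearly extrapolated.
    """
    i = bisect.bisect_right(alkane_rts, rt) - 1
    i = max(0, min(i, len(alkane_rts) - 2))  # clamp to a bracketing pair
    n, rt_n, rt_next = alkane_carbons[i], alkane_rts[i], alkane_rts[i + 1]
    return 100.0 * (n + (rt - rt_n) / (rt_next - rt_n))
```

A compound eluting halfway between C10 and C11 alkanes thus gets RI 1050.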
High Resolution Mass Spectrum Simulations
- Peak shape (Lorentz, Gaussian, Voigt, and pseudo-Voigt)
- Peak fitting for peak shape definition
- Peak position as a function of data points, signal-to-noise, and resolving power (Lorentz and Gaussian)
- Prediction of mass error distribution
- Calculated ICR resolving power based on magnetic field (B) and transient time (T)
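As a rough sanity check for the last item: the unperturbed cyclotron frequency follows directly from B and m/z, and one widely quoted magnitude-mode approximation ties FWHM resolving power to B and transient length T. The prefactor varies with apodization and detection mode, so treat this sketch as an estimate, not CoreMS's exact model:

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg

def cyclotron_frequency_hz(mz: float, b_tesla: float) -> float:
    """Unperturbed ion cyclotron frequency f = qB / (2*pi*m),
    for a singly charged ion of the given m/z in daltons."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * mz * DALTON)

def icr_resolving_power(mz: float, b_tesla: float, transient_s: float) -> float:
    """Approximate magnitude-mode FWHM resolving power in the
    low-pressure limit, m/dm ~ 1.274e7 * B * T / (m/z) for z = 1;
    the numeric prefactor is a commonly cited approximation."""
    return 1.274e7 * b_tesla * transient_s / mz
```

At 7 T and m/z 400, this predicts a cyclotron frequency near 269 kHz and, for a 1 s transient, a resolving power of roughly 220,000; longer transients and stronger magnets both scale it up linearly.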
LC-MS Signal Processing, Molecular Formula Search and Assignment, and Spectral Similarity Searches
See walkthrough in this notebook
- Two dimensional (m/z and retention time) peak picking using persistent homology
- Smoothing, centroid detection, and integration of extracted ion chromatograms
- Peak shape metric calculations including half peak height, tailing factor, and dispersity index
- MS1 deconvolution of mass features
- Identification of <sup>13</sup>C isotopes within the mass features
- Compatibility with molecular formula searching on MS1 or MS2 spectra
- Spectral search capability using entropy similarity
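Spectral entropy similarity has a short closed form: 1 - (2*S_AB - S_A - S_B) / ln(4), where S_AB is the Shannon entropy of the 1:1 merged spectrum. A minimal sketch, not CoreMS's implementation, assuming the two spectra are already aligned to a common m/z axis and supplied as intensity vectors:

```python
import math

def shannon_entropy(intensities):
    """Shannon entropy of an intensity vector, normalized to probabilities."""
    total = sum(intensities)
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probs)

def entropy_similarity(spec_a, spec_b):
    """Entropy similarity of two aligned intensity vectors.

    Returns 1.0 for identical spectra and 0.0 for spectra with no
    shared peaks.
    """
    a_total, b_total = sum(spec_a), sum(spec_b)
    a = [i / a_total for i in spec_a]
    b = [i / b_total for i in spec_b]
    mixed = [(x + y) / 2.0 for x, y in zip(a, b)]
    s_a, s_b, s_ab = shannon_entropy(a), shannon_entropy(b), shannon_entropy(mixed)
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4)
```

In practice the alignment step (matching peaks within an m/z tolerance) is where most of the work lies; this sketch only covers the scoring.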
Installation
pip install corems
By default the molecular formula database will be generated using SQLite.
To use PostgreSQL, the easiest way is to build a docker container:
docker-compose up -d
- Change MSParameters.molecular_search.url_database to: "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"
- Alternatively, set the COREMS_DATABASE_URL environment variable to the same connection string
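A minimal sketch of the environment-variable route, using the connection string from the docker-compose defaults above (adjust credentials, host, and port if you changed them):

```python
import os

# Connection string matching the dockerized PostgreSQL defaults in this
# README; illustrative only - use your own credentials in production.
PG_URL = "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"

# Set the variable before importing corems so molecular formula
# searches use PostgreSQL instead of the default SQLite file.
os.environ["COREMS_DATABASE_URL"] = PG_URL
```

The in-code route (MSParameters.molecular_search.url_database) achieves the same thing after importing corems; the environment variable is convenient for containerized or scripted runs.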
Thermo Raw File Access:
To open Thermo .raw files, an installation of pythonnet is needed:
- Windows:
  pip install pythonnet
- Mac and Linux:
  brew install mono
  pip install pythonnet
Docker stack
Another option to use CoreMS is to run the docker stack that starts the CoreMS containers
Molecular Database and Jupyter Notebook Docker Containers
A docker container containing:
- A custom Python distribution with all dependencies installed
- A Jupyter notebook server with workflow examples
- A PostgreSQL database for the molecular formulae assignment
If you don't have docker installed, the easiest way is to install Docker Desktop
Start the containers using docker-compose (easiest way):
docker-compose-jupyter.yml contains a volume mapping for the tests_data directory with the data provided for testing. To change it to your data location:
- locate the volumes section in docker-compose-jupyter.yml:
  volumes: - ./tests/tests_data:/home/CoreMS/data
- change "./tests/tests_data" to your data directory location:
  volumes: - path_to_your_data_directory:/home/corems/data
- save the file, then call:
  docker-compose -f docker-compose-jupyter.yml up
Another option is to manually build the containers:
- Build the corems image:
  docker build -t corems:local .
- Start the database container:
  docker-compose up -d
- Start the Jupyter Notebook:
  docker run --rm -v ./data:/home/CoreMS/data corems:local
- Open your browser and paste the URL provided in the terminal: http://localhost:8888/?token=<token>
- Open the CoreMS-Tutorial.ipynb
Example for FT-ICR Data Processing
More examples can be found in the examples/notebooks directory
- Basic functionality example
from corems.transient.input.brukerSolarix import ReadBrukerSolarix
from corems.molecular_id.search.molecularFormulaSearch import SearchMolecularFormulas
from corems.mass_spectrum.output.export import HighResMassSpecExport
from matplotlib import pyplot

file_path = 'tests/tests_data/ftms/ESI_NEG_SRFA.d'
# Instantiate the Bruker Solarix reader with the file path
bruker_reader = ReadBrukerSolarix(file_path)
# Use the reader to instantiate a transient object
bruker_transient_obj = bruker_reader.get_transient()
# Calculate the transient duration time
T = bruker_transient_obj.transient_time
# Use the transient object to instantiate a mass spectrum object
mass_spectrum_obj = bruker_transient_obj.get_mass_spectrum(plot_result=False, auto_process=True)
# Run the molecular formula search on every mass spectral peak
SearchMolecularFormulas(mass_spectrum_obj, first_hit=False).run_worker_mass_spectrum()