CandyCrunch

Predicting glycan structure from LC-MS/MS data

What is CandyCrunch?

CandyCrunch is a package for predicting glycan structure from LC-MS/MS data. It contains the CandyCrunch model, the rest of the inference pipeline, and downstream spectrum processing tools. These are described further in our manuscript, Urban et al. (2024), "Predicting glycan structure from tandem mass spectrometry via deep learning", published in Nature Methods.

Install CandyCrunch

Development version:

pip install git+https://github.com/BojarLab/CandyCrunch.git

Development version bundled with GlycoDraw:

[!NOTE]
The operating-system-specific installation steps for GlycoDraw are still required; read more in the GlycoDraw installation guide.

pip install 'CandyCrunch[draw] @ git+https://github.com/Bojarlab/CandyCrunch'

PyPI:

pip install CandyCrunch

`CandyCrunch.ipynb`

If you are looking for a convenient and easy-to-run version of the code that does not require any local installations, we have also created a Google Colaboratory notebook.
The notebook contains an example pipeline ready to run, which can be copied, executed, and customised in any way.
The example file included in the notebook is the same as in examples/ and is ready for use in the notebook workflow.

Using CandyCrunch – Command line interface:

If you would like to run our main inference function from the command line, you can do so using the candycrunch_predict command included in this repository.

Requires at a minimum:

<pre>
--spectra_filepath, type=string: a filepath to an mzML/mzXML file or a .xlsx file
--glycan_class, type=string: the glycan class measured ("N", "O", "lipid"/"free")
--output, type=string: an output filepath ending with `.csv` or `.xlsx`
</pre>
<details>
<summary>

Optional arguments:

</summary>
<pre>
--mode, type=string: mass spectrometry mode; options are 'negative' or 'positive'; default: 'negative'
--modification, type=string: chemical derivatization of glycans; options are 'reduced', 'permethylated', '2AA', '2AB' or 'custom'; default: 'reduced'
    |
    |--mass_tag, type=float: only if modification = "custom", mass of custom reducing end tag; default: None
--lc, type=string: type of liquid chromatography; options are 'PGC', 'C18', and 'other'; default: 'PGC'
--trap, type=string: type of mass detector used; options are 'linear', 'orbitrap', 'amazon', and 'other'; default: 'linear'
--rt_min, type=float: minimum retention time (in minutes); only spectra from this time onward are considered; default: 0
--rt_max, type=float: maximum retention time (in minutes); only spectra up to this time are considered; default: 0
--rt_diff, type=float: maximum retention time difference (in minutes) to peak apex that can be grouped with that peak; default: 1.0
--spectra, type=bool: whether to also output the actual spectra used for prediction; default: False
--get_missing, type=bool: whether to also organize spectra without a matching prediction but a valid composition; default: False
    |
    |--filter_out, type=set: only if get_missing = "True", set of monosaccharide or modification types that is used to filter out compositions (e.g., if you know there is no Pen); default: {'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'}
--mass_tolerance, type=float: permitted variation in Da to still consider two masses to stem from the same molecule; default: 0.5
--supplement, type=bool: whether to impute observed biosynthetic intermediaries from biosynthetic networks; default: True
--experimental, type=bool: whether to impute missing predictions via database searches etc.; default: True
    |
    |--taxonomy_class, type=string: only if experimental = "True", which taxonomy class to pull glycans from when populating the mass_dic; default: 'Mammalia'
--plot_glycans, type=bool: whether to save an output.xlsx file that contains SNFG images of all top-1 predictions; it is saved in the same folder as spectra_filepath; default: False
</pre>
</details>
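To make the two tolerance-style options above more concrete, here is a minimal Python sketch of what `--mass_tolerance` and `--rt_diff` describe. The helper names `same_molecule` and `belongs_to_peak` are hypothetical, and the inclusive comparison is an assumption; this illustrates the documented meaning of the parameters, not CandyCrunch's internal code.

```python
def same_molecule(mass_a, mass_b, mass_tolerance=0.5):
    """Hypothetical helper: two masses (in Da) count as the same molecule
    if they differ by at most mass_tolerance (inclusive bound is assumed)."""
    return abs(mass_a - mass_b) <= mass_tolerance


def belongs_to_peak(rt, apex_rt, rt_diff=1.0):
    """Hypothetical helper: a spectrum is grouped with a peak if its retention
    time (in minutes) lies within rt_diff of the peak apex."""
    return abs(rt - apex_rt) <= rt_diff


# Made-up values, for illustration only
print(same_molecule(384.2, 384.6))   # difference of 0.4 Da, within 0.5 Da
print(same_molecule(384.2, 385.0))   # difference of 0.8 Da, outside 0.5 Da
print(belongs_to_peak(12.4, 11.8))   # 0.6 min from the apex, within 1.0 min
```

Tightening `--mass_tolerance` suits high-resolution instruments, while a larger `--rt_diff` merges broader chromatographic peaks.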

Basic usage

[!IMPORTANT]
Users must install CandyCrunch using pip before running the commands below

$ candycrunch_predict --spectra_filepath path_to_my_files/file.mzML --glycan_class 'O' --output path_to_my_outputs/output_file.csv
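If you want to launch the command from a script (for example, to loop over several files), one option is to assemble the argument list in Python. The sketch below uses only flags documented above; the helper `build_candycrunch_command` is hypothetical and not part of CandyCrunch. It is limited to string and float options, since how the command line parses boolean values is not described here.

```python
import shlex


def build_candycrunch_command(spectra_filepath, glycan_class, output, **options):
    """Hypothetical helper: assemble a candycrunch_predict argument list."""
    # The documentation asks for an output path ending in .csv or .xlsx
    if not output.endswith((".csv", ".xlsx")):
        raise ValueError("output must end with .csv or .xlsx")
    cmd = [
        "candycrunch_predict",
        "--spectra_filepath", spectra_filepath,
        "--glycan_class", glycan_class,
        "--output", output,
    ]
    # Each optional argument becomes "--name value"
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_candycrunch_command(
    "path_to_my_files/file.mzML",
    "O",
    "path_to_my_outputs/output_file.csv",
    mode="negative",
    lc="PGC",
    rt_min=2.0,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once CandyCrunch has been installed with pip.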

Using CandyCrunch – LC-MS/MS glycan annotation

`wrap_inference` (in `CandyCrunch.prediction`)

Wrapper function to predict glycan structures from raw LC-MS/MS spectra using CandyCrunch

Requires at a minimum:

<pre>
- spectra_filepath, type=string: a filepath to an mzML/mzXML file or a .xlsx file
- glycan_class, type=string: the glycan class measured ("N", "O", "lipid"/"free")
</pre>

mzML/mzXML files are internally processed into extracted spectra. xlsx files must already contain extracted spectra, in the same format as the example file in examples/.

<details>
<summary>

Optional arguments:

</summary>
<pre>
model, type=PyTorch object: loaded from a checkpoint of a trained CandyCrunch model
glycans, type=list: ordered list of glycans used to train CandyCrunch which can be predicted by the model
bin_num, type=int: number of bins to separate the MS2 spectrum into
frag_num, type=int: number of top fragments to show in df_out per spectrum; default: 100
mode, type=string: mass spectrometry mode; options are 'negative' or 'positive'; default: 'negative'
modification, type=string: chemical derivatization of glycans; options are 'reduced', 'permethylated', '2AA', '2AB' or 'custom'; default: 'reduced'
    |
    |--mass_tag, type=float: only if modification = "custom", mass of custom reducing end tag; default: None
lc, type=string: type of liquid chromatography; options are 'PGC', 'C18', and 'other'; default: 'PGC'
trap, type=string: type of mass detector used; options are 'linear', 'orbitrap', 'amazon', and 'other'; default: 'linear'
rt_min, type=float: minimum retention time (in minutes); only spectra from this time onward are considered; default: 0
rt_max, type=float: maximum retention time (in minutes); only spectra up to this time are considered; default: 0
rt_diff, type=float: maximum retention time difference (in minutes) to peak apex that can be grouped with that peak; default: 1.0
pred_thresh, type=float: prediction confidence threshold used for filtering; default: 0.01
temperature, type=float: the temperature factor used to calibrate logits; default: 1.15
spectra, type=bool: whether to also output the actual spectra used for prediction; default: False
get_missing, type=bool: whether to also organize spectra without a matching prediction but a valid composition; default: False
    |
    |--filter_out, type=set: only if get_missing = "True", set of monosaccharide or modification types that is used to filter out compositions (e.g., if you know there is no Pen); default: {'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'}
mass_tolerance, type=float: permitted variation in Da to still consider two masses to stem from the same molecule; default: 0.5
extra_thresh, type=float: prediction confidence threshold at which to allow cross-class predictions (e.g., predicting N-glycans in O-glycan samples); default: 0.2
supplement, type=bool: whether to impute observed biosynthetic intermediaries from biosynthetic networks; default: True
experimental, type=bool: whether to impute missing predictions via database searches etc.; default: True
    |
    |--mass_dic, type=dict: only if experimental = "True", dictionary of form mass : list of glycans; will be generated internally
    |
    |--taxonomy_class, type=string: only if experimental = "True", which taxonomy class to pull glycans from when populating the mass_dic; default: 'Mammalia'
    |
    |--df_use, type=DataFrame: only if experimental = "True", sugarbase-like database of glycans with species associations etc.; default: use glycowork-stored df_glycan
plot_glycans, type=bool: whether to save an output.xlsx file that contains SNFG images of all top-1 predictions; it is saved in the same folder as spectra_filepath; default: False
</pre>
</details>
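The `temperature` and `pred_thresh` arguments are easier to picture with a small example. The sketch below assumes standard temperature scaling (dividing logits by the temperature before a softmax) followed by a confidence cut-off; how CandyCrunch applies these values internally is not stated here, so treat this as an illustration of the general technique. The function names and glycan labels are hypothetical.

```python
import math


def calibrated_probabilities(logits, temperature=1.15):
    """Softmax over logits divided by a temperature (assumed calibration scheme)."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_predictions(glycans, logits, temperature=1.15, pred_thresh=0.01):
    """Keep (glycan, probability) pairs at or above pred_thresh, best first."""
    probs = calibrated_probabilities(logits, temperature)
    kept = [(g, p) for g, p in zip(glycans, probs) if p >= pred_thresh]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder labels and made-up logits, for illustration only
labels = ["glycan_A", "glycan_B", "glycan_C"]
logits = [4.0, 2.0, -3.0]
for glycan, prob in filter_predictions(labels, logits):
    print(glycan, round(prob, 3))
```

A temperature above 1 flattens the distribution, so the top prediction gets a slightly lower confidence than with an unscaled softmax; raising `pred_thresh` then removes more low-confidence candidates.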

Basic usage

glycan_class = "O"
annotated_spectra_df = wrap_inference("C:/myfiles/my_spectra.mzML", glycan_class)
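When passing optional arguments, it can help to check the string options against the documented choices before starting a long inference run. The following is a minimal sketch; the helper `check_inference_options` is hypothetical and not part of CandyCrunch. Treating "lipid" and "free" as two accepted spellings, and requiring `mass_tag` whenever `modification` is "custom", are assumptions drawn from the argument descriptions above.

```python
# Documented choices for the string-valued arguments of wrap_inference
ALLOWED = {
    "glycan_class": {"N", "O", "lipid", "free"},
    "mode": {"negative", "positive"},
    "modification": {"reduced", "permethylated", "2AA", "2AB", "custom"},
    "lc": {"PGC", "C18", "other"},
    "trap": {"linear", "orbitrap", "amazon", "other"},
}


def check_inference_options(glycan_class, **options):
    """Hypothetical helper: validate options, then return them unchanged."""
    values = {"glycan_class": glycan_class, **options}
    for name, allowed in ALLOWED.items():
        if name in values and values[name] not in allowed:
            raise ValueError(f"{name}={values[name]!r} is not one of {sorted(allowed)}")
    # Assumption: a custom modification needs the mass of the reducing end tag
    if values.get("modification") == "custom" and values.get("mass_tag") is None:
        raise ValueError("modification='custom' requires mass_tag")
    return options


options = check_inference_options(
    "O", mode="negative", lc="PGC", trap="orbitrap", rt_min=2.0
)
print(options)
# With CandyCrunch installed and wrap_inference imported, the checked options
# could then be forwarded as keyword arguments:
# annotated_spectra_df = wrap_inference("C:/myfiles/my_spectra.mzML", "O", **options)
```

Failing early on a misspelled option is cheaper than discovering it after the spectra have been processed.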

This is what a truncated example of `annotated_spectra_df` would look like

