PDeepXL
pDeepXL: MS/MS spectrum prediction for cross-linked peptide pairs by deep learning
Introduction
In cross-linking mass spectrometry, identification of cross-linked peptide pairs relies heavily on similarity measurements between experimental spectra and theoretical ones. The lack of accurate ion intensities in theoretical spectra impairs the performance of search engines for cross-linked peptide pairs, especially at proteome scale. Here, we introduce pDeepXL, a deep neural network that predicts MS/MS spectra of cross-linked peptide pairs. We used transfer learning to train pDeepXL, which made training feasible with the limited benchmark data available for cross-linked peptide pairs. Test results on more than ten datasets showed that pDeepXL accurately predicted spectra of both non-cleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson correlation coefficients (PCCs) higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have PCCs higher than 0.9). Furthermore, we showed that accurate prediction was achieved for unseen datasets using an online fine-tuning technique. Finally, integrating pDeepXL into a database search engine increased the number of identified cross-linked spectra by 18% on average.
Installation
Please install pDeepXL from PyPI. During installation, all required dependencies will be installed automatically.
pip install pDeepXL
Please also download example datasets from here, which will be used in the following tutorial. There are two example datasets in the downloaded zip file, one is for non-cleavable cross-linkers DSS/Leiker (examples/non_cleavable), and the other is for cleavable cross-linkers DSSO/DSBU (examples/cleavable). For each dataset, there are 2 folders: the data folder contains 1 file with 15 cross-linked peptide pairs, and the predict_results folder contains predicted MS/MS spectra, spectra library, and the corresponding images.
Script mode
For developers, pDeepXL can be easily integrated into a new Python project. After installation, import pDeepXL with two lines:
import pDeepXL.predict
import pDeepXL.plot
Single prediction
pDeepXL.predict.predict_single
Use the function pDeepXL.predict.predict_single to predict a spectrum for a single cross-linked peptide pair.
predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)
The arguments contain information about the input cross-linked peptide pair:
- prec_charge (int): the precursor charge of the cross-linked peptide pair. Only charges in [2+, 5+] are supported.
- instrument (str): the mass spectrometer name. Only instruments in ['QEPlus','QE','QEHF','QEHFX','Fusion','Lumos'] are supported.
- NCE_low, NCE_medium, NCE_high (floats): the low, medium, and high normalized collision energies (NCE). Only NCEs in [0.0, 100.0] are supported. If single NCE was used, please set it as NCE_medium, and set the NCE_low and NCE_high as zeros. If stepped-NCE was used, please set three NCEs accordingly.
- crosslinker (str): the cross-linker name. Only cross-linkers in ['DSS','Leiker','DSSO','DSBU'] are supported.
- seq1 (str): the first sequence.
- mods1 (dict): the modifications on the first sequence, where each key is the position (zero-based numbering) of a modification and each value is the corresponding modification name. For example, {3: 'Carbamidomethyl[C]'} means that the 4th residue, a Cys, carries a Carbamidomethyl modification. Only modifications in ['Carbamidomethyl[C]','Oxidation[M]'] are supported.
- linksite1 (int): the cross-linked site of the first sequence (also zero-based numbering).
- seq2 (str): same description as seq1.
- mods2 (dict): same description as mods1.
- linksite2 (int): same description as linksite1.
The return value is a tuple containing 3 elements. The last element holds the predicted intensity matrices, which can then be used to plot the predicted spectrum.
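The constraints listed above can be checked before calling the predictor. The following is a minimal sketch, not part of pDeepXL: the helper names `check_peptide` and `check_pair` are hypothetical, and the rule that the letter in brackets of a modification name must match the residue at that position is an assumption inferred from the examples in this README.

```python
# Hypothetical pre-flight checks for predict_single arguments (not a pDeepXL API).
SUPPORTED_INSTRUMENTS = {'QEPlus', 'QE', 'QEHF', 'QEHFX', 'Fusion', 'Lumos'}
SUPPORTED_CROSSLINKERS = {'DSS', 'Leiker', 'DSSO', 'DSBU'}
SUPPORTED_MODS = {'Carbamidomethyl[C]', 'Oxidation[M]'}


def check_peptide(label, seq, mods, linksite):
    """Return a list of problems found for one peptide of the pair."""
    problems = []
    # Link sites use zero-based numbering, so valid values are 0..len(seq)-1.
    if not 0 <= linksite < len(seq):
        problems.append(f'{label}: linksite {linksite} is outside the sequence')
    for pos, name in mods.items():
        if name not in SUPPORTED_MODS:
            problems.append(f'{label}: unsupported modification {name!r}')
        elif not 0 <= pos < len(seq):
            problems.append(f'{label}: modification position {pos} is outside the sequence')
        elif seq[pos] != name[-2]:
            # Assumption: 'Carbamidomethyl[C]' must sit on a C, 'Oxidation[M]' on an M.
            problems.append(f'{label}: {name} at position {pos} is on residue {seq[pos]}')
    return problems


def check_pair(prec_charge, instrument, NCE_low, NCE_medium, NCE_high,
               crosslinker, seq1, mods1, linksite1, seq2, mods2, linksite2):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if not 2 <= prec_charge <= 5:
        problems.append(f'precursor charge {prec_charge} is outside [2, 5]')
    if instrument not in SUPPORTED_INSTRUMENTS:
        problems.append(f'unsupported instrument {instrument!r}')
    for name, nce in (('NCE_low', NCE_low), ('NCE_medium', NCE_medium), ('NCE_high', NCE_high)):
        if not 0.0 <= nce <= 100.0:
            problems.append(f'{name}={nce} is outside [0.0, 100.0]')
    if crosslinker not in SUPPORTED_CROSSLINKERS:
        problems.append(f'unsupported cross-linker {crosslinker!r}')
    problems += check_peptide('seq1', seq1, mods1, linksite1)
    problems += check_peptide('seq2', seq2, mods2, linksite2)
    return problems


# Single NCE: the value goes into NCE_medium, with NCE_low and NCE_high set to zero.
problems = check_pair(4, 'QE', 0.0, 27.0, 0.0, 'Leiker',
                      'EISCVDSAELGKASR', {3: 'Carbamidomethyl[C]'}, 11,
                      'KIIIGK', {}, 0)
print(problems or 'inputs look consistent')
```

Returning a list of messages rather than raising on the first failure makes it easier to report every issue in a record at once, which helps when the same checks are later applied to many pairs.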
pDeepXL.plot.plot_single
Use the function pDeepXL.plot.plot_single to plot a single predicted spectrum.
pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,predictions[2],path_fig)
The arguments contain information about the input cross-linked peptide pair:
- title (str): the title of the predicted spectrum.
- prec_charge, crosslinker, seq1, mods1, linksite1, seq2, mods2, linksite2: same descriptions as those for pDeepXL.predict.predict_single.
- predictions[2] (tuple): the last element of the value returned by pDeepXL.predict.predict_single; the tuple contains the predicted intensity matrices for the first and the second sequences.
- path_fig (str): the path of the figure to be generated.
Demonstration
For example, run the following python script to predict and plot the demo non-cleavable cross-linked peptide pair (please use your local path):
# input example of a non-cleavable cross-linked peptide pair
# ecoli_enri0228_E_bin5_7ul.11740.11740.4.0.dta
prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2=\
4,'QE',0.0,27.0,0.0,'Leiker','EISCVDSAELGKASR',{3: 'Carbamidomethyl[C]'},11,'KIIIGK',{},0
# please use your local path
path_non_clv_fig=r'/pFindStudio/pDeepXL/pDeepXL/examples/non_cleavable/predicted_non_clv_spectrum.png'
title='example of non-cleavable cross-linked spectrum'
non_clv_predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)
pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,non_clv_predictions[2],path_non_clv_fig)
Run the following python script to predict and plot the demo cleavable cross-linked peptide pair (please use your local path):
# input example of a cleavable cross-linked peptide pair
# HEK293_FAIMS_60_70_80_Fr2.32448.32448.3.0.dta
prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2=\
3,'Lumos',21.0,27.0,33.0,'DSSO','VLLDVKLK',{},5,'EVASAKPK',{},5
# please use your local path
path_clv_fig=r'/pFindStudio/pDeepXL/pDeepXL/examples/cleavable/predicted_clv_spectrum.png'
title='example of cleavable cross-linked spectrum'
clv_predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)
pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,clv_predictions[2],path_clv_fig)
Batch prediction
If you want to predict spectra for many cross-linked peptide pairs, batch prediction is the more efficient option. Before batch prediction, please prepare a data file containing all the cross-linked peptide pairs you want to predict. The data file holds one cross-linked peptide pair per line, with columns separated by the tab character \t, and the column header is: title scan charge instrument NCE_low NCE_medium NCE_high crosslinker seq1 mods1 linksite1 seq2 mods2 linksite2. These parameters have been described in the Single prediction section. Below is a demo table, and you can find the example non-cleavable data file from here, and the example cleavable data file from here.
|title|scan|charge|instrument|NCE_low|NCE_medium|NCE_high|crosslinker|seq1|mods1|linksite1|seq2|mods2|linksite2|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|C_Lee_011216_ymitos_WT_Gly_BS3_XL_12_R2.57721.57721.4.0.dta|57721|4|Fusion|0.0|30.0|0.0|DSS|FKYAPGTIVLYAER|{}|1|INELTLLVQKR|{}|9|
|C_Lee_090916_ymitos_BS3_XL_B13_C1_13_Rep1.14188.14188.3.0.dta|14188|3|Lumos|0.0|30.0|0.0|DSS|KLEDAEGQENAASSE|{}|0|DINLLKNGK|{}|5|
|ecoli_enri0302_E_bin8_7ul_re.5306.5306.3.0.dta|5306|3|QE|0.0|27.0|0.0|Leiker|LKEIIHQQMGGLR|{8: 'Oxidation[M]'}|1|KPNACK|{4: 'Carbamidomethyl[C]'}|0|
|ecoli_enri0228_E_bin5_7ul.11740.11740.4.0.dta|11740|4|QE|0.0|27.0|0.0|Leiker|EISCVDSAELGKASR|{3: 'Carbamidomethyl[C]'}|11|KIIIGK|{}|0|
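A data file in this layout can be generated from Python records. The sketch below is only an illustration: `format_batch_table` and `write_batch_file` are hypothetical helpers, not pDeepXL functions, and it assumes that the header line is written as the first line of the file and that the mods columns are stored as the Python dict literal shown in the demo table.

```python
# Hypothetical helpers for building a tab-separated batch data file.
COLUMNS = ['title', 'scan', 'charge', 'instrument', 'NCE_low', 'NCE_medium',
           'NCE_high', 'crosslinker', 'seq1', 'mods1', 'linksite1',
           'seq2', 'mods2', 'linksite2']


def format_batch_table(records):
    """Render records (dicts keyed by COLUMNS) as tab-separated text with a header."""
    lines = ['\t'.join(COLUMNS)]
    for record in records:
        # str() of a dict gives the same literal form as the demo table,
        # for example {3: 'Carbamidomethyl[C]'} or {} when there are no modifications.
        lines.append('\t'.join(str(record[column]) for column in COLUMNS))
    return '\n'.join(lines) + '\n'


def write_batch_file(path, records):
    """Write the rendered table to path."""
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(format_batch_table(records))


# The last row of the demo table, expressed as a record.
demo_record = {
    'title': 'ecoli_enri0228_E_bin5_7ul.11740.11740.4.0.dta',
    'scan': 11740, 'charge': 4, 'instrument': 'QE',
    'NCE_low': 0.0, 'NCE_medium': 27.0, 'NCE_high': 0.0,
    'crosslinker': 'Leiker',
    'seq1': 'EISCVDSAELGKASR', 'mods1': {3: 'Carbamidomethyl[C]'}, 'linksite1': 11,
    'seq2': 'KIIIGK', 'mods2': {}, 'linksite2': 0,
}
print(format_batch_table([demo_record]))
```

Comparing the output against the downloaded example data files before running a large batch is a cheap way to confirm that the assumed layout matches what pDeepXL expects.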
pDeepXL.predict.predict_batch
Use the function pDeepXL.predict.predict_batch for batch prediction.
predictions=pDeepXL.predict.predict_batch(path_data_file, is_non_cleavable)
The arguments contain information about the input data:
- path_data_file (str): the path of the data file, formatted like the table above; please make sure the
title
