pDeepXL: MS/MS spectrum prediction for cross-linked peptide pairs by deep learning



Introduction

In cross-linking mass spectrometry, identification of cross-linked peptide pairs relies heavily on similarity measurements between experimental spectra and theoretical ones. The lack of accurate ion intensities in theoretical spectra impairs the performance of search engines for cross-linked peptide pairs, especially at proteome scale. Here, we introduce pDeepXL, a deep neural network that predicts MS/MS spectra of cross-linked peptide pairs. We used transfer learning to train pDeepXL, which makes training feasible with the limited benchmark data available for cross-linked peptide pairs. Test results on over ten datasets showed that pDeepXL accurately predicted spectra of both non-cleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson correlation coefficients (PCCs) higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have PCCs higher than 0.9). Furthermore, we showed that accurate prediction was achieved for unseen datasets using an online fine-tuning technique. Finally, integrating pDeepXL into a database search engine increased the number of identified cross-linked spectra by 18% on average.
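The PCC thresholds quoted above compare a predicted intensity vector with the experimental one for the same fragment ions. The following is a minimal sketch of that metric only; the function name `pearson_cc` and the toy intensity values are illustrative assumptions and are not part of pDeepXL.

```python
import math

def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined for a constant vector")
    return cov / math.sqrt(var_p * var_o)

# Made-up relative intensities for the same five fragment ions.
predicted = [0.10, 0.80, 0.35, 0.00, 1.00]
observed = [0.12, 0.75, 0.40, 0.05, 1.00]
print(round(pearson_cc(predicted, observed), 3))
```

A spectrum would count toward the ">0.9" fractions reported above when this value exceeds 0.9.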

Installation

Please install pDeepXL from PyPI. During installation, all required dependencies will be installed automatically.

pip install pDeepXL

Please also download the example datasets from here; they are used in the following tutorial. The downloaded zip file contains two example datasets: one for the non-cleavable cross-linkers DSS/Leiker (examples/non_cleavable) and one for the cleavable cross-linkers DSSO/DSBU (examples/cleavable). Each dataset has 2 folders: the data folder contains 1 file with 15 cross-linked peptide pairs, and the predict_results folder contains the predicted MS/MS spectra, the spectral library, and the corresponding images.

Script mode

For developers, pDeepXL can be easily integrated into a new Python project. Once it is installed, import pDeepXL with two lines:

import pDeepXL.predict
import pDeepXL.plot

Single prediction

pDeepXL.predict.predict_single

Use the function pDeepXL.predict.predict_single to predict a spectrum for a single cross-linked peptide pair.

predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)

The arguments contain information about the input cross-linked peptide pair:

  • prec_charge (int): the precursor charge of the cross-linked peptide pair. Only charges in [2+, 5+] are supported.
  • instrument (str): the mass spectrometer name. Only instruments in ['QEPlus','QE','QEHF','QEHFX','Fusion','Lumos'] are supported.
  • NCE_low, NCE_medium, NCE_high (floats): the low, medium, and high normalized collision energies (NCE). Only NCEs in [0.0, 100.0] are supported. If a single NCE was used, set it as NCE_medium and set NCE_low and NCE_high to zero. If stepped NCE was used, set the three NCEs accordingly.
  • crosslinker (str): the cross-linker name. Only cross-linkers in ['DSS','Leiker','DSSO','DSBU'] are supported.
  • seq1 (str): the first sequence.
  • mods1 (dict): the modifications on the first sequence, where each key is the position (zero-based numbering) of a modification and each value is the corresponding modification name. For example, {3: 'Carbamidomethyl[C]'} means that the 4th residue, a Cys, carries a Carbamidomethyl modification. Only modifications in ['Carbamidomethyl[C]','Oxidation[M]'] are supported.
  • linksite1 (int): the cross-linked site of the first sequence (also zero-based numbering).
  • seq2 (str): the second sequence.
  • mods2 (dict): the modifications on the second sequence, in the same format as mods1.
  • linksite2 (int): the cross-linked site of the second sequence (zero-based numbering, as for linksite1).
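The constraints listed above can be checked before calling the predictor. The sketch below is not part of pDeepXL: `check_inputs` and the three constant sets are hypothetical helpers that only restate the documented limits. It also assumes that the letter in square brackets of a modification name is the residue that must sit at the modified position.

```python
SUPPORTED_INSTRUMENTS = {'QEPlus', 'QE', 'QEHF', 'QEHFX', 'Fusion', 'Lumos'}
SUPPORTED_CROSSLINKERS = {'DSS', 'Leiker', 'DSSO', 'DSBU'}
SUPPORTED_MODS = {'Carbamidomethyl[C]', 'Oxidation[M]'}

def check_inputs(prec_charge, instrument, NCE_low, NCE_medium, NCE_high,
                 crosslinker, seq1, mods1, linksite1, seq2, mods2, linksite2):
    """Return a list of problems; an empty list means the inputs match the documented limits."""
    problems = []
    if not 2 <= prec_charge <= 5:
        problems.append(f"prec_charge {prec_charge} is outside [2, 5]")
    if instrument not in SUPPORTED_INSTRUMENTS:
        problems.append(f"unsupported instrument {instrument!r}")
    for name, nce in (('NCE_low', NCE_low), ('NCE_medium', NCE_medium), ('NCE_high', NCE_high)):
        if not 0.0 <= nce <= 100.0:
            problems.append(f"{name} {nce} is outside [0.0, 100.0]")
    if crosslinker not in SUPPORTED_CROSSLINKERS:
        problems.append(f"unsupported cross-linker {crosslinker!r}")
    for label, seq, mods, site in (('1', seq1, mods1, linksite1), ('2', seq2, mods2, linksite2)):
        # Link sites and modification positions are zero-based indices into the sequence.
        if not 0 <= site < len(seq):
            problems.append(f"linksite{label} {site} is outside seq{label}")
        for pos, mod in mods.items():
            if mod not in SUPPORTED_MODS:
                problems.append(f"unsupported modification {mod!r} in mods{label}")
            elif not 0 <= pos < len(seq):
                problems.append(f"mods{label} position {pos} is outside seq{label}")
            elif seq[pos] != mod[mod.index('[') + 1]:
                # Assumption: the bracketed letter names the target residue.
                problems.append(f"mods{label} position {pos} is {seq[pos]!r}, not the residue named by {mod!r}")
    return problems

# The non-cleavable demo pair used later in this README passes every check.
print(check_inputs(4, 'QE', 0.0, 27.0, 0.0, 'Leiker',
                   'EISCVDSAELGKASR', {3: 'Carbamidomethyl[C]'}, 11, 'KIIIGK', {}, 0))  # []
```

Returning a list rather than raising on the first failure lets a caller report every problem with a pair at once.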

The return value is a tuple of 3 elements. The last element holds the predicted intensity matrices, which can then be used to plot the predicted spectrum.

pDeepXL.plot.plot_single

Use the function pDeepXL.plot.plot_single to plot a single predicted spectrum.

pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,predictions[2],path_fig)

The arguments contain information about the input cross-linked peptide pair:

  • title (str): the title of the predicted spectrum.
  • prec_charge, crosslinker, seq1, mods1, linksite1, seq2, mods2, linksite2: the same as for pDeepXL.predict.predict_single.
  • predictions[2] (tuple): the last element of the value returned by pDeepXL.predict.predict_single; the tuple contains the predicted intensity matrices for the first and the second sequences.
  • path_fig (str): the path of the figure to be generated.

Demonstration

For example, run the following Python script to predict and plot the demo non-cleavable cross-linked peptide pair (please use your local path):

# input example of a non-cleavable cross-linked peptide pair
# ecoli_enri0228_E_bin5_7ul.11740.11740.4.0.dta
prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2=\
4,'QE',0.0,27.0,0.0,'Leiker','EISCVDSAELGKASR',{3: 'Carbamidomethyl[C]'},11,'KIIIGK',{},0
# please use your local path
path_non_clv_fig=r'/pFindStudio/pDeepXL/pDeepXL/examples/non_cleavable/predicted_non_clv_spectrum.png'
title='example of non-cleavable cross-linked spectrum'

non_clv_predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)
pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,non_clv_predictions[2],path_non_clv_fig)

Run the following Python script to predict and plot the demo cleavable cross-linked peptide pair (please use your local path):

# input example of a cleavable cross-linked peptide pair
# HEK293_FAIMS_60_70_80_Fr2.32448.32448.3.0.dta
prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2=\
3,'Lumos',21.0,27.0,33.0,'DSSO','VLLDVKLK',{},5,'EVASAKPK',{},5
# please use your local path
path_clv_fig=r'/pFindStudio/pDeepXL/pDeepXL/examples/cleavable/predicted_clv_spectrum.png'
title='example of cleavable cross-linked spectrum'

clv_predictions=pDeepXL.predict.predict_single(prec_charge,instrument,NCE_low,NCE_medium,NCE_high,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2)
pDeepXL.plot.plot_single(title,prec_charge,crosslinker,seq1,mods1,linksite1,seq2,mods2,linksite2,clv_predictions[2],path_clv_fig)
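The two demos above differ in how they fill the NCE arguments: the first uses a single NCE (27.0 as NCE_medium, zeros elsewhere), and the second uses stepped NCE (21.0, 27.0, 33.0). That convention can be captured in a small helper. `nce_arguments` is a hypothetical name and not a pDeepXL function; sorting the stepped energies in ascending order is an assumption about what "low, medium, high" means.

```python
def nce_arguments(*energies):
    """Map one NCE (single) or three NCEs (stepped) to (NCE_low, NCE_medium, NCE_high)."""
    if len(energies) == 1:
        # Single NCE: goes into NCE_medium, with the other two set to zero.
        low, medium, high = 0.0, float(energies[0]), 0.0
    elif len(energies) == 3:
        # Stepped NCE: assumed to be ordered low <= medium <= high.
        low, medium, high = (float(e) for e in sorted(energies))
    else:
        raise ValueError("pass one NCE (single) or three NCEs (stepped)")
    for nce in (low, medium, high):
        if not 0.0 <= nce <= 100.0:
            raise ValueError(f"NCE {nce} is outside [0.0, 100.0]")
    return low, medium, high

print(nce_arguments(27))          # (0.0, 27.0, 0.0)
print(nce_arguments(33, 21, 27))  # (21.0, 27.0, 33.0)
```

The returned tuple can be unpacked directly into the NCE_low, NCE_medium, and NCE_high positions of the calls shown above.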

Batch prediction

If you want to predict spectra for many cross-linked peptide pairs, batch prediction is the more efficient way to do it. Before batch prediction, prepare a data file containing all the cross-linked peptide pairs you want to predict. Each line of the data file describes one cross-linked peptide pair, and the columns are separated by tabs (\t). The column header is: title scan charge instrument NCE_low NCE_medium NCE_high crosslinker seq1 mods1 linksite1 seq2 mods2 linksite2. These parameters are described in the Single prediction section. Below is a demo table. You can find the example non-cleavable data file from here, and the example cleavable data file from here.

|title|scan|charge|instrument|NCE_low|NCE_medium|NCE_high|crosslinker|seq1|mods1|linksite1|seq2|mods2|linksite2|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|C_Lee_011216_ymitos_WT_Gly_BS3_XL_12_R2.57721.57721.4.0.dta|57721|4|Fusion|0.0|30.0|0.0|DSS|FKYAPGTIVLYAER|{}|1|INELTLLVQKR|{}|9|
|C_Lee_090916_ymitos_BS3_XL_B13_C1_13_Rep1.14188.14188.3.0.dta|14188|3|Lumos|0.0|30.0|0.0|DSS|KLEDAEGQENAASSE|{}|0|DINLLKNGK|{}|5|
|ecoli_enri0302_E_bin8_7ul_re.5306.5306.3.0.dta|5306|3|QE|0.0|27.0|0.0|Leiker|LKEIIHQQMGGLR|{8: 'Oxidation[M]'}|1|KPNACK|{4: 'Carbamidomethyl[C]'}|0|
|ecoli_enri0228_E_bin5_7ul.11740.11740.4.0.dta|11740|4|QE|0.0|27.0|0.0|Leiker|EISCVDSAELGKASR|{3: 'Carbamidomethyl[C]'}|11|KIIIGK|{}|0|
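A data file in this layout can be produced with the standard library. The sketch below makes several assumptions: `write_batch_file` and `read_batch_file` are hypothetical helpers rather than pDeepXL functions, the header row is written as the first line of the file, and the mods columns are written as Python dict literals exactly as they appear in the table. Check the example data files for the authoritative format.

```python
import ast
import csv
import os
import tempfile

COLUMNS = ['title', 'scan', 'charge', 'instrument', 'NCE_low', 'NCE_medium', 'NCE_high',
           'crosslinker', 'seq1', 'mods1', 'linksite1', 'seq2', 'mods2', 'linksite2']

def write_batch_file(path, rows):
    """Write one tab-separated line per cross-linked peptide pair, preceded by the header."""
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle, delimiter='\t', quoting=csv.QUOTE_NONE, lineterminator='\n')
        writer.writerow(COLUMNS)
        for row in rows:
            # Modification dicts are serialized as dict literals, e.g. {8: 'Oxidation[M]'}.
            writer.writerow([repr(row[c]) if c in ('mods1', 'mods2') else row[c] for c in COLUMNS])

def read_batch_file(path):
    """Read the file back, restoring ints, floats, and modification dicts."""
    with open(path, newline='') as handle:
        records = list(csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE))
    for rec in records:
        for c in ('scan', 'charge', 'linksite1', 'linksite2'):
            rec[c] = int(rec[c])
        for c in ('NCE_low', 'NCE_medium', 'NCE_high'):
            rec[c] = float(rec[c])
        for c in ('mods1', 'mods2'):
            rec[c] = ast.literal_eval(rec[c])  # safe parsing of the dict literal
    return records

# One row taken from the demo table above.
rows = [{'title': 'ecoli_enri0302_E_bin8_7ul_re.5306.5306.3.0.dta', 'scan': 5306, 'charge': 3,
         'instrument': 'QE', 'NCE_low': 0.0, 'NCE_medium': 27.0, 'NCE_high': 0.0,
         'crosslinker': 'Leiker', 'seq1': 'LKEIIHQQMGGLR', 'mods1': {8: 'Oxidation[M]'},
         'linksite1': 1, 'seq2': 'KPNACK', 'mods2': {4: 'Carbamidomethyl[C]'}, 'linksite2': 0}]

path = os.path.join(tempfile.mkdtemp(), 'batch_input.txt')
write_batch_file(path, rows)
print(read_batch_file(path) == rows)  # True
```

Reading the file back and comparing it with the input is a cheap way to confirm that no column was shifted by a stray tab.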

pDeepXL.predict.predict_batch

Use the function pDeepXL.predict.predict_batch for batch prediction.

predictions=pDeepXL.predict.predict_batch(path_data_file, is_non_cleavable)

The arguments contain information about the input data:

  • path_data_file (str): the path of the data file, formatted like the table above; please make sure the title
