PTMGPT2

GPT-based protein language model for PTM site prediction

<h1>PTMGPT2</h1> <a href="https://zenodo.org/doi/10.5281/zenodo.12655680"><img src="https://zenodo.org/badge/758635995.svg" alt="DOI"></a>
<p>Here, we introduce PTMGPT2, a suite of models that generate tokens indicating whether a protein sequence is modified, which is crucial for identifying PTM sites. At the core of this platform is ProtGPT2, an autoregressive transformer model. We adopted ProtGPT2 as a pre-trained model and further fine-tuned it for the specific task of generating classification labels for a given PTM type. PTMGPT2 uses a decoder-only architecture, which removes the need for a task-specific classification head during training. Instead, the final layer of the decoder acts as a projection back to the vocabulary space and generates the most likely next token from the patterns it has learned among the tokens in the input prompt.</p>
<h3>PTMGPT2 model and workflow</h3>
<img src='PTMGPT2-workflow-model.png' alt='PTMGPT2 model and workflow'>
<h3>Download sample model for inference</h3>
<p>Link: <a href="https://nsclbio.jbnu.ac.kr/GPT_model/">https://nsclbio.jbnu.ac.kr/GPT_model/</a></p>
<p>Contact us directly at <b>palisthashrestha7@jbnu.ac.kr</b> for bulk predictions and trained models.</p>
<h3>PTMGPT2 Webserver</h3>
<p>Link: <a href="https://nsclbio.jbnu.ac.kr/tools/ptmgpt2/">https://nsclbio.jbnu.ac.kr/tools/ptmgpt2/</a></p>
<h3>PTMGPT2 Models</h3>
<p>Link: <a href="https://doi.org/10.5281/zenodo.11371883">https://doi.org/10.5281/zenodo.11371883</a></p>
<p>Link: <a href="https://zenodo.org/records/11362322">https://zenodo.org/records/11362322</a></p>
<h3>PTMGPT2 Datasets</h3>
<p>Link: <a href="https://doi.org/10.5281/zenodo.11377398">https://doi.org/10.5281/zenodo.11377398</a></p>
<h3>Requirements</h3>
<p>Python 3.11.3<br>transformers 4.29.2<br>scikit-learn 1.2.2<br>PyTorch 2.0.1<br>pytorch-cuda 11.7</p>
<h3>Basic Usage</h3>
<p>• Model: This folder contains a sample model that predicts PTM sites from input protein sequences, illustrating how PTMGPT2 is applied.<br>
• Tokenizer: This folder contains a sample tokenizer that tokenizes protein sequences, including handcrafted tokens for specific amino acids or motifs.<br>
• Inference.ipynb: This notebook provides executable code for applying the PTMGPT2 model and tokenizer to predict PTM sites, and serves as a practical guide for running the model on your own datasets.</p>
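The README describes classification without a dedicated head: the decoder's final layer projects back to the vocabulary, and the predicted next token is the label. The toy sketch below illustrates that mechanism in plain Python. It assumes GPT-2-style weight tying (the output projection reuses the token-embedding matrix) and uses hypothetical label tokens `POSITIVE` and `NEGATIVE` with made-up 3-dimensional embeddings; the real PTMGPT2 vocabulary, label tokens, and dimensions may differ.

```python
import math

# Toy vocabulary: three amino-acid tokens plus two HYPOTHETICAL label tokens.
vocab = ["A", "K", "S", "POSITIVE", "NEGATIVE"]
# Made-up 3-dimensional token embeddings, one row per vocabulary entry.
embedding = [
    [0.1, 0.0, 0.2],
    [0.0, 0.3, 0.1],
    [0.2, 0.1, 0.0],
    [1.0, 0.5, 0.0],
    [-1.0, 0.0, 0.5],
]
# Made-up final hidden state at the last prompt position.
hidden = [0.8, 0.4, -0.2]


def lm_head_logits(hidden, embedding):
    """Project a hidden state back to vocabulary space.

    With GPT-2-style weight tying, the output projection reuses the
    token-embedding matrix, so logit[v] is the dot product of the hidden
    state with the embedding of token v.
    """
    return [sum(h * e for h, e in zip(hidden, row)) for row in embedding]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def predict_label(hidden, embedding, vocab, label_tokens):
    """Read the prediction off the next-token distribution.

    No classification head is involved: the label scores are simply the
    next-token probabilities of the label tokens.
    """
    probs = softmax(lm_head_logits(hidden, embedding))
    index = {token: i for i, token in enumerate(vocab)}
    scores = {token: probs[index[token]] for token in label_tokens}
    return max(scores, key=scores.get), scores


label, scores = predict_label(hidden, embedding, vocab, ["POSITIVE", "NEGATIVE"])
print(label)  # POSITIVE
```

Comparing only the label tokens, rather than taking the argmax over the whole vocabulary, is one simple way to guarantee that the output is a valid label; in this toy example both choices agree.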
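The Basic Usage notes point to a Model folder, a Tokenizer folder, and Inference.ipynb. The sketch below shows one way such local folders could be loaded with the Hugging Face transformers API (`AutoTokenizer`, `AutoModelForCausalLM`, `generate`) to label candidate residues. Everything beyond the folder names is an assumption: the prompt template, the default candidate residue (`K`), the 10-residue flank on each side, the `-` padding character, and `MAX_LABEL_TOKENS` are hypothetical placeholders, so take the actual prompt format and window length from Inference.ipynb.

```python
from collections.abc import Iterator

# HYPOTHETICAL prompt template: replace it with the format used in Inference.ipynb.
PROMPT_TEMPLATE = "<startoftext>{window}"
# HYPOTHETICAL upper bound on the number of tokens in a generated label.
MAX_LABEL_TOKENS = 5


def candidate_windows(sequence: str, residues: str = "K", flank: int = 10,
                      pad: str = "-") -> Iterator[tuple[int, str]]:
    """Yield (1-based position, window) for every candidate residue.

    Each window has length 2 * flank + 1 with the candidate at its center;
    sequence ends are padded with `pad` so every window has the same length.
    """
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa in residues:
            # sequence[i] sits at padded[i + flank], so this slice is centered on it.
            yield i + 1, padded[i : i + 2 * flank + 1]


def build_prompt(window: str) -> str:
    """Wrap a sequence window in the (hypothetical) prompt template."""
    return PROMPT_TEMPLATE.format(window=window)


def predict_sites(sequence: str, model_dir: str = "Model",
                  tokenizer_dir: str = "Tokenizer", residues: str = "K",
                  flank: int = 10) -> list[tuple[int, str, str]]:
    """Generate a label for each candidate residue in `sequence`."""
    # Imported lazily so the helpers above run without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir).eval()

    results = []
    for position, window in candidate_windows(sequence, residues, flank):
        inputs = tokenizer(build_prompt(window), return_tensors="pt")
        with torch.no_grad():
            output = model.generate(
                **inputs,
                max_new_tokens=MAX_LABEL_TOKENS,
                do_sample=False,  # greedy decoding keeps predictions deterministic
                pad_token_id=tokenizer.eos_token_id,
            )
        # Keep only the newly generated tokens, i.e. the predicted label.
        new_tokens = output[0, inputs["input_ids"].shape[1]:]
        label = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        results.append((position, window, label))
    return results
```

With the sample model and tokenizer unpacked into ./Model and ./Tokenizer, a call such as `predict_sites("MSKGEELFTGVVPILVELDGDVNGHKFSVSG")` would return one (position, window, label) tuple per lysine. Keeping the window logic separate from the model call makes it easy to swap in the residue set, window length, and prompt format that the real notebook uses.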