
PLMAlign

PLMAlign utilizes per-residue embeddings as input to obtain specific alignments and more refined similarity


  • 2024.6.5 Update: We have uploaded the PLMSearch & PLMAlign dataset to Zenodo.

This is the implementation of <b>PLMAlign</b>, the pairwise protein sequence alignment tool described in "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces specific alignments with corresponding alignment scores.

Specifically, PLMAlign supports both <b>local</b> and <b>global</b> alignment. The algorithm and parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) algorithms as implemented by EMBL-EBI and pLM-BLAST. However, by replacing the fixed substitution matrix with a similarity computed as the dot product of per-residue embeddings, PLMAlign captures deep evolutionary information and performs better on remote homology protein pairs.

<div align=center><img src="example/figure/framework.png" width="100%" height="100%"/></div>
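
To make the substitution step concrete, here is a minimal sketch of the idea, not the repository's actual implementation. It scores each residue pair by the dot product of its embeddings and feeds that matrix into textbook SW (local) and NW (global) recurrences. The function names `embedding_similarity` and `alignment_score`, the linear gap penalty, and its default value of `1.0` are all assumptions chosen for illustration. PLMAlign's real gap model and parameters may differ.

```python
import numpy as np


def embedding_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Score every residue pair by the dot product of its embeddings."""
    # emb_a: (len_a, dim), emb_b: (len_b, dim) -> result: (len_a, len_b)
    return emb_a @ emb_b.T


def alignment_score(sim: np.ndarray, gap: float = 1.0, mode: str = "local") -> float:
    """Hypothetical helper: SW (local) or NW (global) score with a linear gap penalty."""
    if mode not in ("local", "global"):
        raise ValueError("mode must be 'local' or 'global'")
    n, m = sim.shape
    dp = np.zeros((n + 1, m + 1))
    if mode == "global":
        # Leading gaps are penalised in global alignment.
        dp[:, 0] = -gap * np.arange(n + 1)
        dp[0, :] = -gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(
                dp[i - 1, j - 1] + sim[i - 1, j - 1],  # align residue i with residue j
                dp[i - 1, j] - gap,                     # gap in sequence B
                dp[i, j - 1] - gap,                     # gap in sequence A
            )
            # Local alignment may restart anywhere, so scores never drop below zero.
            dp[i, j] = max(best, 0.0) if mode == "local" else best
    # Local: best cell anywhere. Global: the bottom-right cell.
    return float(dp.max() if mode == "local" else dp[n, m])


if __name__ == "__main__":
    # Toy one-hot "embeddings" (an assumption; real inputs come from a protein language model).
    emb_b = np.eye(3)
    emb_a = emb_b[1:]
    sim = embedding_similarity(emb_a, emb_b)
    print(alignment_score(sim, mode="local"), alignment_score(sim, mode="global"))
```

The only change from a classical aligner is where `sim` comes from: instead of looking up a fixed matrix such as BLOSUM62 by residue identity, the score depends on the embedding of each residue in its sequence context.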

Quick links

Webserver

<span id="webserver"></span>

PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign :airplane:

PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀

PLMSearch source code : github.com/maovshao/PLMSearch :helicopter:

Requirements

<span id="requirements"></span>

Follow the steps in requirements.sh

Data preparation

<span id="data-preparation"></span>

We have released our experiment data, which can be downloaded from plmalign_data or Zenodo.

# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz

Reproduce all our experiments

<span id="main"></span>

Reproduce all our experiments, with visualizations, by following the steps in:

Notice: Detailed results are saved in data/alignment_benchmark/result/.

Notice: Detailed results are saved in data/scope40_test/output/.

Run PLMAlign locally

<span id="pipeline"></span>

Notice: the inputs and outputs of the example are saved in example/.

Citation

<span id="citation"></span> Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5

Liu, W. et al. (2025). PLMSearch and PLMAlign: Protein Language Model (PLM)-Based Homologous Protein Sequence Search and Alignment. In: KC, D.B. (eds) Large Language Models (LLMs) in Protein Bioinformatics. Methods in Molecular Biology, vol 2941. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-4623-6_14
