PLMAlign
PLMAlign utilizes per-residue embeddings as input to obtain specific alignments and more refined similarity
- 2024.6.5 Update: We have uploaded the dataset of PLMSearch & PLMAlign to Zenodo.
This is the implementation of <b>PLMAlign</b>, a pairwise protein sequence alignment tool introduced in "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces specific alignments with their corresponding alignment scores.
Specifically, PLMAlign can perform both <b>local</b> and <b>global</b> alignment. The algorithm and parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) algorithms implemented by EMBL-EBI and pLM-BLAST. However, by replacing the fixed substitution matrix with similarities calculated as the dot product of per-residue embeddings, PLMAlign is able to capture deep evolutionary information and perform better on remote homology protein pairs.
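The idea of swapping the substitution matrix for embedding dot products can be illustrated with a minimal sketch. This is not the PLMAlign implementation: the function names `dot_product_similarity` and `local_align`, the linear gap penalty, and the toy two-dimensional "embeddings" are all assumptions made for illustration, and the real tool may normalize scores and handle gaps differently.

```python
import numpy as np


def dot_product_similarity(emb_a, emb_b):
    """Return a (len_a, len_b) matrix of residue-residue dot products."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return a @ b.T


def local_align(sim, gap=1.0):
    """Smith-Waterman over a similarity matrix (assumed linear gap penalty).

    Returns (best_score, pairs), where pairs lists the aligned
    (index_in_a, index_in_b) positions in order.
    """
    n, m = sim.shape
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(
                0.0,                                      # restart the local alignment
                score[i - 1, j - 1] + sim[i - 1, j - 1],  # align residue i with residue j
                score[i - 1, j] - gap,                    # gap in sequence b
                score[i, j - 1] - gap,                    # gap in sequence a
            )

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = (int(k) for k in np.unravel_index(np.argmax(score), score.shape))
    best = float(score[i, j])
    pairs = []
    while i > 0 and j > 0 and score[i, j] > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + sim[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


# Toy per-residue "embeddings" (hypothetical values, not real PLM output).
sim = dot_product_similarity([[1, 0], [0, 1], [1, 0]], [[0, 1], [1, 0]])
best, pairs = local_align(sim, gap=1.0)
```

A global (NW-style) variant would drop the zero floor, initialize the first row and column with cumulative gap penalties, and trace back from the bottom-right cell instead of the maximum.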
<div align=center><img src="example/figure/framework.png" width="100%" height="100%"/></div>
Quick links
Webserver
<span id="webserver"></span>
PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign :airplane:
PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀
PLMSearch source code : github.com/maovshao/PLMSearch :helicopter:
Requirements
<span id="requirements"></span>
Follow the steps in requirements.sh
Data preparation
<span id="data-preparation"></span>
We have released our experiment data, which can be downloaded from plmalign_data or Zenodo.
# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz
Reproduce all our experiments
<span id="main"></span>
Reproduce all our experiments, with visualizations, by following the steps in:
- Malidup: malidup.ipynb
- Malisam: malisam.ipynb
Notice: Detailed results are saved in data/alignment_benchmark/result/.
- SCOPe40: scope40.ipynb
Notice: Detailed results are saved in data/scope40_test/output/.
Run PLMAlign locally
<span id="pipeline"></span>
- Run PLMAlign locally by following the example in pipeline.ipynb
Notice: the inputs and outputs of the example are saved in example/.
Citation
<span id="citation"></span> Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5
Liu, W. et al. (2025). PLMSearch and PLMAlign: Protein Language Model (PLM)-Based Homologous Protein Sequence Search and Alignment. In: KC, D.B. (eds) Large Language Models (LLMs) in Protein Bioinformatics. Methods in Molecular Biology, vol 2941. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-4623-6_14
