CLAPE
Contrastive learning and pre-trained encoder for protein-ligand binding site prediction
CLAPE Framework
Contact: For any questions regarding the code or data, please contact Yufan Liu at andyalbert97@gmail.com.
This repository contains the code for the CLAPE (Contrastive Learning And Pre-trained Encoder) framework for protein–ligand binding site prediction. We provide three ligand-binding tasks: protein–DNA, protein–RNA, and antibody–antigen binding site prediction. For small-molecule binding site weights and the training scripts for the CLAPE series of models, please refer to CLAPE-SMB.
Usage
Note: For comprehensive usage instructions, please refer to our CLAPE protocol in Methods in Molecular Biology.
CLAPE primarily depends on the large-scale pre-trained protein language model ProtBert, implemented using HuggingFace Transformers and PyTorch. Please install the dependencies in advance, or create a conda/mamba environment using the provided environment file. If you are using CLAPE-SMB, please also install ESM.
wget https://github.com/YAndrewL/CLAPE/raw/main/environment.yaml
conda env create -f environment.yaml
conda activate clape
1. Python Package from PyPI
We provide a Python package for predicting ligand-binding sites from protein sequences in FASTA format. Below is an example for DNA-binding site prediction:
# Download model weights and example file
wget https://github.com/YAndrewL/CLAPE/raw/main/example.fa
wget https://github.com/YAndrewL/CLAPE/raw/main/weights/DNA.pth
pip install clape # Install CLAPE from PyPI
# Example usage (Python)
from clape import Clape
model = Clape(model_path="model_path", ligand="DNA")
results = model.predict(input_file="example.fa")
You can set keep_score=True to retain the predicted scores from the model, and use switch_ligand to change the binding site prediction task.
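If scores are retained, a custom cutoff can be applied afterwards. The following is a minimal sketch and not part of the clape package: `scores_to_labels` is a hypothetical helper. It assumes the scores are available as a plain sequence of per-residue values in the range 0–1, and that a residue counts as binding when its score is at or above the threshold. The actual return format of `predict` may differ.

```python
from typing import Sequence


def scores_to_labels(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Convert per-residue scores into a 0/1 label string (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a score at or above the threshold marks a binding residue.
    return "".join("1" if s >= threshold else "0" for s in scores)


# Made-up scores for illustration only, not real model output.
example_scores = [0.10, 0.72, 0.50, 0.31]
print(scores_to_labels(example_scores))       # default threshold of 0.5
print(scores_to_labels(example_scores, 0.6))  # stricter cutoff
```

Raising the threshold trades recall for precision: fewer residues are labelled as binding, but those that remain have higher scores.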
2. Command Line Tools
A command line tool is also provided and installed with the Python package:
clape --input example.fa --output out.txt --ligand DNA --model /path/to/downloaded/model
This command loads the pre-trained models. You can specify the download directory using the --cache parameter.
Parameter descriptions:
| Parameter   | Description                                                                                     |
|-------------|-------------------------------------------------------------------------------------------------|
| --help      | Show the help documentation.                                                                    |
| --ligand    | Specify the ligand for prediction: DNA, RNA, or AB (antibody).                                  |
| --threshold | Threshold for identifying binding sites (0–1, default: 0.5).                                    |
| --input     | Path to the input file in FASTA format.                                                         |
| --output    | Path to the output file. The first two lines match the input; the third line is the prediction. |
| --cache     | Path for saving pre-trained parameters (default: protbert).                                     |
| --model     | Path to trained backbone models.                                                                |
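The output layout described for --output (header line, sequence line, prediction line) can be read back with a few lines of Python. This is a sketch under assumptions not stated above: each record occupies exactly three consecutive lines, and the prediction line is a string of 0/1 characters with one character per residue. `read_predictions` and `binding_positions` are hypothetical helper names, not part of clape.

```python
from typing import Iterator, List, Tuple


def read_predictions(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, prediction) triples from output lines."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected three lines per record")
    for i in range(0, len(rows), 3):
        header, seq, pred = rows[i:i + 3]
        # Assumption: one prediction character per residue.
        if len(seq) != len(pred):
            raise ValueError(f"length mismatch in record {header!r}")
        yield header.lstrip(">"), seq, pred


def binding_positions(pred: str) -> List[int]:
    """Return 1-based positions of residues labelled '1'."""
    return [i for i, c in enumerate(pred, start=1) if c == "1"]


# Made-up record for illustration; real files come from `clape --output`.
demo = [">protein_1", "MKRLA", "01100"]
for name, seq, pred in read_predictions(demo):
    print(name, binding_positions(pred))
```

For a real file, the lines can be obtained with `open("out.txt").read().splitlines()` and passed to `read_predictions`.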
Citation
If you find our work helpful, please cite as follows:
CLAPE:
@article{10.1093/bib/bbad488,
author = {Liu, Yufan and Tian, Boxue},
title = "{Protein–DNA binding sites prediction based on pre-trained protein language model and contrastive learning}",
journal = {Briefings in Bioinformatics},
volume = {25},
number = {1},
pages = {bbad488},
year = {2024},
month = {01},
issn = {1477-4054},
doi = {10.1093/bib/bbad488},
url = {https://doi.org/10.1093/bib/bbad488},
eprint = {https://academic.oup.com/bib/article-pdf/25/1/bbad488/55381199/bbad488.pdf},
}
CLAPE-SMB:
@article{10.1186/s13321-024-00920-2,
author={Wang, Jue and Liu, Yufan and Tian, Boxue},
title={Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning},
journal={Journal of Cheminformatics},
year={2024},
month={Nov},
day={06},
volume={16},
number={1},
pages={125},
issn={1758-2946},
doi={10.1186/s13321-024-00920-2},
url={https://doi.org/10.1186/s13321-024-00920-2}
}
CLAPE protocol:
@inbook{Liu2025,
author = {Yufan Liu and Boxue Tian},
editor = {Dukka B. KC},
title = {CLAPE: Protein--Ligand Binding Site Prediction via Protein Language Models},
booktitle = {Large Language Models (LLMs) in Protein Bioinformatics},
year = {2025},
publisher = {Springer US},
address = {New York, NY},
pages = {293--311},
isbn = {978-1-0716-4623-6},
doi = {10.1007/978-1-0716-4623-6_18},
url = {https://doi.org/10.1007/978-1-0716-4623-6_18}
}
Updates
- Jul. 2025: CLAPE protocol published in Methods in Molecular Biology. See the online version.
- Nov. 2024: CLAPE-SMB published in Journal of Cheminformatics. See the online version.
- Aug. 2024: CLAPE is now available as a Python package. See clape on PyPI.
- Mar. 2024: Training code released with CLAPE-SMB. See this repo for reference.
- Jan. 2024: Our paper published in Briefings in Bioinformatics. See the online version.
