CatELMo
catELMo: Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions
Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions
catELMo is a bi-directional amino acid embedding model that learns contextualized amino acid representations, treating an amino acid as a word and a sequence as a sentence. It learns patterns in amino acid sequences through a self-supervision signal: predicting the next amino acid token given its previous tokens. It has been trained on 4,173,895 TCR $\beta$ CDR3 sequences (52 million amino acid tokens) from ImmunoSEQ. catELMo yields a real-valued representation vector for a sequence of amino acids, which can be used as input features for various downstream tasks. This is the official implementation of catELMo. <br/> <br/>
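To make the "amino acid as a word, sequence as a sentence" idea concrete, here is a minimal sketch of how next-token prediction targets could be derived from a single CDR3 sequence. It is not the catELMo training code: the vocabulary ordering, the helper names `tokenize` and `next_token_pairs`, and the example sequence are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from the catELMo code base.
# Assumed vocabulary: the 20 standard amino acids in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence):
    """Map each amino acid (a 'word') to an integer id."""
    return [TOKEN_TO_ID[aa] for aa in sequence]


def next_token_pairs(sequence):
    """Return (context, target) pairs: each token is predicted
    from the tokens that precede it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


if __name__ == "__main__":
    cdr3 = "CASSLGQAYEQYF"  # hypothetical example sequence
    print(tokenize(cdr3))
    for context, target in next_token_pairs(cdr3):
        print(f"{context} -> {target}")
```

Applying `next_token_pairs` to the reversed string gives the corresponding pairs for the opposite reading direction, which is one way to picture the second half of a bi-directional language model.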
<p align="center"><img width=100% alt="Overview" src="https://github.com/Lee-CBG/catELMo/blob/main/figures/Fig4_Methods.png"></p>
Publication
<b>Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions </b> <br/> Pengfei Zhang<sup>1,2</sup>, Michael Cai<sup>1,2</sup>, Seojin Bang<sup>2</sup>, Heewook Lee<sup>1,2</sup><br/> <sup>1 </sup>School of Computing and Augmented Intelligence, Arizona State University, <sup>2 </sup>Biodesign Institute, Arizona State University <br/> Published in: eLife, 2023.
Paper | Code | Poster | Slides | Presentation (YouTube)
Dependencies
- Linux
- Python 3.6.13
- Keras 2.6.0
- TensorFlow 2.6.0
Steps to train a Binding Affinity Prediction model for TCR-epitope pairs.
1. Clone the repository
git clone https://github.com/Lee-CBG/catELMo
cd catELMo/
conda create --name bap python=3.6.13
source activate bap
pip install pandas==1.1.5 tensorflow==2.6.0 keras==2.6.0 scikit-learn==0.24.2 tqdm
2. Prepare TCR-epitope pairs for training and testing
- Download training and testing data from the datasets folder.
- Obtain embeddings for TCRs and epitopes following the instructions in the embedders folder.
3. Train and test models
An example for epitope split
python -W ignore bap.py \
--embedding catELMo_4_layers_1024 \
--split epitope \
--gpu 0 \
--fraction 1 \
--seed 42
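The splitting logic inside bap.py is not shown in this README. Assuming that `--split epitope` means the test set contains only epitopes that never appear in the training set, the idea can be sketched as follows. The function name `epitope_split`, the `test_fraction` parameter, and the toy records are hypothetical and exist only for this illustration.

```python
import random


def epitope_split(pairs, test_fraction=0.2, seed=42):
    """Split (tcr, epitope, label) records so that no epitope
    is shared between the training and testing sets.

    Assumption: this mirrors the intent of an 'epitope split';
    it is not the implementation used by bap.py.
    """
    epitopes = sorted({epitope for _, epitope, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(epitopes)
    # Hold out at least one epitope for testing.
    n_test = max(1, round(len(epitopes) * test_fraction))
    test_epitopes = set(epitopes[:n_test])
    train = [p for p in pairs if p[1] not in test_epitopes]
    test = [p for p in pairs if p[1] in test_epitopes]
    return train, test


# Toy records for illustration only; the labels are made up.
toy_pairs = [
    ("CASSIRSSYEQYF", "GILGFVFTL", 1),
    ("CASSLGQAYEQYF", "GILGFVFTL", 0),
    ("CASSPGTGNTEAFF", "NLVPMVATV", 1),
    ("CASSQDRGNEQFF", "GLCTLVAML", 1),
    ("CASSLAGGYEQYF", "ELAGIGILTV", 0),
    ("CASSFGTDTQYF", "YLQPRTFLL", 1),
]

train, test = epitope_split(toy_pairs)
print(len(train), "training pairs;", len(test), "testing pairs")
```

Splitting by epitope rather than by individual pair keeps every test epitope unseen during training, so the evaluation reflects how a model handles novel epitopes.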
Citation
If you use this code or catELMo in your research, please cite our paper:
@article {catelmobiorxiv,
author = {Pengfei Zhang and Seojin Bang and Michael Cai and Heewook Lee},
title = {Context-Aware Amino Acid Embedding Advances Analysis of TCR-Epitope Interactions},
elocation-id = {2023.04.12.536635},
year = {2023},
doi = {10.1101/2023.04.12.536635},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv}
}
License
<a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>.
