Sapiens
Sapiens is a human antibody language model based on BERT.
Sapiens: Human antibody language model
 ____              _
/ ___|  __ _ _ __ (_) ___ _ __  ___
\___ \ / _` | '_ \| |/ _ \ '_ \/ __|
 ___| | |_| | |_| | |  __/ | | \__ \
|____/ \__,_|  __/|_|\___|_| |_|___/
            |_|
<p>
<img src="https://github.com/Merck/Sapiens/actions/workflows/python-package-conda.yml/badge.svg"
alt="Build & Test">
<a href="https://pypi.org/project/sapiens/">
<img src="https://img.shields.io/pypi/dm/sapiens"
alt="Pip Install"></a>
<a href="https://github.com/Merck/Sapiens/releases">
<img src="https://img.shields.io/pypi/v/sapiens"
alt="Latest release"></a>
<a href="https://huggingface.co/spaces/prihodad/biophi-sapiens1">
<img src="https://img.shields.io/badge/🤗%20Spaces-prihodad/biophi--sapiens1-blue"
alt="Hugging Face Spaces"></a>
</p>
Learn more about Sapiens, OASis and BioPhi in our publication:
David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil & Danny A. Bitton (2022) BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203
For more information about BioPhi, see the BioPhi repository.
For interpretable humanness evaluation based on peptide content, see the Promb repository.
Features
- Scoring antibody humanness and guiding design of human-like antibodies
- Suggesting humanizing mutations to antibodies (in frameworks as well as CDRs)
- Infilling missing residues in human antibody sequences
- Creating vector representations (embeddings) of residues or sequences
- Models are stored on HuggingFace (VH, VL, tokenizer)

Usage
Try out Sapiens in the HuggingFace Space or see the Jupyter Notebooks.
Install Sapiens using pip:
# Recommended: Create dedicated conda environment
conda create -n sapiens python=3.10
conda activate sapiens
# Install Sapiens
pip install sapiens
Antibody sequence infilling
Positions marked with * or X are infilled with the most likely human residues, given the rest of the sequence.
import sapiens
# Note that you can use masks (* or X) but you can also use "single-pass" prediction without any mask tokens
best = sapiens.predict_masked(
'**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
'H'
)
print(best)
# QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS
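To see which positions were filled in, the masked input can be compared with the returned sequence. The following is a minimal sketch, assuming the output has the same length as the input, as in the example above. `filled_positions` is a hypothetical helper written for illustration and is not part of the Sapiens API.

```python
def filled_positions(masked, filled, mask_chars="*X"):
    """Return (index, residue) for every masked position that was infilled.

    Hypothetical helper, not part of Sapiens. Indices are 0-based.
    """
    if len(masked) != len(filled):
        raise ValueError("masked and filled sequences must have equal length")
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(masked, filled))
        if old in mask_chars
    ]


# Shortened example using the first residues of the sequences above
print(filled_positions("**QLV*SGVEV", "QVQLVQSGVEV"))
# [(0, 'Q'), (1, 'V'), (5, 'Q')]
```

In practice, the two arguments would be the string passed to `sapiens.predict_masked` and the string it returned.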
Suggesting mutations
Return residue scores for a given sequence:
import sapiens
# Note that you can use masks (* or X) but you can also use "single-pass" prediction without any mask tokens
scores = sapiens.predict_scores(
'**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',
'H'
)
scores.head()
#           A         C         D         E  ...
# 0  0.003272  0.004147  0.004011  0.004590  ...  <- based on masked input
# 1  0.012038  0.003854  0.006803  0.008174  ...  <- based on masked input
# 2  0.003384  0.003895  0.003726  0.004068  ...  <- based on Q input
# 3  0.004612  0.005325  0.004443  0.004641  ...  <- based on L input
# 4  0.005519  0.003664  0.003555  0.005269  ...  <- based on V input
#
# Scores are given both for residues that are masked and that are present.
# When inputting a non-human antibody sequence, the output scores can be used for humanization.
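One way to turn the per-position scores into candidate mutations is to pick the highest-scoring residue at each position and report the positions where it differs from the input. The sketch below is an illustration under stated assumptions, not the humanization procedure used by Sapiens or BioPhi. `suggest_mutations` and `min_gain` are hypothetical names, and the function works on plain dictionaries (one per position, mapping residue to score). If `predict_scores` returns a pandas DataFrame, as the `scores.head()` call above suggests, `scores.to_dict("records")` would produce that structure.

```python
def suggest_mutations(sequence, score_rows, min_gain=0.0):
    """List positions where the top-scoring residue differs from the input.

    Hypothetical helper, not part of Sapiens.
    sequence:   input sequence, one character per position
    score_rows: one dict per position mapping residue -> score
    min_gain:   minimum score improvement required to report a mutation
    Returns a list of (0-based position, current residue, suggested residue).
    """
    suggestions = []
    for pos, (current, row) in enumerate(zip(sequence, score_rows)):
        best = max(row, key=row.get)
        # Masked positions (* or X) have no score of their own, so use 0.0
        gain = row[best] - row.get(current, 0.0)
        if best != current and gain > min_gain:
            suggestions.append((pos, current, best))
    return suggestions


# Toy scores for a 3-residue sequence over a reduced alphabet (made-up numbers)
toy_rows = [
    {"A": 0.1, "Q": 0.8, "V": 0.1},
    {"A": 0.2, "Q": 0.1, "V": 0.7},
    {"A": 0.6, "Q": 0.3, "V": 0.1},
]
print(suggest_mutations("QVQ", toy_rows))
# [(2, 'Q', 'A')]
```

Raising `min_gain` keeps only the mutations with a larger score improvement, which can help limit the number of changes introduced into a sequence.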
Antibody sequence embedding
Get a vector representation of each position in a sequence:
import sapiens
residue_embed = sapiens.predict_residue_embedding(
'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS',
'H',
layer=None
)
residue_embed.shape
# (layer, position in sequence, features)
# (5, 119, 128)
Get a single vector for each sequence:
seq_embed = sapiens.predict_sequence_embedding(
'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS',
'H',
layer=None
)
seq_embed.shape
# (layer, features)
# (5, 128)
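Sequence embeddings are often compared with cosine similarity. The following is a minimal sketch using only the standard library. `cosine_similarity` is a hypothetical helper, not part of Sapiens, and choosing the last layer (`seq_embed[-1]`) is an assumption based on the `(layer, features)` shape shown above.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors; with Sapiens one might pass seq_embed_a[-1] and seq_embed_b[-1],
# i.e. the last-layer embeddings of two sequences (assumed usage).
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
# 1.0
```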
Notebooks
Try out Sapiens in your browser using these example notebooks:
<table>
<tr><th>Links</th><th>Notebook</th><th>Description</th></tr>
<tr>
<td><a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F01_sapiens_antibody_infilling.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a></td>
<td><a href="notebooks/01_sapiens_antibody_infilling.ipynb">01_sapiens_antibody_infilling</a></td>
<td>Predict missing positions in an antibody sequence</td>
</tr>
<tr>
<td><a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F02_sapiens_antibody_embedding.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a></td>
<td><a href="notebooks/02_sapiens_antibody_embedding.ipynb">02_sapiens_antibody_embedding</a></td>
<td>Get vector representations and visualize them using t-SNE</td>
</tr>
<tr>
<td><a href="https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F03_sapiens_antibody_vh_mlm_finetuning.ipynb"><img src="https://mybinder.org/badge_logo.svg" /></a></td>
<td><a href="notebooks/03_sapiens_antibody_vh_mlm_finetuning.ipynb">03_sapiens_antibody_vh_mlm_finetuning.ipynb</a></td>
<td>Finetune on a custom pool of sequences and suggest mutations</td>
</tr>
</table>
Acknowledgements
Sapiens is based on antibody repertoires from the Observed Antibody Space:
Kovaltsuk, A., Leem, J., Kelm, S., Snowden, J., Deane, C. M., & Krawczyk, K. (2018). Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology, 201(8), 2502–2509. https://doi.org/10.4049/jimmunol.1800708
