Evolutionary Scale Modeling

Update April 2023: Code for the two simultaneous preprints on protein design is now released! Code for "Language models generalize beyond natural proteins" is under examples/lm-design/. Code for "A high-level programming language for generative protein design" is under examples/protein-programming-language/.

This repository contains code and pre-trained weights for Transformer protein language models from the Meta Fundamental AI Research Protein Team (FAIR), including our state-of-the-art ESM-2 and ESMFold, as well as MSA Transformer, ESM-1v for predicting variant effects and ESM-IF1 for inverse folding. Transformer protein language models were introduced in the 2019 preprint of the paper "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences". ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks. ESMFold harnesses the ESM-2 language model to generate accurate structure predictions end to end directly from the sequence of a protein.

In November 2022, we released v0 of the ESM Metagenomic Atlas, an open atlas of 617 million predicted metagenomic protein structures. The Atlas was updated in March 2023 in collaboration with EBI. The new v2023_02 adds another 150 million predicted structures to the Atlas, as well as pre-computed ESM2 embeddings. Bulk download, the blog post and the resources provided on the Atlas website are documented in this README.

In December 2022, we released two simultaneous preprints on protein design.

  • "Language models generalize beyond natural proteins" (PAPER, CODE) uses ESM2 to design de novo proteins. The code and data associated with the preprint can be found here.
  • "A high-level programming language for generative protein design" (PAPER, CODE) uses ESMFold to design proteins according to a high-level programming language.
<details><summary><b>Citation</b></summary>

For ESM2, ESMFold and ESM Atlas:

```bibtex
@article{lin2023evolutionary,
  title = {Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author = {Zeming Lin and Halil Akin and Roshan Rao and Brian Hie and Zhongkai Zhu and Wenting Lu and Nikita Smetanin and Robert Verkuil and Ori Kabeli and Yaniv Shmueli and Allan dos Santos Costa and Maryam Fazel-Zarandi and Tom Sercu and Salvatore Candido and Alexander Rives},
  journal = {Science},
  volume = {379},
  number = {6637},
  pages = {1123-1130},
  year = {2023},
  doi = {10.1126/science.ade2574},
  URL = {https://www.science.org/doi/abs/10.1126/science.ade2574},
  note = {Earlier versions as preprint: bioRxiv 2022.07.20.500902},
}
```

For transformer protein language models:

@article{rives2021biological,
  title={Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences},
  author={Rives, Alexander and Meier, Joshua and Sercu, Tom and Goyal, Siddharth and Lin, Zeming and Liu, Jason and Guo, Demi and Ott, Myle and Zitnick, C Lawrence and Ma, Jerry and others},
  journal={Proceedings of the National Academy of Sciences},
  volume={118},
  number={15},
  pages={e2016239118},
  year={2021},
  publisher={National Acad Sciences},
  note={bioRxiv 10.1101/622803},
  doi={10.1073/pnas.2016239118},
  url={https://www.pnas.org/doi/full/10.1073/pnas.2016239118},
}
</details>

Main models you should use <a name="main-models"></a>

| Shorthand | `esm.pretrained.` | Dataset | Description |
|-----------|-------------------|---------|-------------|
| ESM-2 | `esm2_t36_3B_UR50D()` `esm2_t48_15B_UR50D()` | UR50 (sample UR90) | SOTA general-purpose protein language model. Can be used to predict structure, function and other protein properties directly from individual sequences. Released with Lin et al. 2022 (Aug 2022 update). |
| ESMFold | `esmfold_v1()` | PDB + UR50 | End-to-end single sequence 3D structure predictor (Nov 2022 update). |
| ESM-MSA-1b | `esm_msa1b_t12_100M_UR50S()` | UR50 + MSA | MSA Transformer language model. Can be used to extract embeddings from an MSA. Enables SOTA inference of structure. Released with Rao et al. 2021 (ICML'21 version, June 2021). |
| ESM-1v | `esm1v_t33_650M_UR90S_1()` ... `esm1v_t33_650M_UR90S_5()` | UR90 | Language model specialized for prediction of variant effects. Enables SOTA zero-shot prediction of the functional effects of sequence variations. Same architecture as ESM-1b, but trained on UniRef90. Released with Meier et al. 2021. |
| ESM-IF1 | `esm_if1_gvp4_t16_142M_UR50()` | CATH + UR50 | Inverse folding model. Can be used to design sequences for given structures, or to predict functional effects of sequence variation for given structures. Enables SOTA fixed backbone sequence design. Released with Hsu et al. 2022. |

For a complete list of available models, with details and release notes, see Pre-trained Models.
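
As a rough illustration of how the `esm.pretrained` column is used, the sketch below looks up a loader function by name and extracts per-residue embeddings from the final layer. It assumes the `fair-esm` package and PyTorch are installed. The helper names `layers_from_name` and `embed_sequence` are hypothetical, and reading the layer count from the `t36`/`t48` part of the function name is a convention assumed here for the ESM-2 names in the table, not an official API.

```python
import re


def layers_from_name(model_name):
    """Read the layer count from the `_t<N>_` part of an ESM-2 loader name (assumed convention)."""
    match = re.search(r"_t(\d+)_", model_name)
    if match is None:
        raise ValueError(f"no layer count found in {model_name!r}")
    return int(match.group(1))


def embed_sequence(sequence, model_name="esm2_t36_3B_UR50D"):
    """Return a (len(sequence), embed_dim) tensor of final-layer embeddings."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    import esm

    # Each table entry is a function on esm.pretrained that returns
    # (model, alphabet); calling it fetches the weights on first use.
    model, alphabet = getattr(esm.pretrained, model_name)()
    model.eval()  # disable dropout for deterministic results

    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter([("query", sequence)])

    final_layer = layers_from_name(model_name)
    with torch.no_grad():
        out = model(tokens, repr_layers=[final_layer])

    # Position 0 is the beginning-of-sequence token, so residues start at index 1.
    return out["representations"][final_layer][0, 1 : len(sequence) + 1]
```

The default checkpoint has 3B parameters, so a smaller ESM-2 function from the full model list may be more practical for a first test.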

Usage <a name="usage"></a>

Quick start <a name="quickstart"></a>

An easy way to get started is to load ESM or ESMFold through the HuggingFace transformers library, which has simplified the ESMFold dependencies and provides a standardized API and tools to work with state-of-the-art pretrained models.
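
A minimal sketch of the HuggingFace route, assuming `transformers` and PyTorch are installed. The function name `embed_with_transformers` is hypothetical, and the default checkpoint id `facebook/esm2_t6_8M_UR50D` is an assumption chosen here because it is a small ESM-2 checkpoint on the Hub; it is not one of the models listed in the table above.

```python
def embed_with_transformers(sequence, checkpoint="facebook/esm2_t6_8M_UR50D"):
    """Return per-residue embeddings of shape (len(sequence), hidden_size)."""
    # Imported lazily so that defining the function does not require the libraries.
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmModel.from_pretrained(checkpoint)
    model.eval()  # inference mode: no dropout

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # The tokenizer wraps the sequence in start and end tokens; drop both
    # so that row i corresponds to residue i of the input.
    return outputs.last_hidden_state[0, 1:-1]
```

Passing a different `checkpoint` string switches models without changing the rest of the code, which is the main convenience of the standardized API.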

Alternatively, ColabFold has integrated ESMFold so that you can easily run it directly in the browser on a Google Colab instance.

We also provide an API which you can access through curl or on the ESM Metagenomic Atlas web page.

curl -X POST --data "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL" https://api.esmatlas.com/foldSequence/v1/pdb/
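
The same request can be sketched in Python with only the standard library. This assumes the endpoint behaves as in the curl call above (raw sequence in the POST body, PDB text in the response); the function names `build_fold_request` and `fold_sequence` and the 60-second timeout are illustrative choices, not part of the API.

```python
import urllib.request

API_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def build_fold_request(sequence):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=sequence.encode("ascii"),  # raw amino-acid string as the body
        method="POST",
        # curl --data sends this content type by default, so mirror it here.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fold_sequence(sequence, timeout=60):
    """Send the sequence to the folding endpoint and return the response text."""
    request = build_fold_request(sequence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fold_sequence("KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKF...")` and writing the result to a `.pdb` file would mirror the curl example; splitting request construction from sending keeps the network call easy to stub out in tests.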

For ESM-MSA-1b, ESM-IF1, or any of the other models you can use the original implementation from our repo directly via the instructions below.

Getting started with this repo <a n
