AntiFold

AntiFold predicts sequences that fit into antibody variable domain structures. The tool outputs residue log-likelihoods in CSV format and can sample sequences directly to FASTA format. Sampled sequences show high structural agreement with experimental structures.

AntiFold is based on the ESM-IF1 model and is fine-tuned on solved and predicted antibody structures from SAbDab and OAS.

Paper: Bioinformatics Advances
Webserver: OPIG webserver
Colab:
Model: model.pt
License: BSD 3-Clause

Webserver

To try AntiFold without installing it, please see our OPIG webserver: https://opig.stats.ox.ac.uk/webapps/antifold/

Features

Antibody (+ antigen) probabilities and sequence sampling
Nanobody (+ antigen) probabilities and sequence sampling
Sampling of residues from specified IMGT regions. Nb: assumes antibody is IMGT numbered! (See --num_seq_per_target and --regions)
Supports use of AntiFold fine-tuned weights and ESM-IF1 pre-trained weights (See --esm_if1_mode)
Extraction of per-residue inverse-folding embeddings (See --extract_embeddings)
GPU and MacBook GPU (MPS) accelerated predictions

Input

Input should be the structure of either a paired antibody variable domain (VH/VL) or a nanobody (VHH) (See --nanobody_chain)
AntiFold assumes the first PDB chain is the heavy chain and the second is the light chain, unless manually specified by the user (See --pdbs_csv, --heavy_chain, --light_chain options)
Antigen chains can optionally be specified. We recommend only including a single, ideally small, antigen chain. (See --pdbs_csv or --antigen_chain options)
Sequence sampling assumes PDBs have been IMGT numbered. You can find IMGT numbered PDBs on SAbDab, or re-number PDBs with ANARCI
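
Since region-restricted sampling depends on IMGT positions, it can help to see how a residue number maps to a region. The following is a minimal sketch using the standard IMGT variable-domain boundaries (CDR1 27-38, CDR2 56-65, CDR3 105-117). The names IMGT_REGIONS and imgt_region are illustrative assumptions and are not part of AntiFold, whose own region dictionary may use different labels.

```python
import re

# Standard IMGT variable-domain region boundaries (inclusive).
# Illustrative only: this is not AntiFold's internal IMGT_dict.
IMGT_REGIONS = {
    "FR1": (1, 26),
    "CDR1": (27, 38),
    "FR2": (39, 55),
    "CDR2": (56, 65),
    "FR3": (66, 104),
    "CDR3": (105, 117),
    "FR4": (118, 128),
}


def imgt_region(position):
    """Return the IMGT region name for a residue position.

    Accepts an int (112) or a string with an insertion code ("112A").
    Returns None for positions outside 1-128.
    """
    match = re.match(r"\s*(\d+)", str(position))
    if match is None:
        raise ValueError(f"Not a residue position: {position!r}")
    number = int(match.group(1))
    for name, (start, end) in IMGT_REGIONS.items():
        if start <= number <= end:
            return name
    return None


if __name__ == "__main__":
    for pos in (2, 27, "112A", 128):
        print(pos, imgt_region(pos))
```

If the residue numbers in a PDB file do not fall into these ranges as expected, the structure is probably not IMGT numbered and should be renumbered before sampling.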

Install and run AntiFold

Download and install from GitHub source (recommended - latest release)

conda create --name antifold python=3.10 -y && conda activate antifold
conda install -c conda-forge pytorch==2.2.0
git clone https://github.com/oxpig/AntiFold && cd AntiFold
pip install .

GPU only: install using environment.yml

conda env create -f environment.yml
python -m pip install .

Depending on your CUDA version you may need to change the dependency pytorch-cuda=12.1 in the environment.yml file. Detailed instructions on how to correctly install pytorch for your system can be found here

Run AntiFold (inverse-folding probabilities, sample sequences on IMGT-numbered PDBs)

# Run AntiFold on single PDB/CIF file
# Nb: Assumes first chain heavy, second chain light
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb

# Antibody-antigen complex
python antifold/main.py \
    --pdb_file data/antibody_antigen/3hfm.pdb \
    --heavy_chain H \
    --light_chain L \
    --antigen_chain Y

# Nanobody or single-chain
python antifold/main.py \
    --pdb_file data/nanobody/8oi2_imgt.pdb \
    --nanobody_chain B

# Folder of PDB/CIFs
# Nb: Assumes first chain heavy, second light
python antifold/main.py \
    --pdb_dir data/pdbs

# Specify chains to run in a CSV file (e.g. antibody-antigen complex)
python antifold/main.py \
    --pdb_dir data/antibody_antigen \
    --pdbs_csv data/antibody_antigen.csv

# Sample sequences 10x (paired VH/VL only)
# Note: Requires IMGT numbered PDBs (e.g. from SAbDab or numbered with ANARCI)
python antifold/main.py \
    --pdb_file data/pdbs/6y1l_imgt.pdb \
    --heavy_chain H \
    --light_chain L \
    --num_seq_per_target 10 \
    --sampling_temp "0.2" \
    --regions "CDR1 CDR2 CDR3"

# Run all chains with ESM-IF1 model weights
python antifold/main.py \
    --pdb_dir data/pdbs \
    --esm_if1_mode

Jupyter notebook

Notebook: notebook.ipynb (https://github.com/oxpig/AntiFold/blob/master/notebook.ipynb)

Colab:

import antifold
import antifold.main

# Load model
model = antifold.main.load_model()

# PDB directory
pdb_dir = "data/pdbs"

# Assumes first chain heavy, second chain light
pdbs_csv = antifold.main.generate_pdbs_csv(pdb_dir, max_chains=2)

# Sample from PDBs
df_logits_list = antifold.main.get_pdbs_logits(
    model=model,
    pdbs_csv_or_dataframe=pdbs_csv,
    pdb_dir=pdb_dir,
)

# Output log probabilities
df_logits_list[0]

Input parameters

Required parameters:

Input PDBs should be antibody variable domain structures (IMGT positions 1-128).

If no chains are specified, the first two chains will be assumed to be heavy and light, respectively.
If custom_chain_mode is set, all (10) chains will be run.

- Option 1: PDB file (--pdb_file). We recommend specifying heavy and light chain (--heavy_chain and --light_chain)
- Option 2: PDB folder (--pdb_dir) + CSV file specifying chains (--pdbs_csv)
- Option 3: PDB folder, infer 1st chain heavy, 2nd chain light
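
Because Option 3 relies on chain order, it may be worth checking which chain appears first in a file before running. The sketch below reads chain IDs from ATOM records using the fixed-column PDB format (chain ID in column 22). The functions chain_order and atom_line are hypothetical helpers written for this illustration and are not part of AntiFold.

```python
def chain_order(pdb_lines):
    """Return chain IDs in order of first appearance in ATOM records."""
    chains = []
    for line in pdb_lines:
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


def atom_line(serial, chain, resseq):
    """Build a minimal ATOM record (hypothetical helper for the demo only)."""
    return f"ATOM  {serial:>5}  CA  GLY {chain}{resseq:>4}"


if __name__ == "__main__":
    demo = [atom_line(1, "H", 1), atom_line(2, "H", 2), atom_line(3, "L", 1)]
    print(chain_order(demo))
```

For a real file, the same function can be applied to an open file handle, for example chain_order(open("data/pdbs/6y1l_imgt.pdb")). If the first two chains are not heavy then light, specify them explicitly.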

Parameters for generating new sequences:

PDBs should be IMGT annotated for the sequence sampling regions to be valid.

- Number of sequences to generate (--num_seq_per_target)
- Regions to mutate (--regions) based on inverse folding probabilities. Select from list in IMGT_dict (e.g. 'CDRH1 CDRH2 CDRH3')
- Sampling temperature (--sampling_temp) controls generated sequence diversity, by scaling the inverse folding probabilities before sampling. Temperature = 1 means no change, while temperature ~ 0 only samples the most likely amino-acid at each position (acts as argmax).
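
To make the effect of the sampling temperature concrete, here is a minimal sketch of one common way to apply it: divide the log-probabilities by the temperature, renormalise, and sample. This is an assumption about the general technique, not a copy of AntiFold's implementation, and the function names and the toy three-letter alphabet are illustrative.

```python
import math
import random


def apply_temperature(log_probs, temperature):
    """Rescale log-probabilities by 1/temperature and renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_residue(log_probs, alphabet, temperature, rng=random):
    """Draw one residue from the temperature-adjusted distribution."""
    probs = apply_temperature(log_probs, temperature)
    return rng.choices(alphabet, weights=probs, k=1)[0]


if __name__ == "__main__":
    alphabet = "AGV"  # toy alphabet, not the full 20 amino acids
    log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
    print(apply_temperature(log_probs, 1.0))  # distribution left unchanged
    print(apply_temperature(log_probs, 0.2))  # sharper, favours "A"
    print(sample_residue(log_probs, alphabet, 0.01, random.Random(42)))
```

At temperature 1 the distribution is returned unchanged, while at very low temperatures almost all probability mass moves to the most likely residue, which is the argmax-like behaviour described above.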

Optional parameters:

- Multi-chain mode for including antigen or other chains (--custom_chain_mode)
- Extract latent representations of PDB within model (--extract_embeddings)
- Use ESM-IF1 instead of AntiFold model weights (--esm_if1_mode), enables custom_chain_mode

Example output

For example webserver output, see: https://opig.stats.ox.ac.uk/webapps/antifold/results/example_job/

Output CSV with residue log-probabilities: 6y1l_imgt.csv (https://github.com/oxpig/AntiFold/blob/master/output/example_pdbs/6y1l_imgt.csv)

pdb_pos - PDB residue number (e.g. 112)
pdb_chain - PDB chain (e.g. H)
aa_orig - Original residue in the PDB (e.g. V)
aa_pred - Top predicted residue by AntiFold (i.e. argmax) for this position
pdb_posins - PDB residue number with insertion code (e.g. 112A)
perplexity - Inverse folding tolerance to mutations (higher is more tolerant). See paper for more details.
A, C, ..., Y (amino-acid columns) - Inverse folding scores (log-likelihood) of each amino acid at the given position

pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
...
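
The relationship between the amino-acid columns and the aa_pred and perplexity columns can be explored with a short script. The sketch below assumes that perplexity is the exponential of the Shannon entropy of the per-position distribution; this assumption is consistent with the rows shown above, but the paper remains the authoritative definition. The function names are illustrative and only the standard library is used.

```python
import csv
import io
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Header and two data rows copied from the example CSV above.
EXAMPLE_CSV = """pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
"""


def top_residue(row):
    """Amino acid with the highest score in one CSV row."""
    return max(AMINO_ACIDS, key=lambda aa: float(row[aa]))


def row_perplexity(row):
    """exp(Shannon entropy) of the per-residue distribution in one CSV row."""
    log_probs = [float(row[aa]) for aa in AMINO_ACIDS]
    # Renormalise with log-sum-exp in case scores are not exact log-probabilities.
    peak = max(log_probs)
    log_norm = peak + math.log(sum(math.exp(lp - peak) for lp in log_probs))
    log_probs = [lp - log_norm for lp in log_probs]
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return math.exp(entropy)


ROWS = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))

if __name__ == "__main__":
    for row in ROWS:
        print(row["pdb_posins"], row["aa_orig"], top_residue(row),
              round(row_perplexity(row), 4))
```

A position with perplexity close to 1 (such as position 4 here) is dominated by a single residue, whereas larger values indicate that several residues are plausible at that position.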

Output FASTA file with sampled sequences: 6y1l_imgt.fasta (https://github.com/oxpig/AntiFold/blob/master/output/example_pdbs/6y1l_imgt.fasta)

T: Temperature used for design
score: average log-odds of residues in the sampled region
global_score: average log-odds of all residues (IMGT positions 1-128)
regions: regions selected for design
seq_recovery: fraction of residues unchanged from the original PDB sequence
mutations: number of mutations relative to the original PDB sequence

>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
...
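
The header fields of a sampled sequence can be checked against the sequences themselves. The following sketch parses the key=value pairs from a sampled-sequence header and counts the differences between the original and sampled heavy chains shown above. The helper names are illustrative, and the parser is only intended for the simple sampled-sequence headers, not for the first header, whose regions list contains commas.

```python
def parse_header(header):
    """Parse 'key=value' pairs from a sampled-sequence FASTA header line."""
    fields = {}
    for part in header.lstrip("> ").split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def count_mutations(original, sampled):
    """Count positions where two equal-length sequences differ."""
    if len(original) != len(sampled):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(original, sampled))


# Heavy-chain portions (before the "/") of the example FASTA records above.
ORIGINAL_VH = (
    "VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS"
)
SAMPLED_VH = (
    "VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS"
)
HEADER = ("> T=0.20, sample=1, score=0.3930, global_score=0.1869, "
          "seq_recovery=0.8983, mutations=12")

if __name__ == "__main__":
    fields = parse_header(HEADER)
    n_mut = count_mutations(ORIGINAL_VH, SAMPLED_VH)
    identity = 1 - n_mut / len(ORIGINAL_VH)
    print(fields["mutations"], n_mut, round(identity, 4))
```

For this sample the heavy chains differ at 12 positions, matching the mutations field, and the light chain is unchanged. The identity fraction over the 118 heavy-chain residues rounds to 0.8983, the same value as seq_recovery in the header, although which residues AntiFold counts in the denominator is not stated here.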

Usage

usage:
    # Predict on a single example PDB (antibody-antigen complex)
    python antifold/main.py \
        --pdb_file data/antibody_antigen/3hfm.pdb \
        --heavy_chain H \
        --light_chain L \
        --antigen_chain Y # Optional

Predict inverse folding probabilities for antibody variable domain, and sample sequences with maintained fold.
PDB structures should be IMGT-numbered, paired heavy and light chain variable domains (positions 1-128).

For IMGT numbering PDBs use SAbDab or https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarci/

options:
  -h, --help            show this help message and exit
  --pdb_file PDB_FILE   Input PDB file (for single PDB predictions)
  --heavy_chain HEAVY_CHAIN
                        Ab heavy chain (for single PDB predictions)
  --light_chain LIGHT_CHAIN
                        Ab light chain (for single PDB predictions)
  --antigen_chain ANTIGEN_CHAIN
