BindCraft


Simple binder design pipeline using AlphaFold2 backpropagation, MPNN, and PyRosetta. Select your target and let the script do the rest of the work; it finishes once you have enough designs to order!

Preprint link for BindCraft

Note: Before posting an issue, read the complete wiki <a href="https://github.com/martinpacesa/BindCraft/wiki/De-novo-binder-design-with-BindCraft">here</a>. Issues that are covered in the wiki will be closed without an answer.

Installation

First you need to clone this repository. Replace [install_folder] with the path where you want to install it.

git clone https://github.com/martinpacesa/BindCraft [install_folder]

Then navigate into your install folder using cd and run the installation code. BindCraft requires a CUDA-compatible Nvidia graphics card to run. In the cuda setting, specify the CUDA version compatible with your graphics card, for example '11.8'. If unsure, leave it blank, but the installation might then select the wrong version, which will lead to errors. In pkg_manager, specify whether you are using 'mamba' or 'conda'; if left blank, it will use 'conda' by default.

Note: This install script will install PyRosetta, which requires a license for commercial purposes. The code requires about 2 MB of storage space, while the AlphaFold2 weights take up about 5.3 GB.

bash install_bindcraft.sh --cuda '12.4' --pkg_manager 'conda'

Google Colab

<a href="https://colab.research.google.com/github/martinpacesa/BindCraft/blob/main/notebooks/BindCraft.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> <br /> We prepared a convenient Google Colab notebook to test the BindCraft code functionalities. However, as the pipeline requires a significant amount of GPU memory for larger target+binder complexes, we highly recommend running it with a local installation and at least 32 GB of GPU memory.

Always try to trim the input target PDB to the smallest size possible! It will significantly speed up the binder generation and minimise the GPU memory requirements.
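One simple way to trim a target is to keep only the chains and residue range around the intended binding site. The sketch below is an illustration and not part of BindCraft: `trim_pdb_lines` and `atom_line` are hypothetical helper names, and the code relies only on the fixed-column layout of PDB ATOM records (chain ID in column 22, residue number in columns 23-26). It keeps ATOM records only, so ligands and waters stored as HETATM are dropped.

```python
def trim_pdb_lines(lines, keep_chains, res_min=None, res_max=None):
    """Keep ATOM records for the given chains and optional residue range."""
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, TER/END lines
        chain_id = line[21]          # PDB column 22
        res_num = int(line[22:26])   # PDB columns 23-26
        if chain_id not in keep_chains:
            continue
        if res_min is not None and res_num < res_min:
            continue
        if res_max is not None and res_num > res_max:
            continue
        kept.append(line)
    return kept


def atom_line(serial, chain, res_num):
    """Build a minimal, correctly aligned CA ATOM record (demo data only)."""
    return (f"ATOM  {serial:>5}  CA  ALA {chain}{res_num:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


demo = [
    atom_line(1, "A", 5),
    atom_line(2, "A", 50),
    atom_line(3, "B", 5),
    "HETATM    4  O   HOH A 101       0.000   0.000   0.000",
]
trimmed = trim_pdb_lines(demo, keep_chains={"A"}, res_min=1, res_max=20)
print(len(trimmed), "of", len(demo), "records kept")
```

For a real file, read the lines with `open(path).read().splitlines()`, write the kept lines to a new PDB, and inspect the result in a structure viewer before pointing starting_pdb at it.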

Be ready to run at least a few hundred trajectories to see some accepted binders; for difficult targets, it might even take a few thousand.

Running the script locally and explanation of settings

To run the script locally, first you need to configure your target .json file in the settings_target folder. The JSON file contains the following settings:

design_path               -> path where to save designs and statistics
binder_name               -> what to prefix your designed binder files with
starting_pdb              -> the path to the PDB of your target protein
chains                    -> which chains to target in your protein, the rest will be ignored
target_hotspot_residues   -> which positions to target for binder design, for example `1,2-10`, chain-specific `A1-10,B1-20`, or entire chains `A`; set to null if you want AF2 to select the binding site; it is better to select multiple target residues or a small patch to reduce the search space for the binder
lengths                   -> range of binder lengths to design
number_of_final_designs   -> how many designs that pass all filters to aim for; the script will stop once this many are reached
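As a rough illustration of how the settings above fit together, the following sketch builds a target settings file with Python's `json` module. The keys are the ones listed above; every value is a hypothetical placeholder, and the value formats (a string for chains, a two-element `[min, max]` list for lengths) are assumptions that should be checked against the example files shipped in settings_target.

```python
import json

# All values below are placeholders for illustration only.
target_settings = {
    "design_path": "./designs/PDL1/",        # hypothetical output folder
    "binder_name": "PDL1",
    "starting_pdb": "./example/PDL1.pdb",    # hypothetical path to the target PDB
    "chains": "A",
    # Use None here (written as null in JSON) to let AF2 choose the binding site.
    "target_hotspot_residues": "A1-10",
    "lengths": [65, 150],                    # assumed format: [min, max]
    "number_of_final_designs": 100,
}

settings_json = json.dumps(target_settings, indent=4)
print(settings_json)
```

Saving the printed text as a .json file in the settings_target folder gives a file that can be passed to the settings flag.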

Then run the binder design script:

sbatch ./bindcraft.slurm --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'

The settings flag should point to your target .json which you set above. The filters flag points to the json where the design filters are specified (default is ./settings_filters/default_filters.json). The advanced flag points to your advanced settings (default is ./settings_advanced/default_4stage_multimer.json). If you leave out the filters and advanced settings flags, the script will automatically use the defaults.

Alternatively, if your machine does not support SLURM, you can run the code directly by activating the environment in conda and running the python code:

conda activate BindCraft
cd /path/to/bindcraft/folder/
python -u ./bindcraft.py --settings './settings_target/PDL1.json' --filters './settings_filters/default_filters.json' --advanced './settings_advanced/default_4stage_multimer.json'
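When launching many targets, it can be convenient to assemble this command programmatically. The sketch below is a hypothetical helper (`build_bindcraft_command` is not part of BindCraft) that uses only the three flags described above and omits the optional ones so that the script falls back to its defaults.

```python
import shlex


def build_bindcraft_command(settings, filters=None, advanced=None):
    """Return the argument list for a direct (non-SLURM) BindCraft run."""
    cmd = ["python", "-u", "./bindcraft.py", "--settings", settings]
    if filters is not None:
        cmd += ["--filters", filters]
    if advanced is not None:
        cmd += ["--advanced", advanced]
    return cmd


cmd = build_bindcraft_command("./settings_target/PDL1.json")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the install folder, with the BindCraft conda environment already active.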

We recommend generating at least 100 final designs passing all filters, then ordering the top 5-20 for experimental characterisation. If high-affinity binders are required, it is better to screen more, as the ipTM metric used for ranking is not a good predictor of affinity, but has been shown to be a good binary predictor of binding.
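The selection step can be sketched as a simple sort on ipTM. The record layout here is an assumption made for illustration: the keys `design` and `i_ptm` and the example values are hypothetical and do not reflect the column names of BindCraft's output files.

```python
def pick_top_designs(designs, n_order=10):
    """Rank passing designs by ipTM (highest first) and keep the top n_order."""
    ranked = sorted(designs, key=lambda d: d["i_ptm"], reverse=True)
    return ranked[:n_order]


# Hypothetical records standing in for designs that passed all filters.
passing = [
    {"design": "binder_l80_s1", "i_ptm": 0.71},
    {"design": "binder_l95_s7", "i_ptm": 0.88},
    {"design": "binder_l110_s3", "i_ptm": 0.79},
]
for entry in pick_top_designs(passing, n_order=2):
    print(entry["design"], entry["i_ptm"])
```

Because ipTM separates binders from non-binders better than it ranks affinity, ordering a larger `n_order` is the safer choice when strong binding is needed.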

Below are explanations for individual filters and advanced settings.

Advanced settings

Here are the advanced settings controlling the design process:

omit_AAs                        -> which amino acids to exclude from design (note: they can still occur if no other options are possible in the position)
force_reject_AA                 -> whether to force reject design if it contains any amino acids specified in omit_AAs
design_algorithm                -> which design algorithm to use for the trajectory, the currently implemented algorithms are below
use_multimer_design             -> whether to use AF2-ptm or AF2-multimer for binder design; the other model will be used for validation then
num_recycles_design             -> how many recycles of AF2 for design
num_recycles_validation         -> how many recycles of AF2 to use for structure prediction and validation
sample_models = True            -> whether to randomly sample parameters from AF2 models, recommended to avoid overfitting
rm_template_seq_design          -> remove target template sequence for design (increases target flexibility)
rm_template_seq_predict         -> remove target template sequence for reprediction (increases target flexibility)
rm_template_sc_design           -> remove sidechains from target template for design
rm_template_sc_predict          -> remove sidechains from target template for reprediction
predict_initial_guess           -> Introduce bias by providing binder atom positions as a starting point for prediction. Recommended if designs fail after MPNN optimization.
predict_bigbang                 -> Introduce atom position bias into the structure module for atom initialisation. Recommended if target and design are large (more than 600 amino acids).
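The interplay between omit_AAs and force_reject_AA can be pictured as follows. This is a minimal sketch under stated assumptions, not BindCraft's implementation: the function names are hypothetical, and omit_AAs is assumed to be a comma-separated string of one-letter codes such as "C" or "C,M".

```python
def omitted_residues_present(sequence, omit_aas):
    """Return the omitted amino acids that still occur in the sequence."""
    omitted = {aa.strip().upper() for aa in omit_aas.split(",") if aa.strip()}
    return sorted(omitted & set(sequence.upper()))


def should_reject(sequence, omit_aas, force_reject_aa):
    """Reject only when forced rejection is on and an omitted residue is present."""
    return force_reject_aa and bool(omitted_residues_present(sequence, omit_aas))


seq = "MKTLCEELLK"  # hypothetical binder sequence containing one cysteine
print(omitted_residues_present(seq, "C,W"))
print(should_reject(seq, "C,W", force_reject_aa=False))
print(should_reject(seq, "C,W", force_reject_aa=True))
```

With forced rejection off, an omitted residue that slips through is tolerated; with it on, the same design is discarded.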

# Design iterations
soft_iterations                 -> number of soft iterations (all amino acids considered at all positions)
temporary_iterations            -> number of temporary iterations (softmax, most probable amino acids considered at all positions)
hard_iterations                 -> number of hard iterations (one hot encoding, single amino acids considered at all positions)
greedy_iterations               -> number of iterations to sample random mutations from PSSM that reduce loss
greedy_percentage               -> What percentage of protein length to mutate during each greedy iteration
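The greedy stage can be illustrated with a toy loop. This sketch is not BindCraft's code: the loss here is a stand-in (the real one comes from AF2), the PSSM is uniform, and rounding the number of mutations up to at least one position is an assumption. It only shows the idea of sampling mutations from a PSSM and keeping them when the loss drops.

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"


def greedy_step(seq, pssm, loss_fn, greedy_percentage, rng):
    """Mutate a percentage of positions; keep the result only if the loss drops."""
    n_mut = max(1, math.ceil(len(seq) * greedy_percentage / 100))
    candidate = list(seq)
    for pos in rng.sample(range(len(seq)), n_mut):
        # Draw the new residue from this position's PSSM weights.
        candidate[pos] = rng.choices(AA, weights=pssm[pos])[0]
    candidate = "".join(candidate)
    return candidate if loss_fn(candidate) < loss_fn(seq) else seq


def toy_loss(seq):
    """Stand-in loss: number of non-alanine residues (illustration only)."""
    return sum(ch != "A" for ch in seq)


rng = random.Random(0)
seq = "G" * 20
uniform_pssm = [[1.0] * len(AA) for _ in seq]
start_loss = toy_loss(seq)
for _ in range(50):  # plays the role of greedy_iterations
    seq = greedy_step(seq, uniform_pssm, toy_loss, greedy_percentage=5, rng=rng)
print(start_loss, "->", toy_loss(seq))
```

Because a candidate is accepted only when it lowers the loss, the loss can never increase from one greedy iteration to the next.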

# Design weights, higher value puts more weight on optimising the parameter.
weights_plddt                   -> Design weight - pLDDT of designed chain
weights_pae_intra               -> Design weight - PAE within designed chain
weights_pae_inter               -> Design weight - PAE between chains
weights_con_intra               -> Design weight - maximise number of contacts within designed chain
weights_con_inter               -> Design weight - maximise number of contacts between chains
intra_contact_distance          -> Cbeta-Cbeta cutoff distance for contacts within the binder
inter_contact_distance          -> Cbeta-Cbeta cutoff distance for contacts between binder and target
intra_contact_number            -> how many contacts each contact residue should make within a chain, excluding immediate neighbours
inter_contact_number            -> how many contacts each contact residue should make between chains
weights_helicity                -> Design weight - helix propensity of the design, Default 0, negative values bias towards beta sheets
random_helicity                 -> whether to randomly sample helicity weights for trajectories, from -1 to 1
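One way to read the design weights is as coefficients in a weighted sum of loss terms. The sketch below assumes exactly that; the combination rule, the dictionary keys, and all numbers are illustrative assumptions rather than BindCraft's defaults. It also shows random_helicity as a uniform draw from -1 to 1, as described above.

```python
import random


def total_loss(losses, weights):
    """Weighted sum of individual loss terms (assumed combination rule)."""
    return sum(weights[name] * value for name, value in losses.items())


def sample_helicity_weight(rng):
    """Draw a helicity weight uniformly from -1 to 1 for one trajectory."""
    return rng.uniform(-1.0, 1.0)


# Hypothetical loss values and weights for a single design step.
losses = {"plddt": 0.5, "pae_intra": 0.25, "pae_inter": 0.5}
weights = {"plddt": 1.0, "pae_intra": 2.0, "pae_inter": 4.0}
print(total_loss(losses, weights))

rng = random.Random(0)
print(sample_helicity_weight(rng))
```

Raising one weight makes its term dominate the sum, which is what "puts more weight on optimising the parameter" means in practice.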

# Additional losses
use_i_ptm_loss                  -> Use i_ptm loss to optimise for interface pTM score?
weights_iptm                    -> Design weight - i_ptm between chains
use_rg_loss                     -> use radius of gyration loss?
weights_rg                      -> Design weight - radius of gyration weight for binder
use_termini_distance_loss       -> Try to minimise distance between N- and C-terminus of binder? Helpful for grafting
weights_termini_loss            -> Design weight - N- and C-terminus distance minimisation weight of binder
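The two geometric quantities behind these losses are easy to compute from backbone coordinates. The sketch below uses the standard definitions (root-mean-square distance from the centroid, and the distance between the first and last residue); how BindCraft converts them into loss values is not shown here, and the coordinates are made up.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)


def termini_distance(coords):
    """Distance between the first (N-terminal) and last (C-terminal) point."""
    return math.dist(coords[0], coords[-1])


# Hypothetical CA coordinates (in angstroms) of a four-residue toy chain.
ca_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
print(radius_of_gyration(ca_coords))
print(termini_distance(ca_coords))
```

A compact binder has a small radius of gyration, and a small termini distance brings the two chain ends close together, which is the property that helps with grafting.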

# MPNN settings
mpnn_fix_interface              -> whether to fix the interface designed in the starting trajectory
num_seqs                        -> number of MPNN generated sequences to sample and predict per binder
max_mpnn_sequences              -> how many maximum MPNN sequences per trajectory to save if several pass filters
sampling_temp = 0.1             -> sampling temperature for amino acids, T=0.0 means taking argmax, T>>1.0 means sampling randomly
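The effect of the sampling temperature can be seen with a generic temperature-scaled softmax. This is an illustration of the concept only, not MPNN's code; the function name and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into sampling probabilities at the given temperature."""
    if temperature == 0.0:
        # T=0.0: all probability goes to the highest-scoring option (argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three amino acids
for temp in (0.0, 0.1, 1.0, 100.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

At low temperature nearly all probability sits on the top-scoring amino acid, while at very high temperature the distribution approaches uniform, which corresponds to sampling randomly.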

# MPNN settings - advanced
backbone_noise                  -> backbone noise during sampling, 0.00-0.02 are good values
model_path                      -> path to the MPNN model weights
mpnn_weights                    -> whether to use "original" mpnn weights or "soluble" weights
save_mpnn_fasta                 -> whether to save MPNN sequences as fasta files, normally not needed as the sequence is also in the CSV file

# AF2 design settings - advanced
num_recycles_design             -> how many recycles of AF2 for design
num_recycles_validation         -> how many recycles of AF2 to use for structure prediction and validation
optimise_beta
