LigandMPNN
This package provides inference code for LigandMPNN & ProteinMPNN models. The code and model parameters are available under the MIT license.
Third-party code: side chain packing uses helper functions from OpenFold.
Running the code
git clone https://github.com/dauparas/LigandMPNN.git
cd LigandMPNN
bash get_model_params.sh "./model_params"
# set up your conda (or other) environment first, e.g.:
# conda create -n ligandmpnn_env python=3.11
# conda activate ligandmpnn_env
# pip3 install -r requirements.txt
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/default"
Dependencies
To run the model you will need Python>=3.0, PyTorch, and NumPy installed; to read and write PDB files you will also need ProDy.
For example, to make a new conda environment for LigandMPNN, run:
conda create -n ligandmpnn_env python=3.11
conda activate ligandmpnn_env
pip3 install -r requirements.txt
Main differences compared with ProteinMPNN code
- Input PDBs are parsed using ProDy, preserving protein residue indices, chain letters, and insertion codes. If there are missing residues in the input structure, the output fasta file won't have X added to fill the gaps. The script outputs .fasta and .pdb files. It's recommended to use the .pdb files since they hold information about chain letters and residue indices.
- Adding bias, fixing residues, and selecting residues to be redesigned can now be done using residue indices directly, e.g. A23 (chain A, residue with index 23), B42D (chain B, residue 42, insertion code D).
- Model writes to fasta files: overall_confidence and ligand_confidence, which reflect the average confidence/probability (with T=1.0) over the redesigned residues: overall_confidence=exp[-mean_over_residues(log_probs)]. Higher numbers mean the model is more confident about that sequence. min_value=0.0; max_value=1.0. Sequence recovery with respect to the input sequence is calculated only over the redesigned residues.
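As a rough illustration of the confidence score described above, the sketch below computes the exponential of the negated mean negative log-likelihood over the redesigned residues, which is the geometric mean of the per-residue probabilities. The function name and input layout are hypothetical and this is not the package's internal code. It also assumes that log_probs in the formula above are stored as negative log-likelihoods, which would explain the leading minus sign.

```python
import math

def overall_confidence(probs, redesigned_mask):
    """Geometric mean of the probabilities of the sampled amino acids.

    probs           -- probability given to the chosen amino acid at each residue
    redesigned_mask -- 1 for redesigned residues, 0 for fixed ones
    Both argument names are hypothetical, chosen for this sketch only.
    """
    # Negative log-likelihood of each redesigned residue (fixed ones are skipped).
    nll = [-math.log(p) for p, m in zip(probs, redesigned_mask) if m]
    if not nll:
        raise ValueError("no redesigned residues selected")
    # exp(-mean(NLL)) lies in (0, 1]; higher means more confident.
    return math.exp(-sum(nll) / len(nll))

# Three redesigned residues and one fixed residue (made-up numbers).
conf = overall_confidence([0.9, 0.5, 0.8, 0.1], [1, 1, 1, 0])
print(round(conf, 3))
```

Because the fixed residue is masked out, its low probability of 0.1 does not lower the score.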
Model parameters
To download model parameters run:
bash get_model_params.sh "./model_params"
Available models
To run the model of your choice, specify --model_type and, optionally, the model checkpoint path. Available models:
- ProteinMPNN
--model_type "protein_mpnn"
--checkpoint_protein_mpnn "./model_params/proteinmpnn_v_48_002.pt" #noised with 0.02 Å Gaussian noise
--checkpoint_protein_mpnn "./model_params/proteinmpnn_v_48_010.pt" #noised with 0.10 Å Gaussian noise
--checkpoint_protein_mpnn "./model_params/proteinmpnn_v_48_020.pt" #noised with 0.20 Å Gaussian noise
--checkpoint_protein_mpnn "./model_params/proteinmpnn_v_48_030.pt" #noised with 0.30 Å Gaussian noise
- LigandMPNN
--model_type "ligand_mpnn"
--checkpoint_ligand_mpnn "./model_params/ligandmpnn_v_32_005_25.pt" #noised with 0.05 Å Gaussian noise
--checkpoint_ligand_mpnn "./model_params/ligandmpnn_v_32_010_25.pt" #noised with 0.10 Å Gaussian noise
--checkpoint_ligand_mpnn "./model_params/ligandmpnn_v_32_020_25.pt" #noised with 0.20 Å Gaussian noise
--checkpoint_ligand_mpnn "./model_params/ligandmpnn_v_32_030_25.pt" #noised with 0.30 Å Gaussian noise
- SolubleMPNN
--model_type "soluble_mpnn"
--checkpoint_soluble_mpnn "./model_params/solublempnn_v_48_002.pt" #noised with 0.02 Å Gaussian noise
--checkpoint_soluble_mpnn "./model_params/solublempnn_v_48_010.pt" #noised with 0.10 Å Gaussian noise
--checkpoint_soluble_mpnn "./model_params/solublempnn_v_48_020.pt" #noised with 0.20 Å Gaussian noise
--checkpoint_soluble_mpnn "./model_params/solublempnn_v_48_030.pt" #noised with 0.30 Å Gaussian noise
- ProteinMPNN with global membrane label
--model_type "global_label_membrane_mpnn"
--checkpoint_global_label_membrane_mpnn "./model_params/global_label_membrane_mpnn_v_48_020.pt" #noised with 0.20 Å Gaussian noise
- ProteinMPNN with per residue membrane label
--model_type "per_residue_label_membrane_mpnn"
--checkpoint_per_residue_label_membrane_mpnn "./model_params/per_residue_label_membrane_mpnn_v_48_020.pt" #noised with 0.20 Å Gaussian noise
- Side chain packing model
--checkpoint_path_sc "./model_params/ligandmpnn_sc_v_32_002_16.pt"
Design examples
1 default
Default settings will run ProteinMPNN.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/default"
2 --temperature
--temperature 0.05 sets the sampling temperature; a higher temperature gives more sequence diversity.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--temperature 0.05 \
--out_folder "./outputs/temperature"
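To see why a lower temperature reduces diversity, here is a minimal sketch of temperature-scaled softmax sampling probabilities. It is a toy model with made-up logits for three amino acids, not code taken from run.py, and the function name is hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three amino acids
low_t = softmax_with_temperature(logits, 0.05)
high_t = softmax_with_temperature(logits, 1.0)
print("T=0.05:", [round(p, 3) for p in low_t])
print("T=1.0: ", [round(p, 3) for p in high_t])
```

At T=0.05 nearly all probability mass sits on the top-scoring amino acid, so repeated samples look alike; at T=1.0 the lower-scoring options are sampled noticeably more often.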
3 --seed
Omitting --seed runs the model with a random seed, so running this command multiple times will give different results.
python run.py \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/random_seed"
4 --verbose
--verbose 0 suppresses all printed statements.
python run.py \
--seed 111 \
--verbose 0 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/verbose"
5 --save_stats
--save_stats 1 saves sequence design statistics.
#['generated_sequences', 'sampling_probs', 'log_probs', 'decoding_order', 'native_sequence', 'mask', 'chain_mask', 'seed', 'temperature']
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/save_stats" \
--save_stats 1
6 --fixed_residues
--fixed_residues fixes specific residues. This example fixes the first 10 residues in chain C and adds a global bias towards A (alanine). The output should be all alanines, except for the first 10 residues, which stay the same as in the input sequence because they are fixed.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/fix_residues" \
--fixed_residues "C1 C2 C3 C4 C5 C6 C7 C8 C9 C10" \
--bias_AA "A:10.0"
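Typing long residue lists by hand is error-prone, so a small helper can build the space-separated string used above. This is a convenience sketch, not part of the package; the function name is hypothetical, and it assumes the residues you want are numbered contiguously in the PDB file.

```python
def residue_range(chain, start, stop):
    """Return a space-separated residue list such as 'C1 C2 ... C10' (stop inclusive)."""
    return " ".join(f"{chain}{i}" for i in range(start, stop + 1))

fixed = residue_range("C", 1, 10)
print(fixed)
```

The printed string can be pasted into --fixed_residues or --redesigned_residues.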
7 --redesigned_residues
--redesigned_residues specifies which residues are designed. This example redesigns the first 10 residues of chain C while fixing everything else.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/redesign_residues" \
--redesigned_residues "C1 C2 C3 C4 C5 C6 C7 C8 C9 C10" \
--bias_AA "A:10.0"
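Residue selections like the one above follow the pattern chain letter, residue index, optional insertion code (e.g. A23 or B42D). The sketch below shows one way to split such identifiers for your own pre- or post-processing. It is not the parser used by run.py, and it assumes single-letter chain IDs and integer residue indices.

```python
import re

# Chain letter, integer residue index (possibly negative), optional insertion code.
# Single-letter chain IDs are an assumption of this sketch.
RESIDUE_ID = re.compile(r"^([A-Za-z])(-?\d+)([A-Za-z]?)$")

def parse_residue_id(token):
    """Split e.g. 'B42D' into ('B', 42, 'D'); the insertion code may be ''."""
    match = RESIDUE_ID.match(token)
    if match is None:
        raise ValueError(f"not a valid residue identifier: {token!r}")
    chain, index, insertion_code = match.groups()
    return chain, int(index), insertion_code

parsed = [parse_residue_id(tok) for tok in "A23 B42D".split()]
print(parsed)
```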
8 --number_of_batches
Design 15 sequences in total: 5 batches with a batch size of 3 (the batch size can be 1 when using CPUs).
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/batch_size" \
--batch_size 3 \
--number_of_batches 5
9 --bias_AA
Global amino acid bias. In this example, output sequences are biased towards W, P, C and away from A.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--bias_AA "W:3.0,P:3.0,C:3.0,A:-3.0" \
--out_folder "./outputs/global_bias"
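To make the effect of the bias values more concrete, the sketch below parses a bias string in the format shown above and adds the values to a set of logits before the softmax, following the softmax((logits+bias)/temperature) form quoted in the symmetry example of this README. The function names, the flat toy logits, and the alphabet ordering are assumptions for illustration only.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids; ordering is an assumption

def parse_bias(spec):
    """Parse 'W:3.0,P:3.0' into {'W': 3.0, 'P': 3.0}."""
    bias = {}
    for item in spec.split(","):
        amino_acid, value = item.split(":")
        bias[amino_acid.strip()] = float(value)
    return bias

def biased_probs(logits, bias, temperature=1.0):
    """Softmax over (logit + bias) / temperature for every amino acid."""
    scaled = [(logits[aa] + bias.get(aa, 0.0)) / temperature for aa in ALPHABET]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {aa: e / total for aa, e in zip(ALPHABET, exps)}

bias = parse_bias("W:3.0,P:3.0,C:3.0,A:-3.0")
flat_logits = {aa: 0.0 for aa in ALPHABET}  # toy input: no preference at all
probs = biased_probs(flat_logits, bias)
print(round(probs["W"], 3), round(probs["G"], 3), round(probs["A"], 4))
```

With flat logits, W, P, and C end up far more likely than an unbiased residue such as G, while A becomes much less likely.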
10 --bias_AA_per_residue
Specify per-residue amino acid bias, e.g. bias residues C1, C3, C5, and C7 strongly towards proline. The JSON file has the following content:
# {
# "C1": {"G": -0.3, "C": -2.0, "P": 10.8},
# "C3": {"P": 10.0},
# "C5": {"G": -1.3, "P": 10.0},
# "C7": {"G": -1.3, "P": 10.0}
# }
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--bias_AA_per_residue "./inputs/bias_AA_per_residue.json" \
--out_folder "./outputs/per_residue_bias"
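The JSON file can also be generated from a script. The sketch below writes the same mapping shown in the comment above using the standard json module. It writes into a temporary directory, which is a choice made for this sketch; in practice you would write to whatever path you pass to --bias_AA_per_residue.

```python
import json
import os
import tempfile

# Residue identifier -> {amino acid: bias}; values copied from the example above.
bias_per_residue = {
    "C1": {"G": -0.3, "C": -2.0, "P": 10.8},
    "C3": {"P": 10.0},
    "C5": {"G": -1.3, "P": 10.0},
    "C7": {"G": -1.3, "P": 10.0},
}

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "bias_AA_per_residue.json")
with open(path, "w") as handle:
    json.dump(bias_per_residue, handle, indent=2)

# Read the file back to confirm it round-trips.
with open(path) as handle:
    loaded = json.load(handle)
print(sorted(loaded))
```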
11 --omit_AA
Global amino acid restrictions. This is equivalent to using --bias_AA and setting the bias to a large negative number. In this example the output should consist only of E, K, and A.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--omit_AA "CDFGHILMNPQRSTVWY" \
--out_folder "./outputs/global_omit"
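Since --omit_AA lists the amino acids to exclude, it is often easier to start from the ones you want to keep. The helper below is a hypothetical convenience function, not part of the package, and it assumes the 20 standard one-letter codes.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def omit_string(allowed):
    """Return every standard amino acid that is NOT in `allowed`."""
    unknown = set(allowed) - set(STANDARD_AA)
    if unknown:
        raise ValueError(f"unknown amino acid codes: {sorted(unknown)}")
    return "".join(aa for aa in STANDARD_AA if aa not in allowed)

omit = omit_string("EKA")
print(omit)  # -> CDFGHILMNPQRSTVWY
```

Keeping only E, K, and A reproduces the string passed to --omit_AA above.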
12 --omit_AA_per_residue
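Per-residue amino acid restrictions. In the example file below, every standard amino acid except Y (tyrosine) is omitted at residues C1, C3, C5, and C7.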
# {
# "C1": "ACDEFGHIKLMNPQRSTVW",
# "C3": "ACDEFGHIKLMNPQRSTVW",
# "C5": "ACDEFGHIKLMNPQRSTVW",
# "C7": "ACDEFGHIKLMNPQRSTVW"
# }
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--omit_AA_per_residue "./inputs/omit_AA_per_residue.json" \
--out_folder "./outputs/per_residue_omit"
13 --symmetry_residues
13 --symmetry_weights
Designing sequences with symmetry, e.g. homooligomers, 2-state proteins, etc. In this example, residues within each group are tied to the same amino acid: C1=C2=C3, C4=C5, and C6=C7.
#total_logits += symmetry_weights[t]*logits
#probs = torch.nn.functional.softmax((total_logits+bias_t) / temperature, dim=-1)
#total_logits_123 = 0.33*logits_1+0.33*logits_2+0.33*logits_3
#output should be ***ooxx
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/symmetry" \
--symmetry_residues "C1,C2,C3|C4,C5|C6,C7" \
--symmetry_weights "0.33,0.33,0.33|0.5,0.5|0.5,0.5"
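The commented pseudocode above can be read as: take a weighted sum of the logits of all residues in a symmetry group, then apply a single softmax so every residue in the group shares one distribution. Below is a minimal pure-Python sketch of that reading with a toy three-letter alphabet. The function names and toy logits are hypothetical, and the bias term from the pseudocode is left out for brevity.

```python
import math

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def tied_probs(logits_by_residue, group, weights, temperature=1.0):
    """Weighted sum of the group's logits, followed by one shared softmax."""
    n_classes = len(logits_by_residue[group[0]])
    total_logits = [0.0] * n_classes
    for residue, weight in zip(group, weights):
        for k, value in enumerate(logits_by_residue[residue]):
            total_logits[k] += weight * value
    return softmax([t / temperature for t in total_logits])

# Toy logits over a three-letter alphabet: each residue prefers a different letter.
toy_logits = {
    "C1": [3.0, 0.0, 0.0],
    "C2": [0.0, 3.0, 0.0],
    "C3": [0.0, 0.0, 3.0],
}
shared = tied_probs(toy_logits, ["C1", "C2", "C3"], [0.33, 0.33, 0.33])
print([round(p, 3) for p in shared])
```

One amino acid would then be drawn from the shared distribution and assigned to every residue in the group, which is how tied positions end up identical in the output.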
14 --homo_oligomer
Design homooligomer sequences. This automatically sets --symmetry_residues and --symmetry_weights assuming equal weighting from all chains.
python run.py \
--model_type "ligand_mpnn" \
--seed 111 \
--pdb_path "./inputs/4GYT.pdb" \
--out_folder "./outputs/homooligomer" \
--homo_oligomer 1 \
--number_of_batches 2
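For intuition, the sketch below builds the kind of --symmetry_residues and --symmetry_weights strings that equal weighting across chains would correspond to, using the separator format from example 13. The chain letters and residue indices are made up, the function name is hypothetical, and it assumes every chain uses the same residue numbering; it does not reproduce what run.py does internally.

```python
def homo_oligomer_symmetry(chains, residue_indices):
    """Tie the same residue index across all chains with equal weights."""
    groups = [",".join(f"{chain}{i}" for chain in chains) for i in residue_indices]
    weight = round(1.0 / len(chains), 2)  # e.g. 0.33 for three chains
    weights = [",".join(str(weight) for _ in chains) for _ in residue_indices]
    return "|".join(groups), "|".join(weights)

# Hypothetical trimer with chains A, B, C and two residues per chain.
symmetry_residues, symmetry_weights = homo_oligomer_symmetry(["A", "B", "C"], [1, 2])
print(symmetry_residues)
print(symmetry_weights)
```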
15 --file_ending
Output file names get the specified ending, e.g. 1BC8_xyz.fa instead of 1BC8.fa.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/file_ending" \
--file_ending "_xyz"
16 --zero_indexed
Use zero-indexed file names in /backbones/, e.g. 1BC8_0.pdb, 1BC8_1.pdb, 1BC8_2.pdb, etc.
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/zero_indexed" \
--zero_indexed 1 \
