PLACER

<ins>P</ins>rotein-<ins>L</ins>igand <ins>A</ins>tomistic <ins>C</ins>onformational <ins>E</ins>nsemble <ins>R</ins>esolver

(formerly known as ChemNet)

PLACER is a graph neural network that operates entirely at the atomic level; the nodes of the graph are the atoms in the system. PLACER was trained to recapitulate the correct atom positions from partially corrupted input structures from the Cambridge Structural Database and the Protein Data Bank. PLACER accurately generates structures of diverse organic small molecules given knowledge of their atom composition and bonding, and given a description of the larger protein context, can accurately build up structures of small molecules and protein side chains; used in this way PLACER has competitive performance on protein-small molecule docking given approximate knowledge of the binding site. PLACER is a rapid and stochastic denoising network, which enables generation of ensembles of solutions to model conformational heterogeneity.

Reference: https://www.biorxiv.org/content/10.1101/2024.09.25.614868v1

Installation
Usage
Examples
Python API usage
Questions & Troubleshooting
Disclaimer

Installation

Clone the repository:

git clone https://github.com/baker-laboratory/PLACER.git

The repository includes the model weights, so the code is ready to run once an appropriate Python environment is set up.

Requirements

cuda-toolkit >= 12.1
pytorch = 2.3.*, build=cuda12
dgl = 2.4.0
opt_einsum = 3.4.0
openbabel = 3.1.1
networkx >= 3.2
numpy >= 1.2.6
pandas = 2.2.3
e3nn = 0.5.4

And for convenience: matplotlib, ipython, jupyterlab
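Once an environment is active, the exact pins above can be checked from Python. The following is a minimal sketch, not part of PLACER: the helper names `version_matches` and `check_versions` are hypothetical, the distribution names on the left of `PINNED` are assumptions (PyTorch is looked up as `torch`), and only the exact `=` pins are checked. Packages installed through conda do not always expose metadata to `importlib.metadata`, so a "missing" result here is a prompt to look closer, not proof of a broken environment.

```python
from importlib import metadata

# Exact pins copied from the requirements list. The distribution names used
# as keys are assumptions (for example, PyTorch is looked up as "torch").
PINNED = {
    "torch": "2.3",
    "dgl": "2.4.0",
    "opt_einsum": "3.4.0",
    "pandas": "2.2.3",
    "e3nn": "0.5.4",
}


def version_matches(installed: str, wanted: str) -> bool:
    """True if `installed` starts with the dotted components of `wanted`."""
    core = installed.split("+")[0]  # drop local tags such as "+cu121"
    want = wanted.split(".")
    return core.split(".")[: len(want)] == want


def check_versions(pinned=PINNED):
    """Return {name: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and version_matches(installed, wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "check"
        print(f"{name:12s} installed={installed} [{status}]")
```

The comparison is component-wise rather than a plain string prefix, so a pin of 2.3 accepts 2.3.1 but not 2.30.0.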

Conda environment

A minimal conda environment for running PLACER is provided. Create a new environment from envs/placer_env.yml:

conda env create -f envs/placer_env.yml

conda activate placer_env

For Mac users

A separate conda environment is required to run PLACER on the GPU of Apple Silicon using MPS. It is based on SE3Transformer as adapted for MPS by YaoYingYing. Create a new environment from envs/placer_env_mac.yml:

conda env create -f envs/placer_env_mac.yml

conda activate placer
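After activating either environment, it can be useful to confirm which accelerator PyTorch actually sees. The sketch below is an illustration only: `pick_device` is a hypothetical helper, and it does not reflect how PLACER itself selects a device. It relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to "cpu" when PyTorch cannot be imported.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable, so the environment is not set up yet.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The mps backend is absent in older PyTorch builds, so look it up safely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

On an Apple Silicon machine with the Mac environment active, a result of "cpu" would suggest that the MPS backend is not being picked up.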

Usage

PLACER is available as a command-line script and as a Python module. For Python API usage, see the Python API usage section.

To run PLACER from the command line, use:

python run_PLACER.py ...

Available arguments:

  -h, --help            show this help message and exit
  -i IDIR, --idir IDIR  input folder with PDB/mmCIF files (default: None)
  -f IFILE, --ifile IFILE
                        file with a list of input PDB/mmCIF files or a single PDB/mmCIF file. Only mmCIF files from RCSB are correctly parsed. (default: None)
  -o ODIR, --odir ODIR  output folder to save models and CSV files. Default is current run directory. (default: ./)
  -n NSAMPLES, --nsamples NSAMPLES
                        number of samples to generate. 50-100 is a good number in most cases. (default: 10)
  --ocsv OCSV           output .csv file to save scores. By default the CSV name is inferred from the input file name, with --suffix input added. (default: None)
  --suffix SUFFIX       suffix added to output PDB file (default: None)
  --cautious            Cautious mode. If output CSV exists, then it will not run that prediction again. (default: False)
  --exclude_common_ligands
                        All common solvents and crystallography additives will be excluded from the prediction.
                        List of residues was obtained from AlphaFold3 supplementary data (DOI: 10.1038/s41586-024-07487-w). Useful when predicting directly from crystal structures.
                        (default: False)
  --predict_multi       All allowed ligands in input will be predicted and scored. fixed_ligand and predict_ligand inputs are respected. (default: False)
  --fixed_ligand FIXED_LIGAND [FIXED_LIGAND ...]
                        Ligand <name3> or <name3-resno> or <chain-name3-resno> that will remain fixed. (default: None)
  --predict_ligand PREDICT_LIGAND [PREDICT_LIGAND ...]
                        Ligand <name3> or <name3-resno> or <chain-name3-resno> that will be predicted. All other ligands will be fixed. (default: None)
  --target_res TARGET_RES
                        Protein residue <chain-resno> or <chain-name3-resno> that will be used as crop center. Required when input has no ligands. (default: None)
  --fixed_ligand_noise FIXED_LIGAND_NOISE
                        Noise added to fixed ligand coordinates. Default is the same as backbone atom `sigma_bb` in the model params. (default: None)
  --weights WEIGHTS     Weights file (pytorch .pt file). (default: weights/PLACER_model_1.pt)
  --rerank {prmsd,plddt,plddt_pde}
                        Output CSV and PDB models files are ranked from best to worst based on one of the input metrics: prmsd, plddt, plddt_pde.
                        Prmsd is sorted in ascending order; plddt and plddt_pde in descending order.
                        The model numbers that are printed to screen while the script runs no longer apply. (default: None)
  --bonds BONDS [BONDS ...]
                        put a bond between two atoms, e.g. "A-42-ALA-CB:B-173-JRP-CL:<bondlen>", as space-separated list (default: None)
  --mutate MUTATE [MUTATE ...]
                        mutate certain positions, e.g. "5A:TRP" or "5A:TRP 6A:GLY" (default: None)
  --crop_centers CROP_CENTERS [CROP_CENTERS ...]
                        Atom names that will be used as CROP centers. This centers the crop to a particular part of the pocket, but the ligands are still corrupted based on their input coordinates.
                        Used for refining where the cropped sphere is. This DOES NOT affect which atoms/ligands are selected for prediction. Use --predict_ligand ... for that.
                        One atom will be picked randomly from the provided set.
                        XYZ coordinate input available in the API. Example: "B-200-HEM-FE B-200-HEM-O1" (default: None)
  --corruption_centers CORRUPTION_CENTERS [CORRUPTION_CENTERS ...]
                        Atom names that will be used as corruption centers. Allows sampling the ligand around in the whole protein.
                        One will be picked randomly from the provided set. Must provide at least as many centers as there are ligands in the input.
                        XYZ coordinate input available in the API. Example: "B-200-HEM-FE B-200-HEM-O1" (default: None)
  --residue_json RESIDUE_JSON
                        JSON file that specifies any custom residues used in the PDB, or used with --mutate. These are added to the internal CCD library.
                        JSON format: {name3: {'sdf': <contents of SDF file as string>,
                                              'atom_id': [atom names],
                                              'leaving': [True/False for whether this atom is deleted when part of polymer],
                                              'pdbx_align': [int,...]}} (default: None)
  --ligand_file LIGAND_FILE [LIGAND_FILE ...]
                        SDF or MOL2 file of the ligand(s). (Input format: XXX:ligand1.sdf YYY:ligand2.mol2) ZZZ:CCD
                        Used for refining the atom typing and connectivity in the ligand structures. Coordinates are still parsed from the input PDB/mmCIF.
                        If ligand exists in CCD then ZZZ:CCD is a special input that enables reading the ligand in from an internal CCD ligands database. (default: None)
  --ignore_ligand_hydrogens
                        Affects --ligand_file. Ignores hydrogen atoms that are defined in the PDB and SDF/MOL2 files, and will not throw errors if the protonation states are different.
                        Hydrogen atoms are not predicted with PLACER anyway. (default: False)
  --use_sm              make predictions with the small molecule (holo - turned on by default) (default: True)
  --no-use_sm           make predictions w/o the small molecule (apo) (default: True)
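The --fixed_ligand and --predict_ligand arguments accept three selector shapes: <name3>, <name3-resno>, and <chain-name3-resno>. The sketch below illustrates how such selectors can be read and matched against a residue. It is not PLACER's own parser: `LigandSelector`, `parse_ligand_selector`, and `selector_matches` are hypothetical names, and the code assumes non-negative residue numbers and no hyphens inside chain or residue names.

```python
from typing import NamedTuple, Optional


class LigandSelector(NamedTuple):
    chain: Optional[str]   # None means "any chain"
    name3: str             # three-letter residue name, e.g. "HEM"
    resno: Optional[int]   # None means "any residue number"


def parse_ligand_selector(text: str) -> LigandSelector:
    """Parse <name3>, <name3-resno>, or <chain-name3-resno>."""
    parts = text.split("-")
    if len(parts) == 1:
        chain, name3, resno = None, parts[0], None
    elif len(parts) == 2:
        chain, name3, resno = None, parts[0], int(parts[1])
    elif len(parts) == 3:
        chain, name3, resno = parts[0], parts[1], int(parts[2])
    else:
        raise ValueError(f"unrecognized ligand selector: {text!r}")
    if not name3:
        raise ValueError(f"empty residue name in selector: {text!r}")
    return LigandSelector(chain, name3, resno)


def selector_matches(sel: LigandSelector, chain: str, name3: str, resno: int) -> bool:
    """A field left unspecified in the selector matches any value."""
    return (
        sel.name3 == name3
        and (sel.chain is None or sel.chain == chain)
        and (sel.resno is None or sel.resno == resno)
    )


if __name__ == "__main__":
    for text in ["HEM", "LDP-501", "D-LDP-501"]:
        print(text, "->", parse_ligand_selector(text))
```

Under this reading, a bare name such as HEM selects every residue with that name, while D-LDP-501 pins down a single ligand copy in chain D.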

Examples

Some example commands from examples/commandline_examples.sh:

# predicting the binding of an inhibitor in a P450 pocket, while keeping heme fixed (heme is fixed automatically if it's not in --predict_ligand input)
python ../run_PLACER.py --ifile inputs/4dtz.cif --odir example_out_CLI --rerank prmsd --suffix D-LDP-501 -n 10 --predict_ligand D-LDP-501

# predicting the binding of an inhibitor and heme in a P450 pocket, docking and scoring two ligands simultaneously
python ../run_PLACER.py --ifile inputs/4dtz.cif --odir example_out_CLI --rerank prmsd --suffix LDP-HEM -n 10 --predict_ligand D-LDP-501 C-HEM-500 --predict_multi

# predicting heme in denovo protein
python ../run_PLACER.py --ifile inputs/dnHEM1.pdb --odir example_out_CLI --rerank prmsd -n 10 --ligand_file HEM:ligands/HEM.mol2

# predicting sidechains in apo denovo protein, defining crop center to a residue
python ../run_PLACER.py --ifile inputs/dnHEM1_apo.pdb --odir example_out_CLI --suffix A149 -n 10 --target_res A-149

# Mutating a residue to a non-canonical, loading that non-canonical into residue database from a JSON file. Existing ligand is omitted from the prediction
python ../run_PLACER.py --ifile inputs/denovo_SER_hydrolase.pdb --odir example_out_CLI --suffix 75I -n 10 --mutat
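Commands of this shape can also be assembled from a script, for instance when looping over many inputs. The sketch below builds an argument list using only flags listed under "Available arguments". `build_placer_command` is a hypothetical helper, not part of PLACER; the interpreter name and the script path are assumptions to adjust for the local setup. Nothing is executed here.

```python
import shlex

RERANK_CHOICES = ("prmsd", "plddt", "plddt_pde")


def build_placer_command(ifile, odir, nsamples=10, rerank=None, suffix=None,
                         predict_ligand=(), predict_multi=False,
                         script="run_PLACER.py", python_exe="python"):
    """Return the command as a list of strings suitable for subprocess.run."""
    cmd = [python_exe, script, "--ifile", ifile, "--odir", odir,
           "-n", str(nsamples)]
    if rerank is not None:
        if rerank not in RERANK_CHOICES:
            raise ValueError(f"rerank must be one of {RERANK_CHOICES}")
        cmd += ["--rerank", rerank]
    if suffix is not None:
        cmd += ["--suffix", suffix]
    if predict_ligand:
        # --predict_ligand takes one or more space-separated selectors.
        cmd += ["--predict_ligand", *predict_ligand]
    if predict_multi:
        cmd.append("--predict_multi")
    return cmd


if __name__ == "__main__":
    cmd = build_placer_command(
        "inputs/4dtz.cif", "example_out_CLI", nsamples=10, rerank="prmsd",
        suffix="D-LDP-501", predict_ligand=["D-LDP-501"],
        script="../run_PLACER.py",
    )
    print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run; keeping the command as a list avoids shell-quoting problems with file names.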
