PocketOptimizer - A Python Library for Protein-Ligand Binding Design

PocketOptimizer is a framework that offers experimentation with different scoring functions in protein-ligand binding design.

* API -- Python 3 and workflow interface
* Minimization -- Minimization through `OpenMM <https://openmm.org/>`_
* Force Fields -- `Amber ff14SB <https://pubs.acs.org/doi/10.1021/acs.jctc.5b00255>`_ and `CHARMM36 <https://pubmed.ncbi.nlm.nih.gov/23832629/>`_
* Scoring Functions -- Binding interaction scoring with `Smina <https://github.com/mwojcikowski/smina>`_ (empirical) or `FFEvaluate <https://software.acellera.com/htmd/tutorials/FFEvaluate.html>`_ (physics-based)
* Deterministic -- Linear programming is applied to find the Global Minimum Energy Conformation (GMEC) in a given set

Installation via the Repository

If you want to use PocketOptimizer you can clone the GitHub repository and set up the conda environment:

.. code-block:: bash

    git clone https://github.com/Hoecker-Lab/pocketoptimizer

    cd /YOUR_PATH/pocketoptimizer

The repository contains an environment.yml file that lists all dependencies and versions required for PocketOptimizer. Create the environment by typing:

.. code-block:: bash

    conda env create -f environment.yml

Conda then creates a separate pocketoptimizer environment.

Running PocketOptimizer

Since its rework, PocketOptimizer is accessed solely through Python 3. It can be used in regular Python scripts or interactively through Jupyter notebooks. The former is useful for automated pipelines, while the latter is helpful for going through every step separately. Jupyter is already part of the pocketoptimizer environment and does not need to be installed. Jupyter can be opened in two different ways:

jupyter-lab

jupyter-notebook

Before being able to use PocketOptimizer in Jupyter you need to install the kernel from inside the conda environment:

.. code-block:: bash

    conda activate pocketoptimizer

    python3 -m ipykernel install --user --name pocketoptimizer

Then you can select the kernel in the upper right corner of the Jupyter notebook.

General Project Layout

PocketOptimizer keeps each protein-ligand design project in its own project directory.

The general layout looks like this:

project
├── designs
├── energies
├── ligand
└── scaffold

The first step is to create a project directory (the name is up to you) with a scaffold and a ligand sub-directory. The other directories are created automatically during the design process.

Now place the following files inside the ligand and scaffold directories:

project
├── ligand
│   └──  YOUR_LIGAND.mol2
│
└── scaffold
    └── YOUR_PROTEIN.pdb

YOUR_LIGAND.mol2 = starting ligand pose placed inside the binding pocket
YOUR_PROTEIN.pdb = protein structure used as scaffold
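
The layout above can also be set up from Python. The following is a minimal sketch using only the standard library; the function name create_project and its arguments are illustrative and not part of PocketOptimizer.

.. code-block:: python

    import shutil
    from pathlib import Path


    def create_project(project_dir, ligand_file, protein_file):
        """Create the ligand/ and scaffold/ sub-directories and copy the inputs.

        Hypothetical helper, not part of the PocketOptimizer API.
        Returns a dict mapping each sub-directory name to the copied file.
        """
        project = Path(project_dir)
        sources = {"ligand": Path(ligand_file), "scaffold": Path(protein_file)}
        copied = {}
        for subdir, source in sources.items():
            destination = project / subdir
            # exist_ok=True makes the helper safe to call repeatedly
            destination.mkdir(parents=True, exist_ok=True)
            copied[subdir] = Path(shutil.copy(source, destination))
        return copied

Calling create_project('project', 'YOUR_LIGAND.mol2', 'YOUR_PROTEIN.pdb') would produce the two sub-directories shown above; designs and energies are left for PocketOptimizer to create.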

Ligand Preparation

1.1 How to get your small molecule

There are multiple ways to obtain your molecule of choice. If you want to design for a molecule different from the ligand bound in your crystal structure, you can search `RCSB <http://www.rcsb.org/pdb/ligand/chemAdvSearch.do>`_ for different kinds of ligands and download the molecule in SDF format.

If you already have a protein crystal structure with the desired ligand, you can also extract the ligand from the .pdb file, for example with `PyMol <https://pymol.org/2/>`_. Be aware that a ligand extracted this way is missing all hydrogen atoms.

Note: PocketOptimizer works with several input formats (mol2, sdf) that will be converted internally.
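
Because ligands extracted from crystal structures lack hydrogen atoms, it can help to check a mol2 file before using it. The sketch below counts hydrogens by reading the atom-type column of the @<TRIPOS>ATOM section. It assumes a standard Tripos mol2 layout with whitespace-separated fields; the function name is illustrative and not part of PocketOptimizer.

.. code-block:: python

    def count_mol2_hydrogens(mol2_text):
        """Return (number of hydrogens, number of atoms) found in a mol2 string.

        Hypothetical helper. Assumes whitespace-separated ATOM records of the
        form: atom_id atom_name x y z atom_type [further optional fields].
        """
        hydrogens = 0
        atoms = 0
        in_atom_section = False
        for line in mol2_text.splitlines():
            stripped = line.strip()
            if stripped.startswith("@<TRIPOS>"):
                # Only the ATOM record type is of interest here
                in_atom_section = stripped == "@<TRIPOS>ATOM"
                continue
            if not in_atom_section or not stripped or stripped.startswith("#"):
                continue
            fields = stripped.split()
            if len(fields) < 6:
                continue
            atoms += 1
            # The sixth field is the SYBYL atom type, e.g. "C.3", "N.am" or "H"
            if fields[5].split(".")[0] == "H":
                hydrogens += 1
        return hydrogens, atoms

A result with zero hydrogens suggests that the ligand still needs to be protonated before or during parameterization.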

1.2 Placing the ligand inside the binding pocket

PocketOptimizer is based on semi-rational design principles, which offer the flexibility to design the binding pocket according to your own ideas.

If you extracted your ligand from a protein crystal structure, you can skip this step. Otherwise, the easiest way to place the ligand inside the binding pocket is to superpose it on an existing ligand. The superposition depends entirely on your design ideas and usually requires some experimentation and multiple design runs.

The simplest way to do the superposition is with PyMol, which offers a pair-wise alignment tool for aligning elements the way you want. The tool can be found in the PyMol toolbar at the top, in the Wizard menu under the name Pair Fit.

If you have no initial information about a binding pose, you can instead produce an initial pose with a docking program such as `Autodock Vina <https://vina.scripps.edu/>`_.

First Design Steps

As mentioned, PocketOptimizer needs to be initialized in your project directory. Therefore, inside every script or Jupyter notebook you use, you need to define the following lines:

.. code-block:: python

    # Append the PocketOptimizer code
    import sys
    sys.path.append('YOUR_POCKETOPTIMIZER_PATH')

    # Import the pocketoptimizer module
    import pocketoptimizer as po

    # Path to your project directory (placeholder, adapt to your setup)
    project_dir = 'YOUR_PROJECT_PATH'

    # Initialize a new design pipeline
    design = po.DesignPipeline(work_dir=project_dir,         # Path to working directory containing scaffold and ligand subdirectory
                               ph=7,                         # pH used for protein and ligand protonation
                               forcefield='amber_ff14SB',    # Force field used for all energy computations (use Amber as it is better tested!)
                               ncpus=8)                      # Number of CPUs for multiprocessing

During initialization you can define a pH, which is used to protonate the side chains of the protein as well as the ligand molecule. PocketOptimizer implements two force fields, AMBER ff14SB and CHARMM36. These force fields contain the parameters and energy functions used to calculate the energy of the protein-ligand system. You can also set the number of CPUs used for all energy calculations.

2.1 Preparation/Minimization

2.1.1 Ligand Preparation
++++++++++++++++++++++++

The ligand also gets protonated and parameterized. The chemical space of small organic molecules far exceeds that of the 20 canonical amino acids and cannot easily be described by prebuilt force field atom types, which is why ligands generally need to be parameterized separately. For AMBER force fields this can be done with either `GAFF or GAFF2 (General AMBER Force Field) <https://pubmed.ncbi.nlm.nih.gov/15116359/>`_; for CHARMM the tool is called `CGenFF (Charmm GENeral Force Field) <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2888302/>`_.

PocketOptimizer needs the following ligand inputs:

* Ligand in mol2/sdf format

Optionally:

* Parameters in frcmod or prm/rtf format

Experienced users can obtain these by using tools like `ANTECHAMBER <http://ambermd.org/antechamber/ac.html>`_ and `PARMCHK <http://ambermd.org/tutorials/basic/tutorial5/>`_ for the AMBER force field or CGenFF for the CHARMM force field.
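
For the AMBER route, a manual parameterization with the AmberTools command-line programs could look roughly like the sketch below. It only illustrates a typical antechamber and parmchk2 invocation for GAFF2 with AM1-BCC charges and is not necessarily what PocketOptimizer runs internally. The function names, the output prefix and the default net charge are assumptions, and the input file is expected to contain hydrogens already.

.. code-block:: python

    import subprocess


    def gaff2_commands(input_mol2, net_charge=0, output_prefix="ligand"):
        """Build antechamber and parmchk2 command lines for GAFF2 parameters.

        Hypothetical helper, not part of the PocketOptimizer API.
        """
        typed_mol2 = f"{output_prefix}.mol2"
        antechamber = [
            "antechamber",
            "-i", input_mol2, "-fi", "mol2",   # input file and format
            "-o", typed_mol2, "-fo", "mol2",   # output file and format
            "-at", "gaff2",                    # assign GAFF2 atom types
            "-c", "bcc",                       # AM1-BCC partial charges
            "-nc", str(net_charge),            # net charge of the molecule
        ]
        parmchk2 = [
            "parmchk2",
            "-i", typed_mol2, "-f", "mol2",
            "-o", f"{output_prefix}.frcmod",   # missing parameters go here
            "-s", "2",                         # parameter set 2 = GAFF2
        ]
        return [antechamber, parmchk2]


    def run_commands(commands):
        """Run each command in order and stop at the first failure."""
        for command in commands:
            subprocess.run(command, check=True)

Running run_commands(gaff2_commands('ligand/YOUR_LIGAND.mol2')) requires AmberTools to be installed and on the PATH. The net charge must match the protonation state of your ligand.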

PocketOptimizer offers a Python interface that uses these tools to parameterize your small molecule:

.. code-block:: python

    # Only necessary if you don't have ligand parameters.
    design.parameterize_ligand(
        input_ligand='ligand/YOUR_LIGAND.mol2',  # Input ligand structure file could be .mol2/.sdf
        addHs=True                               # Whether to add hydrogen atoms to the input structure
    )

This creates a ligand.mol2 structure file and either a ligand.frcmod or ligand.prm/ligand.rtf parameter files in the ligand directory under FORCEFIELD/params. Before you proceed, inspect those files to check that the structure is correctly protonated and suits your needs.

ligand
├── ligand_structure.mol2
└── FORCEFIELD
    ├── ligand.mol2
    └── params
        └── ligand.mol2/ligand.frcmod or ligand.prm/ligand.rtf
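
A quick way to confirm that parameterization produced the expected files is to check for them from Python. The sketch below is a hypothetical helper that rests on two assumptions: the FORCEFIELD folder is named exactly like the forcefield value passed to the design pipeline, and AMBER force field names start with "amber" while any other value is treated as CHARMM.

.. code-block:: python

    from pathlib import Path


    def missing_ligand_files(project_dir, forcefield):
        """List expected ligand output files that do not exist yet.

        Hypothetical helper, not part of the PocketOptimizer API.
        """
        ff_dir = Path(project_dir) / "ligand" / forcefield
        if forcefield.lower().startswith("amber"):
            param_names = ["ligand.frcmod"]
        else:
            # Assumption: every non-AMBER name refers to CHARMM
            param_names = ["ligand.prm", "ligand.rtf"]
        expected = [ff_dir / "ligand.mol2"]
        expected += [ff_dir / "params" / name for name in param_names]
        return [path for path in expected if not path.is_file()]

An empty list means all expected files are present; otherwise the returned paths show what is still missing.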

Hint: Use relative paths for the scaffold and ligand structures, as you are inside the project directory during the entire design process.

2.1.2 Protein Preparation
+++++++++++++++++++++++++

Before the design process can start, the protein scaffold needs to be cleaned of ions, waters, small molecules (like natural ligands) and unnecessary protein chains. Furthermore, the scaffold needs to be protonated at the pH that was defined when initializing the design pipeline, and it needs to be minimized. PocketOptimizer has built-in functionality for this, using the `HTMD <https://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049>`_ and `OpenMM <https://openmm.org/>`_ distributions. After you have placed your protein of choice inside the PROJECT_NAME/scaffold/ directory, type the following:

.. code-block:: python

    design.prepare_protein(
        protein_structure='scaffold/YOUR_PROTEIN.pdb',  # Input PDB
        keep_chains=['A', 'B'],   # Protein chains to keep
        backbone_restraint=True,  # Restrain the backbone during the minimization
        discard_mols=[]           # Special molecules to exclude. By default everything is discarded, but peptides have to be defined manually
    )

Note: Check your protein for residues with alternative numbering, such as 110A, and remove them.
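
Residues numbered like 110A can be located by reading the insertion-code column of the PDB file. The sketch below assumes standard fixed-width ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27); the function name is illustrative and not part of PocketOptimizer.

.. code-block:: python

    def residues_with_insertion_codes(pdb_lines):
        """Return sorted (chain, residue number, insertion code, residue name) tuples.

        Hypothetical helper. Accepts any iterable of PDB lines, for example
        an open file handle.
        """
        found = set()
        for line in pdb_lines:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            if len(line) < 27:
                continue
            insertion_code = line[26]  # column 27 in 1-based PDB numbering
            if insertion_code.strip():
                chain = line[21]
                residue_number = int(line[22:26])
                residue_name = line[17:20].strip()
                found.add((chain, residue_number, insertion_code, residue_name))
        return sorted(found)

Passing an open handle of scaffold/YOUR_PROTEIN.pdb to this function lists every residue that carries such a suffix, so it can be removed or renumbered before preparation.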

The following files are created after this step:

scaffold
└── FORCEFIELD
    ├── protein_preparation
    │   ├── prepared_scaffold.pdb
    │   └── scaffold_report.xlsx
    ├── protein_params
    └── scaffold.pdb

In the scaffold folder a FORCEFIELD sub-folder is created named after the respective force field that was
