pyscreener

A pythonic interface to high-throughput virtual screening software

Overview

This repository contains the source of pyscreener, both a library and software for conducting high-throughput virtual screening (HTVS) via Python calls.

Overview
Table of Contents
Requirements
Installation
Ray setup
Running pyscreener
Using pyscreener

Installation

Docker

If docker is not already installed on your system, it can be installed from the official docker website.

The provided Dockerfile can be used to create pyscreener images containing the required docking software, python dependencies, and code. Any of the four vina-type docking programs (vina, qvina2, smina, and psovina) can be specified for installation to the docker image. All python dependencies and the pyscreener library are installed to a conda environment named pyscreener which must be activated once the docker image starts.

The below commands can be run in the directory containing the Dockerfile and environment.yml files to build the desired image:

docker build -t pyscreener:base --target base . : Creates a docker image containing all python dependencies and pyscreener library but no docking software
docker build -t pyscreener:vina --target vina . : Creates an image from pyscreener:base with vina installed
docker build -t pyscreener:qvina --target qvina . : Creates an image from pyscreener:base with qvina installed
docker build -t pyscreener:smina --target smina . : Creates an image from pyscreener:base with smina installed
docker build -t pyscreener:psovina --target psovina . : Creates an image from pyscreener:base with psovina installed

As DOCK6 software requires a license, it is not possible to include its installation within the associated docker image. A compiled form of sphgen_cpp and the binary required for installation of chimera are both available within the dock6_utils directory of the associated dock6 image:

docker build -t pyscreener:dock6 --target base-dock6 . : Creates an image from pyscreener:base containing utility software needed for DOCK6 to run once installed
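The build commands above all follow the same pattern, so they can be generated from a small table. The following is a minimal sketch, not part of pyscreener: the helper name docker_build_command and the BUILD_TARGETS table are hypothetical, and only the tags and targets themselves are taken from the commands listed above.

```python
import shlex

# Image tag -> Dockerfile build target, copied from the commands listed above.
BUILD_TARGETS = {
    "base": "base",
    "vina": "vina",
    "qvina": "qvina",
    "smina": "smina",
    "psovina": "psovina",
    "dock6": "base-dock6",
}


def docker_build_command(tag, context="."):
    """Return the `docker build` argument list for one pyscreener image tag.

    Hypothetical helper: it only assembles the command and does not run docker.
    """
    if tag not in BUILD_TARGETS:
        raise ValueError(f"unknown image tag {tag!r}; choose from {sorted(BUILD_TARGETS)}")
    return ["docker", "build", "-t", f"pyscreener:{tag}", "--target", BUILD_TARGETS[tag], context]


if __name__ == "__main__":
    # Print each command; pass the list to subprocess.run(..., check=True) to execute it.
    for tag in BUILD_TARGETS:
        print(shlex.join(docker_build_command(tag)))
```

Run the script from the directory containing the Dockerfile and environment.yml files so that the build context "." resolves correctly.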

Notes:

Within the docker container, the environment base will be activated by default. This contains all the required python dependencies, so there is no need to manually activate an environment once inside the container.
If installing using docker, then the below installation stages are not required for the corresponding vina-type software. However, the DOCK6 directions must still be followed.

General requirements

python >= 3.8
numpy, openbabel, openmm, pdbfixer, ray, rdkit, scikit-learn, scipy, and tqdm
all corresponding software downloaded and located on your PATH or under the path of a specific environment variable (see external software for more details)
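One way to confirm the python requirements above before running anything is a short check script. This is a sketch under assumptions: the import names in REQUIRED_MODULES are guesses from the package names (for example, scikit-learn is imported as sklearn), and the function names are hypothetical.

```python
import importlib.util
import sys

# Import names for the packages listed above. These are assumptions: an import
# name can differ from its package name (scikit-learn is imported as sklearn).
REQUIRED_MODULES = [
    "numpy", "openbabel", "openmm", "pdbfixer", "ray",
    "rdkit", "sklearn", "scipy", "tqdm",
]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("python >= 3.8 is required")
    for name in missing_modules():
        print(f"missing python dependency: {name}")
```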

Setup

(if necessary) install conda
conda env create -f environment.yml
conda activate pyscreener
pip install pyscreener (or if installing from source, pip install .)
follow the corresponding directions below for the intended software

Before running pyscreener, be sure to first activate the environment: conda activate pyscreener (or whatever you've named your environment)

external software

vina-type software
1. install ADFR Suite and add prepare_receptor to your PATH. If this step was successful, the command which prepare_receptor should output path/to/prepare_receptor. This can be done via either:
  1. adding the entire bin directory to your path (you should see a command at the end of the installation process) or
  2. adding only prepare_receptor in the bin directory to your PATH as detailed below
2. install any of the following docking software: vina 1.1.2 (note: pyscreener does not work with vina 1.2), qvina2, smina, or psovina, and ensure the desired software executable is in a folder that is located on your PATH
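The check described in step 1 (which prepare_receptor) can also be done from python with shutil.which. The sketch below is illustrative only: the executable names in VINA_TYPE_EXECUTABLES are assumptions, since installed binaries may be named differently on your system, and the function names are hypothetical.

```python
import shutil

# Executable names are assumptions; your installed binaries may be named differently.
VINA_TYPE_EXECUTABLES = ["vina", "qvina2", "smina", "psovina"]


def locate(names):
    """Map each executable name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def check_vina_setup():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which("prepare_receptor") is None:
        problems.append("prepare_receptor is not on your PATH")
    if not any(locate(VINA_TYPE_EXECUTABLES).values()):
        problems.append("no vina-type docking executable was found on your PATH")
    return problems


if __name__ == "__main__":
    for problem in check_vina_setup():
        print(problem)
```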
DOCK6
1. obtain a license for DOCK6
2. install DOCK6 from the download link and follow the installation directions
3. after verifying that the installation succeeded, set the DOCK6 environment variable to the path of the DOCK6 parent directory as detailed below. (This is the directory that was unzipped from the tarball and is usually named dock6. It is the folder that contains the bin, install, etc. subdirectories.)
4. install sphgen_cpp. On linux systems, this can be done as follows:
  1. wget http://dock.compbio.ucsf.edu/Contributed_Code/code/sphgen_cpp.1.2.tar.gz
  2. tar -xzvf sphgen_cpp.1.2.tar.gz
  3. cd sphgen_cpp.1.2
  4. make
5. place the sphgen_cpp executable (it should be sphgen_cpp) inside the bin subdirectory of the DOCK6 parent directory. If you've configured the environment variable already, (on linux) you can run: mv sphgen_cpp $DOCK6/bin
6. install chimera and place the file on your PATH as detailed below

adding an executable to your PATH

To add an executable to your PATH, you have three options:

create a symbolic link to the executable inside a directory that is already on your PATH: ln -s FILE -t DIR. Typically, ~/bin or ~/.local/bin are good target directories (i.e., DIR). To see which directories are currently on your PATH, type echo $PATH. There will typically be many directories on your PATH, and it is best to avoid creating files in any directory above your home directory ($HOME on most *nix-based systems).
copy the software to a directory that is already on your PATH: cp FILE DIR. This is similar to, though less preferred than, the option above.
append the directory containing the file to your PATH: export PATH=$PATH:DIR, where DIR is the directory containing the file in question. As your PATH must be configured each time you run pyscreener, this command should also be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run the command every time you log in. Note: if using a non-bash shell, the specific file will be different.
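The first option can be sketched in python as follows. This is a hypothetical helper, not part of pyscreener: link_into mirrors ln -s FILE -t DIR, and is_on_path tests whether a directory appears in PATH. Both function names are made up for illustration, and symbolic links are assumed to be supported by the file system.

```python
import os
from pathlib import Path


def path_dirs(env=None):
    """Return the directories listed in PATH (from `env`, else os.environ)."""
    env = os.environ if env is None else env
    return [Path(p) for p in env.get("PATH", "").split(os.pathsep) if p]


def is_on_path(directory, env=None):
    """Return True if `directory` is one of the PATH entries."""
    target = Path(directory).expanduser().resolve()
    return any(p.resolve() == target for p in path_dirs(env))


def link_into(executable, directory):
    """Create DIR/<name> as a symlink to FILE, like `ln -s FILE -t DIR`."""
    source = Path(executable).expanduser().resolve()
    link = Path(directory).expanduser() / source.name
    link.symlink_to(source)
    return link
```

For example, link_into("path/to/prepare_receptor", "~/.local/bin") would create the link, after which is_on_path("~/.local/bin") reports whether that directory is actually searched.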

specifying an environment variable

To set the DOCK6 environment variable, run the following command: export DOCK6=path/to/dock6, where path/to/dock6 is the full path of the DOCK6 parent directory mentioned above. As this environment variable must always be set before running pyscreener, the command should be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run the command every time you log in. Note: if using a non-bash shell, the specific file will be different.
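A quick sanity check of the DOCK6 variable might look like the sketch below. It only tests what the directions above state: that DOCK6 is set, that the directory has a bin subdirectory, and that sphgen_cpp has been placed inside it. The function name check_dock6 is hypothetical.

```python
import os
from pathlib import Path


def check_dock6(env=None):
    """Return a list of problems with the DOCK6 setup; empty means it looks fine."""
    env = os.environ if env is None else env
    root = env.get("DOCK6")
    if not root:
        return ["the DOCK6 environment variable is not set"]

    problems = []
    bin_dir = Path(root) / "bin"
    if not bin_dir.is_dir():
        problems.append(f"{bin_dir} is not a directory; DOCK6 should point at the DOCK6 parent directory")
    if not (bin_dir / "sphgen_cpp").is_file():
        problems.append(f"sphgen_cpp was not found in {bin_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_dock6():
        print(problem)
```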

Ray Setup

pyscreener uses ray as its parallel backend. If you plan to parallelize the software only across your local machine, you don't need to do anything. However, if you wish to either (a.) limit the number of cores pyscreener will be run over or (b.) run it over a distributed setup (e.g., an HPC with many distinct nodes), you must manually start a ray cluster before running pyscreener.

Limiting the number of cores

To do this, simply type ray start --head --num-cpus N before starting pyscreener (where N is the total number of cores you wish to allow pyscreener to utilize). Not performing this step will give pyscreener access to all of the cores on your local machine, potentially slowing down other applications.
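If you launch pyscreener from a python wrapper script, the same command can be assembled programmatically. This is a minimal sketch with a hypothetical helper name (ray_start_command); it only builds the argument list for the ray start command shown above and validates N.

```python
def ray_start_command(num_cpus):
    """Return the argument list for `ray start --head --num-cpus N`."""
    if not isinstance(num_cpus, int) or num_cpus < 1:
        raise ValueError("num_cpus must be a positive integer")
    return ["ray", "start", "--head", "--num-cpus", str(num_cpus)]


if __name__ == "__main__":
    import subprocess

    # Start a local ray head node limited to 4 cores (requires ray to be installed).
    subprocess.run(ray_start_command(4), check=True)
```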

Distributing across many nodes

While the precise instructions for this will vary with HPC cluster architecture, the general idea is to establish a ray cluster between the nodes allocated to your job. We have provided a sample SLURM submission script (run_pyscreener_distributed_example.batch) to achieve this, but you may have to alter some commands depending on your system. For more information on this see here. To allow pyscreener to connect to your ray cluster, you must set the ip_head and redis_password environment variables appropriately, where ip_head is the address of the head of your ray cluster, i.e., IP:PORT where IP is the IP address of the head node and PORT is the port that is running ray.
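To illustrate how the ip_head and redis_password variables described above might be consumed, the sketch below reads them and checks that ip_head has the IP:PORT form. The function name ray_connection_kwargs is hypothetical, and passing the result to ray.init (with the address and _redis_password keywords) is an assumption about how a client would connect, not a statement of what pyscreener does internally.

```python
import os


def ray_connection_kwargs(env=None):
    """Build connection keywords from the ip_head and redis_password variables.

    Returns an empty dict when ip_head is not set.
    """
    env = os.environ if env is None else env
    ip_head = env.get("ip_head")
    if ip_head is None:
        return {}

    # ip_head must look like IP:PORT, e.g. 10.0.0.1:6379.
    host, sep, port = ip_head.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"ip_head should have the form IP:PORT, got {ip_head!r}")

    kwargs = {"address": ip_head}
    if "redis_password" in env:
        kwargs["_redis_password"] = env["redis_password"]
    return kwargs


# Assumed usage (requires ray, and a ray.init that accepts these keywords):
#     import ray
#     ray.init(**ray_connection_kwargs())
```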

pyscreener writes a lot of intermediate input and output files (due to the inherent specifications of the underlying docking software.) Given that the primary
