pyscreener
A pythonic interface to high-throughput virtual screening software
Overview
This repository contains the source of pyscreener, both a library and software for conducting high-throughput virtual screening (HTVS) via python calls.
Installation
Docker
If docker is not already installed on your system, it can be installed from the official docker website.
The provided Dockerfile can be used to create pyscreener images containing the required docking software and python dependencies / code. Any of the four vina-type docking programs (vina, qvina2, smina, and psovina) can be specified for installation in the docker image.
All python dependencies and the pyscreener library are installed into a conda environment named pyscreener, which must be activated once a container is started from the image.
Run any of the commands below in the directory containing the Dockerfile and environment.yml files to build the desired image:
- docker build -t pyscreener:base --target base .
  creates an image containing all python dependencies and the pyscreener library, but no docking software
- docker build -t pyscreener:vina --target vina .
  creates an image from pyscreener:base with vina installed
- docker build -t pyscreener:qvina --target qvina .
  creates an image from pyscreener:base with qvina installed
- docker build -t pyscreener:smina --target smina .
  creates an image from pyscreener:base with smina installed
- docker build -t pyscreener:psovina --target psovina .
  creates an image from pyscreener:base with psovina installed
As DOCK6 requires a license, it cannot be installed in the provided docker image. However, a compiled sphgen_cpp executable and the binary required to install chimera are both provided in the dock6_utils directory of the dock6 image, which can be built with:
- docker build -t pyscreener:dock6 --target base-dock6 .
  creates an image from pyscreener:base containing the utility software DOCK6 needs to run once it is installed
Notes:
- Within the docker container, the environment base will be activated by default. It contains all the required python dependencies, so there is no need to manually activate an environment once inside the container.
- If installing using docker, the installation steps below are not required for the corresponding vina-type software. However, the DOCK6 directions must still be followed.
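If you need several of these images, the build commands can be scripted. The following is a minimal Python sketch, not part of pyscreener: the helper names docker_build_command and build_images are hypothetical, and the tag-to-target mapping simply mirrors the docker build commands listed above. By default it only prints the commands (dry_run=True).

```python
import shutil
import subprocess
from typing import Dict, Iterable, List

# Hypothetical mapping of image tag -> Dockerfile build target, mirroring the
# docker build commands listed above (only the dock6 tag uses a different target).
TARGETS: Dict[str, str] = {
    "base": "base",
    "vina": "vina",
    "qvina": "qvina",
    "smina": "smina",
    "psovina": "psovina",
    "dock6": "base-dock6",
}


def docker_build_command(tag: str, context: str = ".") -> List[str]:
    """Return the argv for `docker build -t pyscreener:<tag> --target <target> <context>`."""
    if tag not in TARGETS:
        raise ValueError(f"unknown tag {tag!r}; expected one of {sorted(TARGETS)}")
    return ["docker", "build", "-t", f"pyscreener:{tag}", "--target", TARGETS[tag], context]


def build_images(tags: Iterable[str], context: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Print (and, unless dry_run is True, run) one docker build command per tag."""
    commands = [docker_build_command(tag, context) for tag in tags]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            if shutil.which("docker") is None:
                raise RuntimeError("docker is not installed or not on your PATH")
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a build fails
    return commands


if __name__ == "__main__":
    build_images(["base", "smina"])
```

Pass dry_run=False to actually run the builds from the directory containing the Dockerfile.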
General requirements
- python >= 3.8
- numpy, openbabel, openmm, pdbfixer, ray, rdkit, scikit-learn, scipy, and tqdm
- all corresponding software downloaded and located on your PATH or under the path of a specific environment variable (see external software for more details)
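To confirm that the python dependencies above are importable from the active environment, a quick check can be run with the standard library alone. This is a sketch, not a pyscreener feature: the helper name missing_modules is hypothetical, and it assumes the usual import names (scikit-learn is imported as sklearn). It only checks that each package can be located, not its version.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names of the python dependencies listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "openbabel", "openmm", "pdbfixer", "ray",
    "rdkit", "sklearn", "scipy", "tqdm",
]


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be located in the current environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print(f"python >= 3.8 is required, found {sys.version.split()[0]}")
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```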
Setup
- (if necessary) install conda
- conda env create -f environment.yml
- conda activate pyscreener
- pip install pyscreener (or, if installing from source, pip install .)
- follow the corresponding directions below for the intended software
Before running pyscreener, be sure to activate the environment first: conda activate pyscreener (or whatever you named your environment).
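If you are unsure whether the environment is active, conda records the active environment in the CONDA_DEFAULT_ENV variable. A minimal sketch that checks it (the helper name active_conda_env is hypothetical, and the check assumes the environment was created with the name pyscreener):

```python
import os
from typing import Mapping, Optional


def active_conda_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if env is None else env
    value = env.get("CONDA_DEFAULT_ENV")
    if not value:
        return None
    # If an environment was activated by path rather than by name, the variable
    # may hold that path, so keep only its last component.
    return os.path.basename(value.rstrip("/"))


if __name__ == "__main__":
    name = active_conda_env()
    if name != "pyscreener":
        print(f"active conda environment is {name!r}; run: conda activate pyscreener")
```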
external software
vina-type software
- install ADFR Suite and add prepare_receptor to your PATH, either by:
  - adding the entire bin directory to your PATH (you should see a command for this at the end of the installation process), or
  - adding only prepare_receptor in the bin directory to your PATH as detailed below
  If this step was successful, the command which prepare_receptor should output path/to/prepare_receptor.
- install any of the following docking software: vina 1.1.2 (note: pyscreener does not work with vina 1.2), qvina2, smina, or psovina, and ensure the desired executable is in a directory on your PATH
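These PATH requirements can be verified from Python with shutil.which, which performs the same lookup as which prepare_receptor. In this sketch the helper name check_executables is hypothetical, and the executable names (vina, qvina2, smina, psovina) are assumptions: your installation may name the binaries differently, in which case pass the actual names.

```python
import shutil
from typing import Dict, Iterable, Optional

DOCKING_PROGRAMS = ("vina", "qvina2", "smina", "psovina")  # assumed binary names


def check_executables(names: Iterable[str], path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not found.

    `path` overrides the os.pathsep-separated search path; by default the
    current PATH is used, exactly as in shutil.which.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    found = check_executables(("prepare_receptor",) + DOCKING_PROGRAMS)
    for name, location in found.items():
        print(f"{name:18s} {location or 'NOT FOUND'}")
    if found["prepare_receptor"] is None:
        print("prepare_receptor (ADFR Suite) is not on your PATH")
    # Only one docking program is required; warn if none was found at all.
    if not any(found[name] for name in DOCKING_PROGRAMS):
        print("no vina-type docking software found on your PATH")
```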
DOCK6
- obtain a license for DOCK6
- install DOCK6 from the download link and follow the installation directions
- after confirming that DOCK6 was installed properly, set the DOCK6 environment variable to the path of the DOCK6 parent directory as detailed below. This is the directory that was unpacked from the tarball and is usually named dock6; it is the folder that contains the bin, install, etc. subdirectories.
- install sphgen_cpp. On linux systems, this can be done with:
  wget http://dock.compbio.ucsf.edu/Contributed_Code/code/sphgen_cpp.1.2.tar.gz
  tar -xzvf sphgen_cpp.1.2.tar.gz
  cd sphgen_cpp.1.2
  make
- place the sphgen_cpp executable (it should be named sphgen_cpp) inside the bin subdirectory of the DOCK6 parent directory. If you've already set the environment variable, on linux you can run: mv sphgen_cpp $DOCK6/bin
- install chimera and place its executable on your PATH as detailed below
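Once DOCK6 and sphgen_cpp are in place, a short sketch can confirm the layout described above: that DOCK6 points to a directory with a bin subdirectory, that sphgen_cpp is an executable inside it, and that chimera is on your PATH. The helper name check_dock6 is hypothetical, and the chimera executable name is an assumption (your installation may use a different name).

```python
import os
import shutil
from typing import List, Mapping, Optional


def check_dock6(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the DOCK6 setup (an empty list means none found)."""
    env = os.environ if env is None else env
    problems = []
    dock6 = env.get("DOCK6")
    if not dock6:
        problems.append("DOCK6 environment variable is not set")
    else:
        bin_dir = os.path.join(dock6, "bin")
        sphgen = os.path.join(bin_dir, "sphgen_cpp")
        if not os.path.isdir(bin_dir):
            problems.append(f"{bin_dir} does not exist; is DOCK6 the dock6 parent directory?")
        elif not (os.path.isfile(sphgen) and os.access(sphgen, os.X_OK)):
            problems.append(f"no executable sphgen_cpp in {bin_dir}")
    # Search the PATH of the given environment (falls back to os.environ if unset).
    if shutil.which("chimera", path=env.get("PATH")) is None:
        problems.append("chimera is not on your PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dock6():
        print("problem:", problem)
```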
adding an executable to your PATH
To add an executable to your PATH, you have three options:
- create a symbolic link to the executable inside a directory that is already on your PATH: ln -s FILE -t DIR, giving the absolute path of FILE so that the link does not dangle. Typically, ~/bin or ~/.local/bin are good target directories (i.e., DIR). To see which directories are currently on your PATH, type echo $PATH. There will typically be many directories on your PATH, and it is best to avoid creating files in any directory above your home directory ($HOME on most *nix-based systems).
- copy the software to a directory that is already on your PATH. This is similar to, though less preferred than, the above: cp FILE DIR
- append the directory containing the file to your PATH: export PATH=$PATH:DIR, where DIR is the directory containing the file in question. Because this setting only lasts for the current shell session, the command should also be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run it every time you log in. Note: if using a non-bash shell, the specific file will be different.
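The first option can also be scripted. The sketch below is illustrative only: the helper names link_into_bin and demo_link are hypothetical, and ~/.local/bin is just the example target directory from above. It links the file by its absolute path (a relative symlink target is resolved relative to the link's own directory) and reports whether the target directory is actually on your PATH.

```python
import os
import tempfile
from typing import Optional, Tuple


def link_into_bin(executable: str, bin_dir: str = "~/.local/bin",
                  path: Optional[str] = None) -> Tuple[str, bool]:
    """Symlink `executable` into `bin_dir` (like `ln -s FILE -t DIR`).

    Returns the link's path and whether `bin_dir` appears on `path`
    (defaulting to the current PATH).
    """
    source = os.path.abspath(os.path.expanduser(executable))
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    target_dir = os.path.abspath(os.path.expanduser(bin_dir))
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(target_dir, os.path.basename(source))
    if os.path.lexists(link):
        raise FileExistsError(f"{link} already exists")
    os.symlink(source, link)

    search = os.environ.get("PATH", "") if path is None else path
    entries = {os.path.abspath(os.path.expanduser(p)) for p in search.split(os.pathsep) if p}
    return link, target_dir in entries


def demo_link() -> Tuple[bool, bool]:
    """Link a throwaway script into a temporary bin directory and check the result."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "mytool")
        with open(script, "w") as fh:
            fh.write("#!/bin/sh\necho ok\n")
        os.chmod(script, 0o755)
        bin_dir = os.path.join(tmp, "bin")
        link, on_path = link_into_bin(script, bin_dir, path=bin_dir)
        return os.readlink(link) == script, on_path
```

If the second value returned by link_into_bin is False, the link was created but the shell will not find it until that directory is added to your PATH.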
specifying an environment variable
To set the DOCK6 environment variable, run the following command: export DOCK6=path/to/dock6, where path/to/dock6 is the full path of the DOCK6 parent directory mentioned above. As this environment variable must always be set before running pyscreener, the command should be placed inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run it every time you log in. Note: if using a non-bash shell, the specific file will be different.
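If you script your setup, you may want to avoid appending duplicate export lines on repeated runs. A minimal sketch, assuming a bash-style rc file such as ~/.bashrc; the helper names ensure_export and demo_export are hypothetical:

```python
import os
import shlex
import tempfile


def ensure_export(rc_file: str, name: str, value: str) -> bool:
    """Append `export NAME=VALUE` to rc_file unless that exact line is already present.

    Returns True if the line was added, False if it was already there.
    """
    line = f"export {name}={shlex.quote(value)}"  # quote paths containing spaces
    rc_path = os.path.expanduser(rc_file)
    content = ""
    if os.path.exists(rc_path):
        with open(rc_path) as fh:
            content = fh.read()
    if line in content.splitlines():
        return False
    with open(rc_path, "a") as fh:
        if content and not content.endswith("\n"):
            fh.write("\n")  # start on a fresh line
        fh.write(line + "\n")
    return True


def demo_export() -> str:
    """Add the DOCK6 export twice to a temporary rc file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        rc = os.path.join(tmp, ".bashrc")
        ensure_export(rc, "DOCK6", "/opt/dock6")
        ensure_export(rc, "DOCK6", "/opt/dock6")  # second call is a no-op
        with open(rc) as fh:
            return fh.read()
```

The new line only affects shells started afterwards; in the current shell, run the export command directly (or source the rc file).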
Ray Setup
pyscreener uses ray as its parallel backend. If you plan to parallelize the software only across your local machine, you don't need to do anything. However, if you wish to either (a) limit the number of cores pyscreener will use or (b) run it over a distributed setup (e.g., an HPC cluster with many distinct nodes), you must manually start a ray cluster before running pyscreener.
Limiting the number of cores
To do this, simply type ray start --head --num-cpus N before starting pyscreener (where N is the total number of cores you wish to allow pyscreener to utilize). Not performing this step will give pyscreener access to all of the cores on your local machine, potentially slowing down other applications.
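If you start the cluster from a script, the command can be assembled with a core count derived from the machine. In this sketch the helper name ray_start_command and the choice to leave one core free by default are assumptions for illustration; only the ray start --head --num-cpus N command itself comes from the instructions above.

```python
import os
import shlex
from typing import List, Optional


def ray_start_command(num_cpus: Optional[int] = None, reserve: int = 1) -> List[str]:
    """Return argv for `ray start --head --num-cpus N`.

    If num_cpus is not given, use all detected cores minus `reserve` (at least 1),
    leaving some capacity for other applications.
    """
    if num_cpus is None:
        total = os.cpu_count() or 1  # os.cpu_count() may return None
        num_cpus = max(1, total - reserve)
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return ["ray", "start", "--head", "--num-cpus", str(num_cpus)]


if __name__ == "__main__":
    # Print the command; run it (e.g. with subprocess.run(cmd, check=True)) before pyscreener.
    print(shlex.join(ray_start_command()))
```

When you are done, ray stop shuts down the local ray processes.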
Distributing across many nodes
While the precise instructions for this will vary with HPC cluster architecture, the general idea is to establish a ray cluster between the nodes allocated to your job. We have provided a sample SLURM submission script (run_pyscreener_distributed_example.batch) to achieve this, but you may have to alter some commands depending on your system. For more information on this, see here. To allow pyscreener to connect to your ray cluster, you must set the ip_head and redis_password environment variables appropriately. ip_head is the address of the head of your ray cluster in the form IP:PORT, where IP is the IP address of the head node and PORT is the port on which ray is running.
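Before launching pyscreener on a cluster, it can help to verify that ip_head has the expected IP:PORT form. A minimal sketch: the helper name parse_ip_head is hypothetical, it checks only the format described above plus that redis_password is non-empty, and it does not contact the cluster.

```python
import ipaddress
import os
from typing import Mapping, Optional, Tuple


def parse_ip_head(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Split the ip_head variable into (IP, PORT), raising ValueError if it is malformed."""
    env = os.environ if env is None else env
    if not env.get("redis_password"):
        raise ValueError("redis_password is not set")
    ip_head = env.get("ip_head", "")
    host, sep, port = ip_head.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"ip_head must look like IP:PORT, got {ip_head!r}")
    ipaddress.ip_address(host)  # raises ValueError if IP is not a valid address
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    return host, port_num


if __name__ == "__main__":
    print(parse_ip_head())
```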
pyscreener writes a lot of intermediate input and output files (due to the inherent specifications of the underlying docking software.) Given that the primary
