DeepPurpose
A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
This repository hosts DeepPurpose, a deep learning based molecular modeling and prediction toolkit for drug-target interaction (DTI) prediction, compound property prediction, protein-protein interaction (PPI) prediction, and protein function prediction (using PyTorch). We focus on DTI and its applications in drug repurposing and virtual screening, but support various other molecular encoding tasks. It is easy to use (only a few lines of code), to facilitate deep learning for life science research.
News!
- [05/21] 0.1.2: Support for 5 new graph neural network based models for compound encoding (DGL_GCN, DGL_NeuralFP, DGL_GIN_AttrMasking, DGL_GIN_ContextPred, DGL_AttentiveFP), implemented using DGL Life Science! An example is provided here!
- [12/20] DeepPurpose is now supported by the TDC data loader, which contains a large collection of ML for therapeutics datasets, including many drug property and DTI datasets. Here is a tutorial!
- [12/20] DeepPurpose can now be installed via pip!
- [11/20] DeepPurpose is published in Bioinformatics!
- [11/20] Added 5 more pretrained models on BindingDB IC50 units (around 1 million data points).
- [10/20] Google Colab installation instructions are provided here. Thanks to @hima111997!
- [10/20] Using DeepPurpose, we made a human-in-the-loop molecular design web UI, check it out! [Website, paper]
- [09/20] DeepPurpose now supports three more tasks: DDI, PPI and Protein Function Prediction! Simply call `from DeepPurpose import DDI/PPI/ProteinPred` to use them; check out the examples below!
- [07/20] A simple web UI for DTI prediction can be created in under 10 lines using Gradio! A demo is provided here.
- [07/20] A blog is posted on the Towards Data Science Medium column, check this out!
- [07/20] Two tutorials are online to go through DeepPurpose's framework to do drug-target interaction prediction and drug property prediction (DTI, Drug Property).
- [05/20] Support for drug property prediction on screening data that does not have target proteins, such as bacteria! An example using RDKit2D with DNN for training and repurposing for Pseudomonas aeruginosa (MIT AI Cures' open task) is provided as a demo.
- [05/20] Now supports hyperparameter tuning via Bayesian Optimization through the Ax platform! A demo is provided here.
Features
- 15+ powerful encodings for drugs and proteins, ranging from deep neural networks on classic cheminformatics fingerprints, CNNs, and transformers to message passing graph neural networks, with 50+ combined models! Most of these encoding combinations have not yet appeared in existing work. All of this takes under 10 lines of code while remaining flexible! Switching encodings is as simple as changing the encoding names!
- Realistic and user-friendly design:
  - supports DTI, DDI, PPI, molecular property prediction, and protein function prediction!
  - automatically identifies whether the task is drug-target binding affinity prediction (regression) or drug-target interaction prediction (binary).
  - supports cold-target and cold-drug settings for robust model evaluation, and supports a single-target high throughput sequencing assay data setup.
  - many dataset loading/downloading/unzipping scripts to ease the tedious preprocessing, including antiviral, COVID19 targets, BindingDB, DAVIS, KIBA, ...
  - many pretrained checkpoints.
  - easy monitoring of the training process with detailed training metrics output, such as test set figures (AUCs) and tables; early stopping is also supported.
  - detailed output records, such as a rank list for repurposing results.
  - various evaluation metrics: ROC-AUC, PR-AUC, and F1 for binary tasks; MSE, R-squared, and Concordance Index for regression tasks.
  - label unit conversion for skewed label distributions such as Kd.
  - time reference for computationally expensive encodings.
  - PyTorch based; supports CPU, GPU, and multi-GPU.
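To make the cold-target setting listed above concrete, here is a minimal, standalone sketch of the idea: the unique targets are split across folds, so no target seen during training appears in validation or test. The function `cold_target_split` and the toy data are hypothetical illustrations written for this README; this is not DeepPurpose's own implementation, which may sample differently.

```python
import random
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str, float]  # (drug SMILES, target sequence, label)


def cold_target_split(
    pairs: Sequence[Pair],
    frac: Tuple[float, float, float] = (0.7, 0.1, 0.2),
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split pairs so that each target sequence lands in exactly one fold."""
    # Shuffle the unique targets, not the pairs, so folds never share a target.
    targets = sorted({target for _, target, _ in pairs})
    random.Random(seed).shuffle(targets)

    n_train = int(len(targets) * frac[0])
    n_val = int(len(targets) * frac[1])
    train_targets: Set[str] = set(targets[:n_train])
    val_targets: Set[str] = set(targets[n_train:n_train + n_val])

    train = [p for p in pairs if p[1] in train_targets]
    val = [p for p in pairs if p[1] in val_targets]
    # Every remaining target goes to the test fold.
    test = [p for p in pairs
            if p[1] not in train_targets and p[1] not in val_targets]
    return train, val, test


def targets_of(fold: Sequence[Pair]) -> Set[str]:
    """Collect the distinct target sequences present in a fold."""
    return {target for _, target, _ in fold}


# Toy data (made up): 10 placeholder targets, each paired with 3 placeholder drugs.
toy_pairs = [(f"drug_{d}", f"TARGET_{t}", 0.0) for t in range(10) for d in range(3)]
toy_train, toy_val, toy_test = cold_target_split(toy_pairs)

print(len(toy_train), len(toy_val), len(toy_test))
print("train/test share a target:",
      not targets_of(toy_train).isdisjoint(targets_of(toy_test)))
```

A cold-drug split follows the same pattern with the drug column in place of the target column. Evaluating this way is stricter than a random split, because the model is scored only on targets it has never seen.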
NOTE: We are actively looking for constructive advice, user feedback, and experiences with using DeepPurpose! Please open an issue or contact us.
Cite Us
If you found this package useful, please cite our paper:
@article{huang2020deeppurpose,
title={DeepPurpose: A Deep Learning Library for Drug-Target Interaction Prediction},
author={Huang, Kexin and Fu, Tianfan and Glass, Lucas M and Zitnik, Marinka and Xiao, Cao and Sun, Jimeng},
journal={Bioinformatics},
year={2020}
}
Installation
Try it on Binder! Binder is a cloud Jupyter Notebook interface that will install our environment dependencies for you.
Video tutorial to install Binder.
We recommend installing locally, since Binder needs to be refreshed every time it is launched. To install locally, we recommend installing from pip:
pip
conda create -n DeepPurpose python=3.6
conda activate DeepPurpose
conda install -c conda-forge notebook
pip install git+https://github.com/bp-kelley/descriptastorus
pip install DeepPurpose
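After running the commands above, a quick way to confirm that the packages are visible to the active environment is to ask Python whether it can locate them. This is a small sketch using only the standard library; the helper name `is_installed` is made up for illustration, and the check only confirms that each package can be found, not that every optional dependency works.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Import names of the two packages installed by the pip commands.
for name in ("DeepPurpose", "descriptastorus"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, check that the `DeepPurpose` conda environment is the one currently activated.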
Build from Source
First time:
git clone https://github.com/kexinhuang12345/DeepPurpose.git ## Download code repository
cd DeepPurpose ## Change directory to DeepPurpose
conda env create -f environment.yml ## Build virtual environment with all packages installed using conda
conda activate DeepPurpose ## Activate conda environment (use "source activate DeepPurpose" for anaconda 4.4 or earlier)
jupyter notebook ## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate ## when done, exit conda environment
In the future:
cd DeepPurpose ## Change directory to DeepPurpose
conda activate DeepPurpose ## Activate conda environment
jupyter notebook ## open the jupyter notebook with the conda env
## run our code, e.g. click a file in the DEMO folder
... ...
conda deactivate ## when done, exit conda environment
Video tutorial to install locally from source.
Example
Case Study 1(a): A Framework for Drug Target Interaction Prediction, in fewer than 10 lines of code.
In addition to DTI prediction, we also provide repurposing and virtual screening functions to rapidly generate predictions.
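The code for this case study loads BindingDB Kd labels with `convert_to_log = True`. As a rough illustration of why a log scale helps with skewed affinity labels, the sketch below converts Kd values to pKd, the negative base-10 logarithm of Kd in molar units. It assumes Kd is given in nM; the helper `kd_nm_to_pkd` and the sample values are hypothetical, and this is not DeepPurpose's own conversion routine.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert Kd in nanomolar to pKd = -log10(Kd in molar). Hypothetical helper."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive to take a logarithm")
    return -math.log10(kd_nm * 1e-9)


# Illustrative values only (not taken from BindingDB), spanning five orders of magnitude.
kd_values_nm = [0.1, 10.0, 1_000.0, 10_000.0]
pkd_values = [kd_nm_to_pkd(kd) for kd in kd_values_nm]

for kd, pkd in zip(kd_values_nm, pkd_values):
    print(f"Kd = {kd:>8.1f} nM  ->  pKd = {pkd:.2f}")
```

On the log scale, values that span several orders of magnitude collapse into a range of a few units, and stronger binders (smaller Kd) receive larger pKd values, which is usually easier for a regression model to fit.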
<details> <summary>Click here for the code!</summary>
from DeepPurpose import DTI as models
from DeepPurpose.utils import *
from DeepPurpose.dataset import *
SAVE_PATH='./saved_path'
import os
if not os.path.exists(SAVE_PATH):
    os.makedirs(SAVE_PATH)
# Load Data, an array of SMILES for drug, an array of Amino Acid Sequence for Target and an array of binding values/0-1 label.
# e.g. ['Cc1ccc(CNS(=O)(=O)c2ccc(s2)S(N)(=O)=O)cc1', ...], ['MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTH...', ...], [0.46, 0.49, ...]
# In this example, BindingDB with Kd binding score is used.
X_drug, X_target, y = process_BindingDB(download_BindingDB(SAVE_PATH),
                                        y = 'Kd',
                                        binary = False,
                                        convert_to_log = True)
# Type in the encoding names for drug/protein.
drug_encoding, target_encoding = 'CNN', 'Transformer'
# Data processing, here we select cold protein split setup.
train, val, test = data_process(X_drug, X_target, y,
                                drug_encoding, target_encoding,
                                split_method='cold_protein',
                                frac=[0.7,0.1,0.2])
# Generate new model using default parameters; also allow model tuning via input parameters.
config = generate_config(drug_encoding, target_encoding, transformer_n_layer_target = 8)
net = models.model_initialize(**config)
# Train the new model.
# Detailed output, including a tidy table of validation loss and metrics, AUC curve figures, etc., is stored in the ./result folder.
net.train(train, val, test)
# Alternatively, load a pretrained model from a model directory path, or by the name of a reproduced model such as DeepDTA.
# MODEL_PATH_DIR and MODEL_NAME are placeholders; substitute your own value.
net = models.model_pretrained(MODEL_PATH_DIR or MODEL_NAME)
# Repurpose using the trained model or pre-trained model
# In this exa
