PandaDock
PandaDock: Physics-Based Molecular Docking with GNN Scoring
PandaDock - Molecular Docking with GNN Scoring
<p align="center"> <a href="https://github.com/pritampanda15/PandaDock"> <img src="https://github.com/pritampanda15/PandaDock/blob/v4.0/PandaDock.png" width="500" alt="PandaDock Logo"/> </a> </p> <p align="center"> <a href="https://pypi.org/project/pandadock/"> <img src="https://img.shields.io/pypi/v/pandadock.svg" alt="PyPI Version"> </a> <a href="https://github.com/pritampanda15/PandaDock/blob/main/LICENSE"> <img src="https://img.shields.io/github/license/pritampanda15/PandaDock" alt="License"> </a> <a href="https://github.com/pritampanda15/PandaDock/stargazers"> <img src="https://img.shields.io/github/stars/pritampanda15/PandaDock?style=social" alt="GitHub Stars"> </a> <a href="https://github.com/pritampanda15/PandaDock/issues"> <img src="https://img.shields.io/github/issues/pritampanda15/PandaDock" alt="GitHub Issues"> </a> <a href="https://github.com/pritampanda15/PandaDock/network/members"> <img src="https://img.shields.io/github/forks/pritampanda15/PandaDock?style=social" alt="GitHub Forks"> </a> <a href="https://pepy.tech/project/pandadock"> <img src="https://static.pepy.tech/badge/pandadock" alt="Downloads"> </a> </p> <p align="center"> <a href="https://www.python.org/downloads/"> <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+"> </a> <a href="https://opensource.org/licenses/MIT"> <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"> </a> <a href="https://pandadock.readthedocs.io/"> <img src="https://readthedocs.org/projects/pandadock/badge/?version=latest" alt="Documentation Status"> </a> </p>
SE(3)-Equivariant GNN Scoring for Molecular Docking
Installation | Quick Start | Documentation | Benchmark | Citation
</div>
Overview
PandaDock v4.0 introduces an SE(3)-equivariant Graph Neural Network (GNN) scoring function. Its predictions correlate with experimental binding affinities at Pearson R = 0.88 on PDBbind, R = 0.82 on the ULVSH test split, and R = 0.81 on the BindingDB test set. The hybrid docking workflow combines traditional pose generation with GNN rescoring and is the recommended way to run PandaDock.
Key Features
- PandaDock-GNN: SE(3)-equivariant scoring achieving Pearson R = 0.88 on PDBbind
- Hybrid Docking: Combined pose generation + GNN rescoring (recommended workflow)
- Universal Rescorer: Rescore poses from ANY docking tool (Vina, Glide, GOLD, etc.)
- Vina-Style Scoring: AutoDock Vina empirical weights as default scoring
- Multi-Task Learning: Joint pKd/pEC50 regression + activity classification
- Heterogeneous Graphs: Separate protein/ligand node types with interaction edges
- Specialized Modes: Flexible, metal coordination, and tethered docking
Benchmark Performance
PDBbind v2020 Refined Set (5,316 complexes)
| Metric | Value |
|--------|-------|
| Pearson R | 0.88 |
| Spearman R | 0.88 |
| RMSE | 0.93 pK units |
| MAE | 0.68 pK units |
| Within 1.0 pK | 77.5% |
| Within 1.5 pK | 90.5% |
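The metrics in this table can be recomputed from any list of predicted and experimental pK values. The sketch below is a minimal, standard-library illustration and not PandaDock's own evaluation code; the function name `regression_metrics` is hypothetical, the input values are toy numbers, and it assumes "within 1.0 pK" means an absolute error of at most 1.0.

```python
import math

def regression_metrics(predicted, experimental):
    """Return Pearson R, RMSE, MAE, and the fractions within 1.0 / 1.5 pK units."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    errors = [abs(p - e) for p, e in zip(predicted, experimental)]
    return {
        "pearson_r": cov / math.sqrt(var_p * var_e),
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / n),
        "mae": sum(errors) / n,
        # Assumption: "within X pK" means absolute error <= X.
        "within_1.0": sum(err <= 1.0 for err in errors) / n,
        "within_1.5": sum(err <= 1.5 for err in errors) / n,
    }

# Toy values for illustration only (not PandaDock results).
metrics = regression_metrics([6.1, 7.4, 5.0, 8.2], [6.0, 7.0, 6.2, 8.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Spearman R can be obtained the same way by replacing both lists with their ranks before calling the function.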
ULVSH Dataset (942 compounds, 10 protein targets)
| Method | Type | Pearson R | N |
|--------|------|-----------|---|
| PandaDock-GNN (test) | ML Scoring | 0.82 | 95 |
| PandaDock-GNN (full) | ML Scoring | 0.67 | 942 |
| VM2 | ULVSH Baseline | 0.15 | 942 |
| PM6 | ULVSH Baseline | 0.08 | 939 |
| Hyde | ULVSH Baseline | 0.02 | 942 |
| Gnina | ULVSH Baseline | 0.01 | 941 |
BindingDB Dataset (8,891 protein-ligand complexes)
| Training Configuration | Test Pearson R | Test RMSE | N (train) |
|------------------------|----------------|-----------|-----------|
| BindingDB Only | 0.81 | - | 7,113 |
| BindingDB + ULVSH | 0.79 | 0.96 | 7,866 |
| BindingDB + ULVSH + PDBbind | 0.49 | 1.37 | 12,118 |
Note: Combined training with PDBbind shows reduced performance due to affinity scale differences (pKd vs pEC50). For best results, train on datasets with compatible affinity measurements.
Key Results:
- PandaDock-GNN achieves R = 0.88 on PDBbind (5,316 complexes)
- R = 0.81 on BindingDB test set (889 complexes)
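- About 5.5x the Pearson R of the best baseline (VM2, R = 0.15) on ULVSH, using the GNN test-split result (R = 0.82, N = 95); on the full set (R = 0.67, N = 942) the ratio is about 4.5x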
- Activity classification AUC = 0.94 on ULVSH test set
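For readers who want to check a classification AUC on their own predictions, the sketch below computes the rank-based ROC AUC: the probability that a randomly chosen active receives a higher score than a randomly chosen inactive. It is an illustration only; the function name `roc_auc` and the toy labels and scores are assumptions, not PandaDock code or data.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: P(score of a random active > score of a random inactive)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy labels (1 = active) and activity probabilities, for illustration only.
auc = roc_auc([1, 1, 0, 0, 1], [0.9, 0.6, 0.7, 0.2, 0.8])
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of compounds, which is fine for a test set of this size; a sort-based version would be preferable for large screens.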
Installation
Prerequisites
- Python 3.8 or higher
- Conda package manager (recommended for RDKit)
Basic Installation
# Clone repository
git clone https://github.com/pritampanda15/PandaDock.git
cd PandaDock
# Create conda environment with RDKit
conda create -n pandadock python=3.10
conda activate pandadock
conda install -c conda-forge rdkit
# Install PandaDock
pip install -e .
GNN Installation (Recommended)
# Install PyTorch and PyTorch Geometric for GNN support
pip install -e ".[gnn]"
# Or manually:
pip install torch torch-geometric torch-scatter torch-sparse
For detailed installation instructions, see INSTALL.md.
Quick Start
Download Pre-trained Model (Recommended)
Get started immediately with the pre-trained model:
# Download the pre-trained model (~82 MB)
pandadock gnn download-model
# Model is saved to models/pandadock_gnn_v4.pt
Hybrid Docking (Recommended)
The hybrid workflow combines traditional pose generation with GNN rescoring for best accuracy:
# Using pre-trained model
pandadock hybrid -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
-m models/pandadock_gnn_v4.pt \
-o results/
# Or train your own model first
pandadock gnn train -d ULVSH/ -o models/ --epochs 100
pandadock hybrid -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
-m models/best_model.pt \
-o results/
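When docking many ligands, it can help to build the command from Python. The sketch below only assembles the argument list for the `pandadock hybrid` call shown above, using the flags that appear in this README (`-r`, `-l`, `--center`, `--box`, `-m`, `-o`); the helper name `hybrid_command` is hypothetical, and nothing is executed unless you pass the list to `subprocess.run` yourself.

```python
import shlex

def hybrid_command(receptor, ligand, center, box, model, out_dir):
    """Build the argument list for `pandadock hybrid` from the flags shown in this README."""
    if len(center) != 3 or len(box) != 3:
        raise ValueError("center and box each need three values (x, y, z)")
    return [
        "pandadock", "hybrid",
        "-r", str(receptor),
        "-l", str(ligand),
        "--center", *(str(v) for v in center),
        "--box", *(str(v) for v in box),
        "-m", str(model),
        "-o", str(out_dir),
    ]

cmd = hybrid_command("protein.pdb", "ligand.sdf", (10, 20, 30), (20, 20, 20),
                     "models/pandadock_gnn_v4.pt", "results/")
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.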
Traditional Docking
# Simple docking with Vina-style scoring
pandadock dock -r protein.pdb -l ligand.sdf \
--center 10 20 30 --box 20 20 20 \
-o results/
GNN Prediction Only
# Predict binding affinity for a pre-docked complex
pandadock gnn predict -m model.pt -p protein.mol2 -l ligand.mol2
Universal Rescorer (NEW)
Rescore poses from ANY docking tool using the GNN:
# Rescore poses from AutoDock Vina
pandadock gnn rescore -m model.pt -r receptor.pdb -p vina_out.sdf -o ranked.csv
# Rescore poses from pandadock-flex
pandadock gnn rescore -m model.pt -r protein.pdb -p flex_poses.sdf --output-sdf ranked.sdf
# Rescore poses from Glide, GOLD, or any other tool
pandadock gnn rescore -m model.pt -r protein.pdb -p docked_poses.sdf
Compare Against Baselines
# Benchmark GNN against all baseline methods
pandadock gnn compare -m model.pt -d ULVSH/ -o comparison/
Commands
Core Commands
| Command | Description |
|---------|-------------|
| pandadock dock | Traditional docking with Vina-style scoring |
| pandadock hybrid | Hybrid docking with GNN rescoring (recommended) |
GNN Commands
| Command | Description |
|---------|-------------|
| pandadock gnn download-model | Download pre-trained model (~82 MB) |
| pandadock gnn train | Train GNN model on dataset (ULVSH, PDBbind, or combined) |
| pandadock gnn predict | Predict binding affinity for a single complex |
| pandadock gnn rescore | Universal rescorer for poses from ANY docking tool |
| pandadock gnn benchmark | Benchmark model performance on test set |
| pandadock gnn compare | Compare against baseline scoring methods |
Specialized Docking
| Command | Description |
|---------|-------------|
| pandadock-flex | Flexible/induced-fit docking |
| pandadock-metal | Metal coordination docking |
| pandadock-tethered | Constrained docking near reference |
Utility Tools
| Command | Description |
|---------|-------------|
| pandadock-prepare | Prepare ligands (add H, generate 3D) |
| pandadock-gridbox | Generate grid box configurations |
| pandadock-report | Generate analysis reports |
Universal GNN Rescorer
The pandadock gnn rescore command rescores docked poses from any docking software with the SE(3)-equivariant GNN.
Supported Input
- AutoDock Vina output (SDF/PDBQT converted to SDF)
- Glide poses (SDF)
- GOLD poses (SDF)
- pandadock-flex flexible docking poses
- pandadock-metal metal coordination poses
- pandadock-tethered constrained poses
- Any multi-conformer SDF file
Usage
pandadock gnn rescore -m model.pt -r receptor.pdb -p poses.sdf [OPTIONS]
Options:
  -m, --model PATH       Trained GNN model checkpoint (required)
  -r, --receptor PATH    Receptor PDB or MOL2 file (required)
  -p, --poses PATH       Multi-conformer SDF with poses (required)
  -o, --output PATH      Output CSV with ranked poses (default: rescored_poses.csv)
  --output-sdf PATH      Output SDF with GNN scores as properties
  --site-radius FLOAT    Binding site extraction radius (default: 10 Å)
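To give a sense of what a binding site extraction radius does, the sketch below keeps every protein atom that lies within a given distance of any ligand atom. This is an assumption about the general technique, not a description of PandaDock's implementation: the tool may select whole residues, or measure from the ligand centroid instead. The function name and coordinates are illustrative.

```python
import math

def atoms_within_radius(protein_atoms, ligand_atoms, radius=10.0):
    """Return indices of protein atoms within `radius` angstroms of any ligand atom."""
    selected = []
    for index, protein_xyz in enumerate(protein_atoms):
        if any(math.dist(protein_xyz, ligand_xyz) <= radius for ligand_xyz in ligand_atoms):
            selected.append(index)
    return selected

# Toy coordinates in angstroms, for illustration only.
protein = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
site = atoms_within_radius(protein, ligand)
print(site)
```

A larger radius gives the model more protein context at the cost of a bigger graph; a smaller one keeps only the atoms in direct contact with the ligand.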
Example Workflow
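# Step 1: Run docking with your preferred tool (Vina needs a search box and writes PDBQT)
vina --receptor protein.pdbqt --ligand ligand.pdbqt \
--center_x 10 --center_y 20 --center_z 30 \
--size_x 20 --size_y 20 --size_z 20 \
--out poses.pdbqt
# Step 2: Convert the PDBQT poses to SDF (shown here with Open Babel)
obabel poses.pdbqt -O poses.sdf
# Step 3: Rescore with PandaDock-GNN
pandadock gnn rescore -m model.pt -r protein.pdb -p poses.sdf \
-o ranked.csv --output-sdf ranked.sdf
# Output CSV columns:
# pose_name, pose_index, gnn_pKd, gnn_energy, activity_prob, predicted_active, gnn_rank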
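The ranked CSV can be post-processed with the Python standard library. The sketch below picks the top-ranked pose using the column names listed above; the rows are made-up values, the helper name `best_pose` is hypothetical, and the exact formatting of each column (for example whether `gnn_rank` is written as `1` or `1.0`) is an assumption.

```python
import csv
import io

def best_pose(csv_file):
    """Return the row with the lowest gnn_rank (1 = best) from a rescoring CSV."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no poses found")
    # int(float(...)) tolerates ranks written either as "1" or as "1.0".
    return min(rows, key=lambda row: int(float(row["gnn_rank"])))

# Made-up rows for illustration; a real run would open ranked.csv instead.
example = io.StringIO(
    "pose_name,pose_index,gnn_pKd,gnn_energy,activity_prob,predicted_active,gnn_rank\n"
    "pose_2,1,6.10,-8.32,0.71,True,2\n"
    "pose_1,0,7.25,-9.89,0.93,True,1\n"
)
top = best_pose(example)
print(top["pose_name"], float(top["gnn_pKd"]))
```

With a real output file, replace the `io.StringIO` object with `open("ranked.csv", newline="")`.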
Output SDF Properties
When using --output-sdf, each molecule gets these properties:
- GNN_pKd: Predicted pKd/pKi value
- GNN_Energy: Predicted binding energy (kcal/mol)
- GNN_Activity: Activity probability (0-1)
- GNN_Rank: Rank based on GNN score (1 = best)
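These properties are stored as standard SDF data items, so they can be read back without PandaDock. The sketch below is a minimal standard-library reader that assumes single-line values; the function name and the toy record are illustrative. For real work, RDKit's `Chem.SDMolSupplier` together with `mol.GetProp` is the more robust route.

```python
import re

def read_sdf_properties(sdf_text):
    """Return one {property: value} dict per record in an SDF string."""
    records = []
    for block in sdf_text.split("$$$$"):  # "$$$$" terminates each SDF record
        if not block.strip():
            continue
        props = {}
        lines = block.splitlines()
        for i, line in enumerate(lines):
            # Data item headers look like ">  <GNN_pKd>"; the value is on the next line.
            match = re.match(r">.*<([^>]+)>", line)
            if match and i + 1 < len(lines):
                props[match.group(1)] = lines[i + 1].strip()
        records.append(props)
    return records

# Toy record with an empty molecule block, for illustration only.
toy_sdf = (
    "pose_1\n  toy\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    ">  <GNN_pKd>\n7.25\n\n"
    ">  <GNN_Rank>\n1\n\n"
    "$$$$\n"
)
records = read_sdf_properties(toy_sdf)
print(records)
```

Values come back as strings, so convert them with `float` or `int` before sorting or filtering.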
GNN Architecture
PandaDock-GNN uses an SE(3)-equivariant heterogeneous graph neural network:
Input: Protein-Ligand Complex
|
+
