
PandaDock

PandaDock: Physics-based Molecular Docking with GNN Scoring


PandaDock - Molecular Docking with GNN Scoring


<p align="center"> <a href="https://github.com/pritampanda15/PandaDock"> <img src="https://github.com/pritampanda15/PandaDock/blob/v4.0/PandaDock.png" width="500" alt="PandaDock Logo"/> </a> </p> <p align="center"> <a href="https://pypi.org/project/pandadock/"> <img src="https://img.shields.io/pypi/v/pandadock.svg" alt="PyPI Version"> </a> <a href="https://github.com/pritampanda15/PandaDock/blob/main/LICENSE"> <img src="https://img.shields.io/github/license/pritampanda15/PandaDock" alt="License"> </a> <a href="https://github.com/pritampanda15/PandaDock/stargazers"> <img src="https://img.shields.io/github/stars/pritampanda15/PandaDock?style=social" alt="GitHub Stars"> </a> <a href="https://github.com/pritampanda15/PandaDock/issues"> <img src="https://img.shields.io/github/issues/pritampanda15/PandaDock" alt="GitHub Issues"> </a> <a href="https://github.com/pritampanda15/PandaDock/network/members"> <img src="https://img.shields.io/github/forks/pritampanda15/PandaDock?style=social" alt="GitHub Forks"> </a> <a href="https://pepy.tech/project/pandadock"> <img src="https://static.pepy.tech/badge/pandadock" alt="Downloads"> </a> </p> <p align="center"> <a href="https://www.python.org/downloads/"> <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+"> </a> <a href="https://opensource.org/licenses/MIT"> <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"> </a> <a href="https://pandadock.readthedocs.io/"> <img src="https://readthedocs.org/projects/pandadock/badge/?version=latest" alt="Documentation Status"> </a> </p>

SE(3)-Equivariant GNN Scoring for Molecular Docking

Installation | Quick Start | Documentation | Benchmark | Citation

</div>

Overview

PandaDock v4.0 features a novel SE(3)-equivariant Graph Neural Network (GNN) scoring function that achieves state-of-the-art correlation with experimental binding affinities: Pearson R = 0.88 on the PDBbind v2020 refined set, R = 0.82 on the ULVSH test split, and R = 0.81 on the BindingDB test set. The hybrid docking workflow combines traditional pose generation with GNN rescoring to deliver higher accuracy than traditional scoring alone.

Key Features

  • PandaDock-GNN: SE(3)-equivariant scoring achieving Pearson R = 0.88 on PDBbind
  • Hybrid Docking: Combined pose generation + GNN rescoring (recommended workflow)
  • Universal Rescorer: Rescore poses from ANY docking tool (Vina, Glide, GOLD, etc.)
  • Vina-Style Scoring: AutoDock Vina empirical weights as the default scoring function
  • Multi-Task Learning: Joint pKd/pEC50 regression + activity classification
  • Heterogeneous Graphs: Separate protein/ligand node types with interaction edges
  • Specialized Modes: Flexible, metal coordination, and tethered docking
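The heterogeneous-graph idea can be illustrated with a minimal sketch. Protein atoms and ligand atoms stay in separate node sets, and an interaction edge links a ligand atom to every protein atom within a distance cutoff. The 5 Å cutoff and the name `interaction_edges` are hypothetical. This README does not describe PandaDock-GNN's actual edge construction or edge features.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interaction_edges(
    ligand: Sequence[Coord],
    protein: Sequence[Coord],
    cutoff: float = 5.0,  # hypothetical cutoff in Angstrom
) -> List[Tuple[int, int, float]]:
    """Return (ligand_index, protein_index, distance) for each cross-type pair within cutoff.

    Ligand and protein atoms are separate node sets (two node types); only
    ligand-protein pairs are checked, so every edge is an interaction edge.
    """
    edges = []
    for i, lig_xyz in enumerate(ligand):
        for j, prot_xyz in enumerate(protein):
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges
```

The only edge attribute in this sketch is an interatomic distance. Distances do not change when the whole complex is rotated or translated. Invariant features of this kind are one ingredient of SE(3)-equivariant models.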

Benchmark Performance

PDBbind v2020 Refined Set (5,316 complexes)

| Metric | Value |
|--------|-------|
| Pearson R | 0.88 |
| Spearman R | 0.88 |
| RMSE | 0.93 pK units |
| MAE | 0.68 pK units |
| Within 1.0 pK | 77.5% |
| Within 1.5 pK | 90.5% |

ULVSH Dataset (942 compounds, 10 protein targets)

| Method | Type | Pearson R | N |
|--------|------|-----------|---|
| PandaDock-GNN (test) | ML Scoring | 0.82 | 95 |
| PandaDock-GNN (full) | ML Scoring | 0.67 | 942 |
| VM2 | ULVSH Baseline | 0.15 | 942 |
| PM6 | ULVSH Baseline | 0.08 | 939 |
| Hyde | ULVSH Baseline | 0.02 | 942 |
| Gnina | ULVSH Baseline | 0.01 | 941 |

BindingDB Dataset (8,891 protein-ligand complexes)

| Training Configuration | Test Pearson R | Test RMSE (pK units) | N (train) |
|------------------------|----------------|----------------------|-----------|
| BindingDB Only | 0.81 | - | 7,113 |
| BindingDB + ULVSH | 0.79 | 0.96 | 7,866 |
| BindingDB + ULVSH + PDBbind | 0.49 | 1.37 | 12,118 |

Note: Adding PDBbind to the training data lowers test performance (R = 0.49) because of affinity scale differences (pKd vs. pEC50). For best results, train on datasets with compatible affinity measurements.
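pKd, pKi, and pEC50 all use the same transform, p = -log10(concentration in M). The numbers therefore look compatible even though they measure different things: equilibrium binding for pKd and pKi, functional potency for pEC50. The sketch below shows that transform. For pKd only, it also shows the standard conversion to binding free energy, dG = -ln(10) * R * T * pKd. This README does not state whether PandaDock's GNN_Energy output is derived this way, so treat that link as an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def p_value_from_nM(conc_nM: float) -> float:
    """Convert a Kd, Ki, or EC50 given in nM to its p-scale value, -log10(molar)."""
    return -math.log10(conc_nM * 1e-9)


def dg_from_pkd(pkd: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol: dG = RT ln(Kd) = -ln(10) * RT * pKd.

    Valid for a thermodynamic Kd (approximately Ki), not for EC50.
    """
    return -math.log(10) * R_KCAL * temperature_k * pkd
```

For example, a 1 nM binder has pKd = 9, which corresponds to about -12.3 kcal/mol at 298.15 K.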

Key Results:

  • PandaDock-GNN achieves R = 0.88 on PDBbind (5,316 complexes)
  • R = 0.81 on the BindingDB test set (889 complexes)
  • 5.5x higher Pearson R than the best ULVSH baseline, VM2 (0.82 on the 95-compound test split vs. 0.15 on the full 942-compound set)
  • Activity classification AUC = 0.94 on ULVSH test set
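The activity-classification AUC is a ranking metric. It is the probability that a randomly chosen active compound receives a higher activity score than a randomly chosen inactive one. A minimal sketch of that definition follows. How ULVSH compounds were labeled active or inactive is not described here, so the 0/1 labels are an input assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random active > score of a random inactive).

    Ties count as half a win. O(n_active * n_inactive), fine for a few thousand compounds.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```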

Installation

Prerequisites

  • Python 3.8 or higher
  • Conda package manager (recommended for RDKit)

Basic Installation

```bash
# Clone repository
git clone https://github.com/pritampanda15/PandaDock.git
cd PandaDock

# Create conda environment with RDKit
conda create -n pandadock python=3.10
conda activate pandadock
conda install -c conda-forge rdkit

# Install PandaDock
pip install -e .
```

GNN Installation (Recommended)

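```bash
# Install PyTorch and PyTorch Geometric for GNN support
pip install -e ".[gnn]"

# Or manually (torch-scatter and torch-sparse need wheels that match your
# PyTorch/CUDA build; see the PyTorch Geometric installation guide)
pip install torch torch-geometric torch-scatter torch-sparse
```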

For detailed installation instructions, see INSTALL.md.


Quick Start

Download Pre-trained Model (Recommended)

Get started immediately with the pre-trained model:

```bash
# Download the pre-trained model (~82 MB)
pandadock gnn download-model

# Model is saved to models/pandadock_gnn_v4.pt
```

Hybrid Docking (Recommended)

The hybrid workflow combines traditional pose generation with GNN rescoring for best accuracy:

```bash
# Using pre-trained model
pandadock hybrid -r protein.pdb -l ligand.sdf \
                 --center 10 20 30 --box 20 20 20 \
                 -m models/pandadock_gnn_v4.pt \
                 -o results/

# Or train your own model first
pandadock gnn train -d ULVSH/ -o models/ --epochs 100
pandadock hybrid -r protein.pdb -l ligand.sdf \
                 --center 10 20 30 --box 20 20 20 \
                 -m models/best_model.pt \
                 -o results/
```
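The --center and --box values are usually derived from a reference ligand in the binding site. The bundled pandadock-gridbox utility is intended for generating box configurations. The sketch below illustrates the underlying idea only. It reads atom coordinates from a V2000 SDF/molfile record and returns the midpoint of the ligand's bounding box plus a padded box size. The helper names and the 5 Å padding are hypothetical choices, not pandadock-gridbox's behavior, and V3000 records are not handled.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def read_molblock_coords(molblock: str) -> List[Coord]:
    """Atom coordinates from one V2000 molfile/SDF record (fixed-width columns)."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first 3 columns hold the atom count
    coords = []
    for line in lines[4:4 + n_atoms]:  # x, y, z occupy columns 1-10, 11-20, 21-30
        coords.append((float(line[0:10]), float(line[10:20]), float(line[20:30])))
    return coords


def box_from_ligand(coords: Sequence[Coord], padding: float = 5.0) -> Tuple[Coord, Coord]:
    """Center = midpoint of the ligand bounding box; size = extent + 2 * padding per axis."""
    center, size = [], []
    for axis in range(3):
        values = [c[axis] for c in coords]
        lo, hi = min(values), max(values)
        center.append((lo + hi) / 2)
        size.append(hi - lo + 2 * padding)
    return (center[0], center[1], center[2]), (size[0], size[1], size[2])
```

Pass the first record of a reference ligand SDF to `read_molblock_coords`, then use the returned center and size as the `--center` and `--box` arguments. The bounding-box midpoint is used instead of the atom centroid so that an asymmetric ligand still fits inside the box.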

Traditional Docking

```bash
# Simple docking with Vina-style scoring
pandadock dock -r protein.pdb -l ligand.sdf \
               --center 10 20 30 --box 20 20 20 \
               -o results/
```
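For context, "Vina-style scoring" refers to AutoDock Vina's empirical function. It sums weighted steric terms (two Gaussians and a repulsion term), a hydrophobic term, and a hydrogen-bond term over each atom pair's surface distance d = r - R_i - R_j. The sum is then divided by a penalty on the number of rotatable bonds. The sketch below encodes those published terms and weights. It deliberately omits atom typing, van der Waals radii, the 8 Å cutoff, and intramolecular terms, and it is not PandaDock's implementation.

```python
import math
from typing import Iterable, Tuple

# Weights published with AutoDock Vina (steric, hydrophobic, H-bond, torsional).
W_GAUSS1 = -0.0356
W_GAUSS2 = -0.00516
W_REPULSION = 0.840
W_HYDROPHOBIC = -0.0351
W_HBOND = -0.587
W_ROT = 0.0585


def vina_pair_term(d: float, hydrophobic_pair: bool = False, hbond_pair: bool = False) -> float:
    """Weighted Vina interaction for one atom pair at surface distance d (Angstrom)."""
    gauss1 = math.exp(-((d / 0.5) ** 2))
    gauss2 = math.exp(-(((d - 3.0) / 2.0) ** 2))
    repulsion = d * d if d < 0.0 else 0.0  # penalizes overlapping atoms only

    hydrophobic = 0.0
    if hydrophobic_pair:  # 1 below 0.5 A, linear ramp to 0 at 1.5 A
        hydrophobic = 1.0 if d < 0.5 else max(0.0, 1.5 - d)

    hbond = 0.0
    if hbond_pair:  # 1 below -0.7 A, linear ramp to 0 at 0 A
        hbond = 1.0 if d < -0.7 else max(0.0, -d / 0.7)

    return (W_GAUSS1 * gauss1 + W_GAUSS2 * gauss2 + W_REPULSION * repulsion
            + W_HYDROPHOBIC * hydrophobic + W_HBOND * hbond)


def vina_score(pairs: Iterable[Tuple[float, bool, bool]], n_rot: int) -> float:
    """Sum pair terms, then apply the rotatable-bond penalty 1 / (1 + w * Nrot)."""
    inter = sum(vina_pair_term(d, hyd, hb) for d, hyd, hb in pairs)
    return inter / (1.0 + W_ROT * n_rot)
```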

GNN Prediction Only

```bash
# Predict binding affinity for a pre-docked complex
pandadock gnn predict -m model.pt -p protein.mol2 -l ligand.mol2
```

Universal Rescorer (NEW)

Rescore poses from ANY docking tool using the GNN:

```bash
# Rescore poses from AutoDock Vina
pandadock gnn rescore -m model.pt -r receptor.pdb -p vina_out.sdf -o ranked.csv

# Rescore poses from pandadock-flex
pandadock gnn rescore -m model.pt -r protein.pdb -p flex_poses.sdf --output-sdf ranked.sdf

# Rescore poses from Glide, GOLD, or any other tool
pandadock gnn rescore -m model.pt -r protein.pdb -p docked_poses.sdf
```

Compare Against Baselines

```bash
# Benchmark GNN against all baseline methods
pandadock gnn compare -m model.pt -d ULVSH/ -o comparison/
```

Commands

Core Commands

| Command | Description |
|---------|-------------|
| pandadock dock | Traditional docking with Vina-style scoring |
| pandadock hybrid | Hybrid docking with GNN rescoring (recommended) |

GNN Commands

| Command | Description |
|---------|-------------|
| pandadock gnn download-model | Download pre-trained model (~82 MB) |
| pandadock gnn train | Train GNN model on dataset (ULVSH, PDBbind, or combined) |
| pandadock gnn predict | Predict binding affinity for a single complex |
| pandadock gnn rescore | Universal rescorer for poses from ANY docking tool |
| pandadock gnn benchmark | Benchmark model performance on test set |
| pandadock gnn compare | Compare against baseline scoring methods |

Specialized Docking

| Command | Description |
|---------|-------------|
| pandadock-flex | Flexible/induced-fit docking |
| pandadock-metal | Metal coordination docking |
| pandadock-tethered | Constrained docking near reference |

Utility Tools

| Command | Description |
|---------|-------------|
| pandadock-prepare | Prepare ligands (add H, generate 3D) |
| pandadock-gridbox | Generate grid box configurations |
| pandadock-report | Generate analysis reports |


Universal GNN Rescorer

The pandadock gnn rescore command rescores docked poses from any docking software with the SE(3)-equivariant GNN, as long as the poses are supplied as a multi-conformer SDF file:

Supported Input

  • AutoDock Vina output (SDF/PDBQT converted to SDF)
  • Glide poses (SDF)
  • GOLD poses (SDF)
  • pandadock-flex flexible docking poses
  • pandadock-metal metal coordination poses
  • pandadock-tethered constrained poses
  • Any multi-conformer SDF file

Usage

```
pandadock gnn rescore -m model.pt -r receptor.pdb -p poses.sdf [OPTIONS]

Options:
  -m, --model PATH      Trained GNN model checkpoint (required)
  -r, --receptor PATH   Receptor PDB or MOL2 file (required)
  -p, --poses PATH      Multi-conformer SDF with poses (required)
  -o, --output PATH     Output CSV with ranked poses (default: rescored_poses.csv)
  --output-sdf PATH     Output SDF with GNN scores as properties
  --site-radius FLOAT   Binding site extraction radius (default: 10 Å)
```
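The --site-radius option controls which part of the receptor the GNN sees. A minimal sketch of one common way to extract a binding site follows: keep every residue that has at least one atom within the radius of any ligand atom. This README does not document whether PandaDock measures the radius from ligand atoms or from the ligand centroid, or whether it keeps whole residues. The function name and the input format (pre-parsed `(residue_id, (x, y, z))` pairs) are hypothetical.

```python
import math
from typing import Hashable, Iterable, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def binding_site_residues(
    protein_atoms: Iterable[Tuple[Hashable, Coord]],
    ligand_coords: Sequence[Coord],
    radius: float = 10.0,  # mirrors the documented --site-radius default
) -> Set[Hashable]:
    """Residue ids with at least one atom within `radius` Angstrom of any ligand atom."""
    site: Set[Hashable] = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another of its atoms
        if any(math.dist(xyz, lig) <= radius for lig in ligand_coords):
            site.add(residue_id)
    return site
```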

Example Workflow

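```bash
# Step 1: Run docking with your preferred tool (AutoDock Vina writes PDBQT output)
vina --receptor protein.pdbqt --ligand ligand.pdbqt \
     --center_x 10 --center_y 20 --center_z 30 \
     --size_x 20 --size_y 20 --size_z 20 \
     --out poses.pdbqt

# Convert the multi-model PDBQT to a multi-conformer SDF (e.g., with Open Babel)
obabel poses.pdbqt -O poses.sdf

# Step 2: Rescore with PandaDock-GNN
pandadock gnn rescore -m model.pt -r protein.pdb -p poses.sdf \
    -o ranked.csv --output-sdf ranked.sdf

# Output CSV columns:
# pose_name, pose_index, gnn_pKd, gnn_energy, activity_prob, predicted_active, gnn_rank
```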
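Downstream scripts can select the best poses from the CSV using the column names listed above. This sketch assumes a comma-delimited file with exactly that header and integer `gnn_rank` values (1 = best). `top_poses` is a hypothetical helper, not part of PandaDock.

```python
import csv
import io
from typing import Dict, List


def top_poses(csv_text: str, n: int = 3) -> List[Dict[str, str]]:
    """Return the n best rows of a rescore CSV, ordered by gnn_rank (1 = best)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["gnn_rank"]))
    return rows[:n]
```

Read ranked.csv into a string and pass it in. The first returned row is the top-ranked pose, and its other fields (for example `gnn_pKd`) remain strings until you convert them.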

Output SDF Properties

When using --output-sdf, each molecule gets these properties:

  • GNN_pKd - Predicted pKd/pKi value
  • GNN_Energy - Predicted binding energy (kcal/mol)
  • GNN_Activity - Activity probability (0-1)
  • GNN_Rank - Rank based on GNN score (1 = best)
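These properties are stored as standard SD data fields: a `>  <NAME>` header line, then the value, then a blank line. Any SDF reader can therefore recover them. RDKit's `Chem.SDMolSupplier` with `mol.GetProp("GNN_pKd")` is the usual route. The dependency-free sketch below parses the fields directly and returns every value as a string.

```python
from typing import Dict, List


def read_sdf_properties(sdf_text: str) -> List[Dict[str, str]]:
    """Collect the SD data fields (e.g. GNN_pKd, GNN_Rank) of every record in an SDF."""
    records = []
    for block in sdf_text.split("$$$$"):  # "$$$$" terminates each SDF record
        if not block.strip():
            continue
        props: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                start = line.index("<") + 1
                name = line[start:line.index(">", start)]
                values = []
                i += 1
                while i < len(lines) and lines[i].strip():  # value ends at a blank line
                    values.append(lines[i])
                    i += 1
                props[name] = "\n".join(values)
            i += 1
        records.append(props)
    return records
```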

GNN Architecture

PandaDock-GNN uses an SE(3)-equivariant heterogeneous graph neural network:

Input: Protein-Ligand Complex
  |
  +