Graph Latent Dynamics Propagator (GLDP)

This repository contains the official Python implementation for the paper "Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space", accepted at ICLR 2026. We introduce the Graph Latent Dynamics Propagator (GLDP), a modular framework for simulating long-timescale protein dynamics.

The core idea is to use a pre-trained Graph Neural Network to encode high-dimensional, all-atom protein structures into a simplified latent space. Within this space, we can efficiently simulate the system's temporal evolution using one of several "propagator" methods. The resulting latent trajectory is then decoded back into all-atom structures.

Visual Overview

The GLDP framework follows a modular encoder-propagator-decoder pipeline:

[Figure: GLDP framework overview — encoder → propagator → decoder]

  1. Encoder: A pre-trained ChebNet GNN maps all-atom coordinates to a low-dimensional latent vector z(t).
  2. Propagator: Advances the state in the latent space (z(t) -> z(t+1)) using one of three methods:
    • Score-Guided Langevin Dynamics: A physics-informed stochastic simulation.
    • Koopman Operator: A linear, data-driven model.
    • Autoregressive Neural Network: A non-linear, expressive neural network.
  3. Decoder: A pre-trained network maps the new latent vector z(t+1) back to all-atom coordinates.
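As a sketch, the three stages above can be wired together like this. Here `encode`, `propagate`, and `decode` are toy linear stand-ins for the trained ChebNet encoder, the chosen propagator, and the decoder — shown only to make the data flow concrete, not the repository's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 3N flattened atomic coordinates in, latent dimension d out.
n_atoms, d = 100, 16
W_enc = rng.normal(size=(d, 3 * n_atoms)) / np.sqrt(3 * n_atoms)
W_dec = rng.normal(size=(3 * n_atoms, d)) / np.sqrt(d)
A = np.eye(d) * 0.99  # stand-in for a linear (Koopman-like) propagator

def encode(x):
    """Map flattened all-atom coordinates to a latent vector z."""
    return W_enc @ x

def propagate(z):
    """Advance the latent state one step: z(t) -> z(t+1)."""
    return A @ z

def decode(z):
    """Map a latent vector back to all-atom coordinates."""
    return W_dec @ z

x0 = rng.normal(size=3 * n_atoms)  # initial all-atom structure (flattened)
z = encode(x0)
trajectory = []
for _ in range(10):                # roll out 10 latent steps
    z = propagate(z)
    trajectory.append(decode(z))

trajectory = np.stack(trajectory)  # (10, 3*n_atoms) decoded rollout
print(trajectory.shape)
```

The point of the design is that the propagation loop only ever touches the d-dimensional latent state; decoding back to all-atom coordinates happens per frame, after the fact.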

Installation

A virtual environment is strongly recommended. The install_dependencies.sh script automates the setup of a venv and installs all required packages at their tested versions.

# Navigate to the scripts directory
cd gldp_repository/scripts

# Run the installation script
./install_dependencies.sh

# Activate the environment to run the pipeline
source venv/bin/activate

Usage

The run_pipeline.sh script automates the entire workflow for the Koopman and Neural Network propagators. It handles data preprocessing, model training, simulation, and analysis in a single command.

Command:

./run_pipeline.sh <path_to_pdb> <path_to_xtc> <method>

Arguments:

  • <path_to_pdb>: Path to your input PDB file.
  • <path_to_xtc>: Path to your input XTC trajectory file.
  • <method>: The propagation method to use. Choose from:
    • koopman: For the Koopman operator.
    • neural: For the autoregressive neural network.

Example:

# Assuming PDB/XTC files are in the current directory
./run_pipeline.sh 7jfl_C.pdb 7jfl_C_prod_R1_fit.xtc koopman

Each run creates a unique, timestamped output directory (e.g., run_20250915_210144/) containing all logs, models, and generated data. The final analysis, which identifies the best of the generated trajectories, is printed to pipeline_run.log inside that directory.
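Because the run directories are timestamped, the most recent one (and its log) can be found by simply sorting the directory names, for example:

```python
from pathlib import Path

# Run directory names embed a timestamp, so lexicographic order is
# chronological order; the last entry is the most recent run.
runs = sorted(Path(".").glob("run_*"))
if runs:
    log = runs[-1] / "pipeline_run.log"
    # Print the tail of the log, where the final analysis is written.
    print(log.read_text()[-2000:])
```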

Langevin Dynamics Simulation (Manual Steps)

Running the score-guided Langevin dynamics simulation is a manual, multi-step process that requires running the scripts individually after the initial setup.

Prerequisite: You must first have a pooled_embedding.h5 file. You can generate this by running the automated pipeline (e.g., with the koopman method) and letting it complete Step 3 (the train_autoencoder.py script). The file will be in the latent_reps/ subdirectory of the run.
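Before Step 1 it can be worth confirming that the embedding file exists and checking its layout. The helper below makes no assumption about dataset names — it simply walks the HDF5 file and reports every dataset and its shape:

```python
import h5py

def describe_h5(path):
    """Return a dict mapping each dataset path in an HDF5 file to its shape."""
    entries = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                entries[name] = obj.shape
        f.visititems(visit)
    return entries

# Example (path is illustrative):
# print(describe_h5("run_20250915_210144/latent_reps/pooled_embedding.h5"))
```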

Step 1: Train the Score Model

Use train_score_model.py to train the diffusion model on the latent embeddings. This model is necessary to calculate the score (effective force) for the Langevin simulation.

python train_score_model.py \
    --h5_file path/to/your/run_*/latent_reps/pooled_embedding.h5 \
    --output_model_path path/to/your/run_*/checkpoints/score_model.pth \
    --epochs 50000
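What train_score_model.py learns is a score function over the latent embeddings. The denoising-score-matching idea behind such models can be illustrated in a few lines of NumPy. This toy fits a linear score to 1-D Gaussian data — it is not the repository's diffusion model, just the objective in miniature:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

# Clean "latent" samples from a standard Gaussian (toy data).
x = rng.normal(size=100_000)
x_tilde = x + sigma * rng.normal(size=x.shape)  # noised samples

# Denoising score matching target: the score of the Gaussian
# perturbation kernel, grad log N(x_tilde; x, sigma^2).
target = -(x_tilde - x) / sigma**2

# Fit a linear score model s(x) = a * x by least squares.
a = np.sum(x_tilde * target) / np.sum(x_tilde**2)

# For N(0,1) data the perturbed marginal is N(0, 1 + sigma^2), whose
# true score is -x / (1 + sigma^2), so a should come out near -0.8.
print(a)
```

The trained score then acts as the effective force in the Langevin step below.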

Step 2: Run the Langevin Simulation

Use run_langevin_simulation.py to perform the simulation in the latent space, using the trained score model from the previous step.

python run_langevin_simulation.py \
    --score_model_path path/to/your/run_*/checkpoints/score_model.pth \
    --h5_file path/to/your/run_*/latent_reps/pooled_embedding.h5 \
    --output_file path/to/your/run_*/latent_reps/langevin_rollout.h5 \
    --num_steps 100000
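The update rule behind score-guided Langevin dynamics is z ← z + ε·s(z) + √(2ε)·ξ, with ξ drawn from a standard normal. A minimal 1-D sketch, with an analytic Gaussian score standing in for the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1e-2          # step size
n_steps = 20_000
mu, var = 2.0, 0.5  # parameters of a toy Gaussian target in latent space

def score(z):
    """Analytic score of N(mu, var); a stand-in for the trained score model."""
    return -(z - mu) / var

z = 0.0
samples = []
for _ in range(n_steps):
    # Overdamped Langevin update: drift along the score plus Gaussian noise.
    z = z + eps * score(z) + np.sqrt(2 * eps) * rng.normal()
    samples.append(z)

samples = np.asarray(samples[n_steps // 2 :])  # discard burn-in
print(samples.mean(), samples.var())           # should approach mu and var
```

With the real trained score model, the same update explores the learned latent distribution of protein conformations rather than a toy Gaussian.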

Step 3: Decode the Latent Trajectory

Finally, use decode_trajectory.py and convert_h5_to_xtc.py to decode the generated latent trajectory (langevin_rollout.h5) back into an all-atom XTC file, which can then be analyzed. You can adapt the commands from the run_pipeline.sh script for this step.
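A rough sketch of what the decoding step does, assuming the rollout is stored as a single 2-D dataset of latent vectors. The dataset name and `decode_latent` here are hypothetical placeholders — decode_trajectory.py applies the trained decoder instead:

```python
import h5py
import numpy as np

def decode_latent(z):
    """Hypothetical stand-in for the trained decoder network."""
    return np.tile(z, 3)  # toy: just expands the latent vector

def decode_rollout(in_path, out_path, dataset="latent"):
    # The dataset name is an assumption; inspect the file to find the real one.
    with h5py.File(in_path, "r") as f:
        latents = f[dataset][:]  # (n_frames, latent_dim)
    coords = np.stack([decode_latent(z) for z in latents])
    with h5py.File(out_path, "w") as f:
        f.create_dataset("coordinates", data=coords)

# Example (paths are illustrative):
# decode_rollout("latent_reps/langevin_rollout.h5", "latent_reps/decoded.h5")
```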

Repository Structure

gldp_repository/
├── data/
│   ├── alanine_dipeptide/
│   │   └── README.md  (placeholder for latent reps)
│   ├── 7jfl_C/
│   │   └── README.md  (placeholder for latent reps)
│   ├── A1AR/
│   │   └── README.md  (placeholder for latent reps)
│   └── A2AR/
│       └── README.md  (placeholder for latent reps)
├── scripts/
│   ├── extract_res.py             # Preprocessing: Extracts heavy atoms and dihedrals
│   ├── train_autoencoder.py       # Trains the GNN Encoder-Decoder
│   ├── fit_koopman_model.py       # Koopman propagator
│   ├── train_neural_propagator.py # Neural Network propagator
│   ├── run_langevin_simulation.py # Langevin propagator (score-based)
│   ├── train_score_model.py       # Trains the diffusion/score model for Langevin
│   ├── decode_trajectory.py       # Decodes latent trajectories to 3D structures
│   ├── convert_h5_to_xtc.py       # Converts HDF5 coordinates to XTC format
│   ├── analyze_rmsd.py            # Performs final RMSD analysis
│   ├── run_pipeline.sh            # Main execution script
│   ├── install_dependencies.sh    # Sets up the environment
│   └── param_template.yaml        # Configuration template
└── README.md                      # This file

Workflow Steps

  1. Preprocessing (extract_res.py): The pipeline begins by processing the input PDB and XTC files to extract heavy-atom coordinates and dihedral angle information, creating a heavy_chain.pdb and standardized JSON files.
  2. Encoder-Decoder Training (train_autoencoder.py): A Graph Neural Network autoencoder is trained to learn a mapping between the 3D structure and a low-dimensional latent space.
  3. Score Model Training (train_score_model.py): (For Langevin method only) A diffusion model is trained on the latent space embeddings to learn the score function, which approximates the energy gradient.
  4. Latent Space Propagation: A new trajectory is generated in the latent space using one of the chosen methods:
    • run_langevin_simulation.py (Langevin)
    • fit_koopman_model.py (Koopman)
    • train_neural_propagator.py (Neural Network)
  5. Decoding & Analysis: The new latent trajectory is decoded back into all-atom 3D coordinates (decode_trajectory.py), converted to a standard format (convert_h5_to_xtc.py), and evaluated against the native trajectory (analyze_rmsd.py).