GLDP
Official Python implementation for the paper accepted at ICLR 2026 'Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space'.
Graph Latent Dynamics Propagator (GLDP)
This repository contains the official implementation for the paper "Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space". We introduce the Graph Latent Dynamics Propagator (GLDP), a modular framework for simulating long-timescale protein dynamics.
The core idea is to use a pre-trained Graph Neural Network to encode high-dimensional, all-atom protein structures into a simplified latent space. Within this space, we can efficiently simulate the system's temporal evolution using one of several "propagator" methods. The resulting latent trajectory is then decoded back into all-atom structures.
Visual Overview
The GLDP framework follows a modular encoder-propagator-decoder pipeline:
- Encoder: A pre-trained ChebNet GNN maps all-atom coordinates to a low-dimensional latent vector z(t).
- Propagator: Advances the state in the latent space (z(t) -> z(t+1)) using one of three methods:
  - Score-Guided Langevin Dynamics: A physics-informed stochastic simulation.
  - Koopman Operator: A linear, data-driven model.
  - Autoregressive Neural Network: A non-linear, expressive neural network.
- Decoder: A pre-trained network maps the new latent vector z(t+1) back to all-atom coordinates.
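To make the linear propagator concrete, here is a minimal sketch, assuming the Koopman operator is approximated by an ordinary least-squares fit over consecutive latent frames (in the style of dynamic mode decomposition). The function names and the synthetic two-dimensional trajectory are illustrative assumptions; fit_koopman_model.py may be implemented differently.

```python
import numpy as np

def fit_linear_propagator(Z):
    """Least-squares estimate of K such that Z[t + 1] ~= Z[t] @ K.

    Z is a (T, d) array of latent vectors ordered in time (hypothetical layout).
    """
    X, Y = Z[:-1], Z[1:]
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K

def rollout(K, z0, num_steps):
    """Propagate z0 forward num_steps times with the linear map K."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(num_steps):
        traj.append(traj[-1] @ K)
    return np.stack(traj)

# Toy data (not from GLDP): a slowly decaying rotation in a 2-D latent space.
theta = 0.1
K_true = 0.99 * np.array([[np.cos(theta), np.sin(theta)],
                          [-np.sin(theta), np.cos(theta)]])
Z = rollout(K_true, np.array([1.0, 0.0]), 200)

# Recover the operator from the trajectory alone.
K_fit = fit_linear_propagator(Z)
```

On noise-free linear data the fit recovers the generating map; on real latent embeddings the fitted operator is only a linear approximation of the dynamics.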
Installation
A virtual environment is strongly recommended. The install_dependencies.sh script automates the setup of a venv and installs all required packages at their tested versions.
# Navigate to the scripts directory
cd gldp_repository/scripts
# Run the installation script
./install_dependencies.sh
# Activate the environment to run the pipeline
source venv/bin/activate
Usage
The run_pipeline.sh script automates the entire workflow for the Koopman and Neural Network propagators. It handles data preprocessing, model training, simulation, and analysis in a single command.
Command:
./run_pipeline.sh <path_to_pdb> <path_to_xtc> <method>
Arguments:
- <path_to_pdb>: Path to your input PDB file.
- <path_to_xtc>: Path to your input XTC trajectory file.
- <method>: The propagation method to use. Choose from:
  - koopman: For the Koopman operator.
  - neural: For the autoregressive neural network.
Example:
# Assuming PDB/XTC files are in the current directory
./run_pipeline.sh 7jfl_C.pdb 7jfl_C_prod_R1_fit.xtc koopman
Each run creates a unique, timestamped output directory (e.g., run_20250915_210144/) containing all logs, models, and generated data. The final analysis, which identifies the best generated trajectory, is written to pipeline_run.log inside that directory.
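When several runs accumulate, a small helper can locate the most recent one. The sketch below assumes directory names follow the run_YYYYMMDD_HHMMSS pattern seen in the example above; both function names are hypothetical and not part of the repository.

```python
from datetime import datetime
from pathlib import Path

def parse_run_timestamp(name):
    """Return the datetime encoded in a name like 'run_20250915_210144', or None."""
    try:
        return datetime.strptime(name, "run_%Y%m%d_%H%M%S")
    except ValueError:
        return None

def latest_run_dir(root="."):
    """Return the newest run_* directory under root, or None if there is none."""
    runs = []
    for path in Path(root).iterdir():
        stamp = parse_run_timestamp(path.name)
        if path.is_dir() and stamp is not None:
            runs.append((stamp, path))
    return max(runs, key=lambda item: item[0])[1] if runs else None
```

For example, `latest_run_dir() / "pipeline_run.log"` would then point at the log of the newest run, provided at least one run directory exists.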
Langevin Dynamics Simulation (Manual Steps)
The score-guided Langevin dynamics simulation is a manual, multi-step process: each script is run individually after the initial setup.
Prerequisite: You must first have a pooled_embedding.h5 file. You can generate this by running the automated pipeline (e.g., with the koopman method) and letting it complete Step 3 (the train_autoencoder.py script). The file will be in the latent_reps/ subdirectory of the run.
Step 1: Train the Score Model
Use train_score_model.py to train the diffusion model on the latent embeddings. This model is necessary to calculate the score (effective force) for the Langevin simulation.
python train_score_model.py \
--h5_file path/to/your/run_*/latent_reps/pooled_embedding.h5 \
--output_model_path path/to/your/run_*/checkpoints/score_model.pth \
--epochs 50000
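For orientation, score models of this kind are commonly trained with a denoising score matching objective. The form below is a generic textbook version, stated as an assumption; the loss actually used in train_score_model.py may differ. Here $z$ is a latent embedding, $\sigma$ a noise level, $\epsilon$ standard Gaussian noise, and $s_\theta$ the score network (all symbols are illustrative).

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{z,\;\sigma,\;\epsilon \sim \mathcal{N}(0, I)}
    \left[ \sigma^{2} \,
      \left\| s_\theta(z + \sigma \epsilon,\, \sigma) + \frac{\epsilon}{\sigma} \right\|^{2}
    \right]
```

Minimizing this drives $s_\theta(\tilde{z}, \sigma)$ toward the gradient of the log density of the noise-perturbed embeddings, which is the quantity the Langevin simulation uses as its effective force.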
Step 2: Run the Langevin Simulation
Use run_langevin_simulation.py to perform the simulation in the latent space, using the trained score model from the previous step.
python run_langevin_simulation.py \
--score_model_path path/to/your/run_*/checkpoints/score_model.pth \
--h5_file path/to/your/run_*/latent_reps/pooled_embedding.h5 \
--output_file path/to/your/run_*/latent_reps/langevin_rollout.h5 \
--num_steps 100000
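The update performed at each simulation step can be sketched as overdamped Langevin dynamics driven by the score. This is a minimal illustration, assuming an Euler-Maruyama discretization, z(t+1) = z(t) + step_size * score(z(t)) + sqrt(2 * step_size) * noise. The analytic score of a standard normal stands in for the trained score model, and the step size, dimensions, and function names are hypothetical rather than taken from run_langevin_simulation.py.

```python
import numpy as np

def gaussian_score(z):
    """Score (gradient of the log density) of a standard normal: -z.

    Stand-in for a trained score model.
    """
    return -z

def langevin_rollout(score_fn, z0, num_steps, step_size, rng):
    """Euler-Maruyama integration of overdamped Langevin dynamics."""
    z = np.array(z0, dtype=float)
    traj = np.empty((num_steps + 1,) + z.shape)
    traj[0] = z
    for t in range(num_steps):
        noise = rng.standard_normal(z.shape)
        # Drift along the score plus thermal noise scaled by sqrt(2 * step_size).
        z = z + step_size * score_fn(z) + np.sqrt(2.0 * step_size) * noise
        traj[t + 1] = z
    return traj

rng = np.random.default_rng(0)
traj = langevin_rollout(gaussian_score, z0=np.zeros(4),
                        num_steps=20000, step_size=0.01, rng=rng)
```

With a correct score and a small step size, the samples visited by the rollout approximately follow the distribution the score was derived from; here that is a unit-variance Gaussian.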
Step 3: Decode the Latent Trajectory
Finally, use decode_trajectory.py and convert_h5_to_xtc.py to decode the generated latent trajectory (langevin_rollout.h5) back into an all-atom XTC file, which can then be analyzed. You can adapt the commands from the run_pipeline.sh script for this step.
Repository Structure
gldp_repository/
├── data/
│   ├── alanine_dipeptide/
│   │   └── README.md                  (placeholder for latent reps)
│   ├── 7jfl_C/
│   │   └── README.md                  (placeholder for latent reps)
│   ├── A1AR/
│   │   └── README.md                  (placeholder for latent reps)
│   └── A2AR/
│       └── README.md                  (placeholder for latent reps)
├── scripts/
│   ├── extract_res.py                 # Preprocessing: Extracts heavy atoms and dihedrals
│   ├── train_autoencoder.py           # Trains the GNN Encoder-Decoder
│   ├── fit_koopman_model.py           # Koopman propagator
│   ├── train_neural_propagator.py     # Neural Network propagator
│   ├── run_langevin_simulation.py     # Langevin propagator (score-based)
│   ├── train_score_model.py           # Trains the diffusion/score model for Langevin
│   ├── decode_trajectory.py           # Decodes latent trajectories to 3D structures
│   ├── convert_h5_to_xtc.py           # Converts HDF5 coordinates to XTC format
│   ├── analyze_rmsd.py                # Performs final RMSD analysis
│   ├── run_pipeline.sh                # Main execution script
│   ├── install_dependencies.sh        # Sets up the environment
│   └── param_template.yaml            # Configuration template
└── README.md                          # This file
Workflow Steps
- Preprocessing (extract_res.py): The pipeline begins by processing the input PDB and XTC files to extract heavy-atom coordinates and dihedral angle information, creating a heavy_chain.pdb and standardized JSON files.
- Encoder-Decoder Training (train_autoencoder.py): A Graph Neural Network autoencoder is trained to learn a mapping between the 3D structure and a low-dimensional latent space.
- Score Model Training (train_score_model.py): (For Langevin method only) A diffusion model is trained on the latent space embeddings to learn the score function, which approximates the energy gradient.
- Latent Space Propagation: A new trajectory is generated in the latent space using one of the chosen methods:
  - run_langevin_simulation.py (Langevin)
  - fit_koopman_model.py (Koopman)
  - train_neural_propagator.py (Neural Network)
- Decoding & Analysis: The new latent trajectory is decoded back into all-atom 3D coordinates (decode_trajectory.py), converted to a standard format (convert_h5_to_xtc.py), and evaluated against the native trajectory (analyze_rmsd.py).
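Comparing a generated frame with a native frame typically means computing the RMSD after optimal rigid superposition. The following is a minimal sketch using the Kabsch algorithm in NumPy; it is an assumption about what such an evaluation involves, not a description of analyze_rmsd.py, and the function name and toy coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimally superposing P onto Q."""
    # Remove translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_aligned = P0 @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Q0) ** 2, axis=1))))

# Toy check: a rotated and translated copy of the same structure.
rng = np.random.default_rng(0)
Q = rng.standard_normal((10, 3))
a = 0.7
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
P = Q @ Rz.T + np.array([1.0, -2.0, 3.0])
rmsd_same = kabsch_rmsd(P, Q)
```

Because P differs from Q only by a rigid motion, the aligned RMSD is zero up to floating-point error; any genuine structural difference shows up as a positive value.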

