SkillAgentSearch skills...

WaterNES

Workflows to calculate relative free energies using non-equilibrium switching for buried water molecules

Install / Use

/learn @MobleyLab/WaterNES
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

waterNES

⚠️ Development Status

  • main → stable, reproducible workflows used in older publications
  • sw_devel → active development, new features, ongoing changes published in a recent paper

If you are using this code for research or reproduction, use main.

If you want the latest updates:

git checkout sw_devel

This repository serves as the main repository for the following two open access publications:

  1. Advancing Binding Affinity Calculations: A Non-Equilibrium Simulations Approach for Calculation of Relative Binding Free Energies in Systems with Trapped Waters; Swapnil Wagle, Christopher Bayly and David Mobley; ChemRxiv, 2025, DOI: 10.26434/chemrxiv-2025-xtjl3 (https://chemrxiv.org/engage/chemrxiv/article-details/67d3ea816dde43c908c21ba4)

  2. Leveraging a Separation of States Method for Relative Binding Free Energy Calculations in Systems with Trapped Waters; Swapnil Wagle, Pascal T. Merz, Yunhui Ge, Christopher I. Bayly and David L. Mobley; J. Chem. Theory Comput. 2024, 20, 11013−11031 (https://pubs.acs.org/doi/10.1021/acs.jctc.4c01145)

This repository contains workflows to:

  1. calculate relative free energies (RBFEs) between ligand pairs with different numbers of trapped water molecules and implementing a non-equilibrium switching (NES)-based workflow, using thermodynamic cycle Thermodynamic cycle

  2. calculate RBFEs between ligands that bind to target protein with different numbers of trapped water molecules. The RBFE is calculated using the following thermodynamic cycle Thermodynamic cycle

  3. calculate absolute binding free energies (ABFEs) of trapped waters in proteins/protein-ligand complexes using the following thermodynamic cycle Thermodynamic cycle

The purple cycle includes additional restraints for the protein, and is preferred when it is likely that the binding site might collapse once the water is removed. In all other cases, the simpler blue cycle can be used.

Cycle branches

  • A, the target branch: We will find its value by summing all other branches.
  • B, water restraints correction for interacting water: Calculated from equilibrium simulations of its end points (stages 1 and 2, simulations of the protein interacting with the unrestrained and restrained water, respectively).
  • C, water coupling / decoupling: Calculated using multiple NES simulations from each end point to the other (stages 2 and 3, simulations of the protein with and without interactions with the restrained water). The starting structures are obtained from long equilibrium simulations at the end points.
  • D, water restraints correction for non-interacting water: This value can be calculated analytically, no simulations required.
  • E, transfer of non-interacting water to bulk water: This branch does not involve any free energy change, $\Delta G_{E} = 0$.
  • F, turn on interactions of water molecule in bulk water: This process is only depending on the water model, so it can be simulated once and reused for every system which uses the same water model. This is currently not part of the workflow.

Stages

  • Stage 1: This simulates the protein complex with an unrestrained, fully interacting water molecule. Simulations include minimization, equilibration in NVT and NPT, production simulation in NPT. During the production run, the free energy with respect to the water restraint (which is turned off) is calculated.
  • Stage 2: This simulates the protein complex with a restrained, fully interacting water molecule. Simulations include minimization, equilibration in NVT and NPT, production simulation in NPT. During the production run, the free energy with respect to the water restraint (which is turned on) is calculated. Characteristic structures from the production simulation are used to start NES simulations (transition stage 2 to 3).
  • Stage 3: This simulates the protein complex with a restrained, non-interacting water molecule. Simulations include minimization, equilibration in NVT and NPT, production simulation in NPT. Characteristic structures from the production simulation are used to start NES simulations (transition stage 3 to 2).
  • Stages 4 and 5: Coming soon.

Systems

Currently, this repo contains input files for five proteins, each in combination with one to three different ligands. The protein-ligand complexes are named after their PDB ID. They include

  • HIV-1 protease: 1HPX, 1EC0, 1EBW
  • Trypsin: 1AZ8, 1C5T, 1GI1
  • Scytalone dehydratase[1]: 3STD, 4STD, 7STD
  • Factor Xa: 1EZQ, 1LPG, 1F0S
  • BPTI: 5PTI

[1]: 3STD has a different water site than 4STD and 7STD

Usage

Run simulations

All simulations can be run using the bash file scripts/run_simulations.sh. There is an integrated documentation which can be called by running the script with the -h flag:

$ bash scripts/run_simulations.sh -h
Run Water NES simulations.

USAGE: run_simulations.sh [-h] -d dir -t dir [-c crd] -x gmx [-o opts] -s N -p {min,eqNVT,eqNPT,prod,NES} [-n N]
Options:
-h      Print this help and exit
-d dir  The base working directory for the stage
-t top  The topology directory including topology files
-c crd  The input coordinates
        Mandatory for the minimization phase, ignored otherwise
-x gmx  The GROMACS executable to use
-o opts The run options, e.g. "-nt 16 -ntomp 4", optional
-s N    The stage to run, one of:
        1: Stage with fully interacting, non-restrained water
        2: Stage with fully interacting, restrained water
        3: Stage with non-interacting, restrained water
-p {min,eqNVT,eqNPT,prod,NES}
        The simulation phase to run, one or more of: min, eqNVT, eqNPT, prod, NES
        Separate different phases by spaces, e.g. -p "min eqNVT eqNPT"
        Note: The NES phase is only available for stages 2 and 3
-n N    The run number for the NES phase
        Mandatory for the NES phase, ignored otherwise

In the above USAGE string, flags in brackets ([]) are optional, options in braces ({}) denote possible values, and all other flags are mandatory.

The directory (-d) is the base directory. All simulations input and output is going to be written in subdirectories of this base directory. The only requirement for this directory is full write access. All files within the directory might be overwritten.

The topology directory (-t) needs to contain topology files for the system to be simulated. Required files are the main topology named as system.top, as well as files containing $\lambda$-dependent and $\lambda$-independent position restraints for the water, named waterRestraintLambdaIndependent.top and waterRestraintLambdaDependent.top, respectively.

The coordinate file (-c) needs to contain starting coordinates for the system matching with the topology found in the topology directory. This is only needed for the minimization phase.

The GROMACS executable to use (-x) can either be a path to an executable, or simply gmx if it is in the path.

The run options (-o) are only used for the call to gmx mdrun, and allow to set parallelization details or other run time options accepted by gmx mdrun.

The stage (-s) denotes one of the stages of the thermodynamic cycle as defined above.

The simulation phase (-p) defines the simulation to be run, one of min (minimization of the input configuration), eqNVT (thermalization at constant volume), eqNPT (equilibration at constant temperature and pressure), prod (production simulation in NPT), and NES (non-equilibrium transition simulations, only for stages 2 and 3). Each phase builds on the previous phase. Several phases can be combined. Note that the NES phase requires that the starting structures were prepared from the result of the prod phase, so it cannot be directly chained.

For the NES phase, the run number needs to be given with the flag -n. Each run starts from a different configuration, extracted from the production simulation with the scripts/prepare_nes_structures.sh script. The -n flag is ignored for all other phases.

Prepare configurations for NES runs

NES simulations need to be run from multiple starting structures representative of the ensemble of configurations in the end states. To achieve this, we extract configurations from the long NPT production simulations in stages 2 and 3. This operation can be performed using the scripts/prepare_nes_structures.sh script. Again, a help function describes the functionality:

Prepare structures for NES simulations

USAGE: prepare_nes_structures.sh [-h] -d dir -x gmx -n num
Options:
-h      Print this help and exit
-d dir  The base working directory of the stage
        This directory is expected to have a subdirectory prod/
        with the results of the production simulation of that stage
-x gmx  The GROMACS executable to use
-n num  The number of structures to prepare

In the above USAGE string, flags in brackets ([]) are optional, while all other flags are mandatory.

The directory (-d) is the base directory of the stage in which all simulation input and output was written, and should be the same directory that was used for the -d flag of the run_simulations.sh script. Specifically, the script is looking for the mdp file and the trajectory file in the prod/ folder under this base directory.

The GROMACS executable to use (-x) can either be a path to an executable, or simply gmx if it is in the path.

Finally, -n denotes the number of structures to prepare. This defines how many structures are extracted from the production simulation for further use.

Analysis

Edge B

Analysis of the equilibrium endpoint free energy differe

Related Skills

View on GitHub
GitHub Stars4
CategoryDevelopment
Updated16d ago
Forks0

Languages

Python

Security Score

85/100

Audited on Mar 22, 2026

No findings