
title: pdbremix documentation


Requires Python 2 (not Python 3).

pdbremix

pdbremix is a library to analyze protein structures and protein simulations

The library consists of:

  1. tools to analyze and view PDB structures
  2. tools to run MD simulations and analyze MD trajectories
  3. python interface to analyze PDB structures
  4. python interface for MD simulations and MD trajectories

An interactive version of this readme.md is here.

Installation

Download from the github repo:

    github/pdbremix/zip

Or browse the repo:

    github/pdbremix

And then install:

> python setup.py install

From here, you can access unit tests and example scripts.

There are many wonderful tools in structural biology that have less-than-stellar interfaces. pdbremix wraps these tools to make them easier to use.

All the commands are stored in the bin directory. On Linux, I suggest adding the bin path to your PATH variable so that the following commands work from any directory.

To check which tools can be accessed from the path:

> checkpdbremix

Use the -o flag to print the path of the binary config file, which you can edit to override the detected binaries (for example, to add exotic flags):

> vi `checkpdbremix -o`

On Windows, change directory to bin and run:

> python checkpdbremix

Tools to analyze PDB structures

pdbremix is a library to analyze PDB structures and MD trajectories. As such, it provides a platform to build command-line tools for PDB files as well as to carry out useful pre-processing of PDB files for external tools.

For all tools, detailed help is available with the -h flag, and many of the scripts work with pypy for significant speed-ups.

Tools in Pure Python

Some of the tools can be used straight out of the box:

  • pdbfetch fetches PDB files from the RCSB website
  • pdbheader displays summary of PDB files
  • pdbseq displays sequences in a PDB
  • pdbchain extracts chains from a PDB
  • pdbcheck checks for common defects in a PDB
  • pdbstrip cleans up PDB for MD simulations

The following tools implement standard algorithms:

  • pdbvol calculates volume of a PDB
  • pdbasa calculates accessible surface-area of a PDB
  • pdbrmsd calculates RMSD between PDB files

For these tools, you get an impressive speed-up if you use pypy:

> pdbfetch 1be9
> pdbstrip 1be9.pdb
> pypy `which pdbvol` 1be9.pdb
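
As an illustration of the kind of standard algorithm these tools implement, here is a minimal sketch of an RMSD calculation in plain Python. It is not pdbremix's own code: the function name `rmsd` and the coordinate-list input are assumptions made for this example, and it compares the coordinates as given, without first superimposing the two structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom structures: the second is the first shifted 1 A along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))  # every atom moves by 1 A, so this prints 1.0
```

Tight loops over plain floats like this one are the sort of code where pypy tends to help.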

Wrappers around External Tools

The following tools wrap external tools to solve some very common (and painful) use-cases in PDB analysis.

  • pdbshow displays PDB structures in PYMOL with extras.

    PYMOL is a powerful viewer, but its defaults leave a little to be desired. pdbshow runs PYMOL with useful defaults and added functionality:

    - By default, shows colored chains, ribbons, and sidechains as sticks. 
    - Define initial viewing frame by a center-residue and a top-residue. Structure is rotated to place the center-residue above the center-of-mass in the middle, and the top-residue above the center-residue.
    - Color by B-factor using a red-white scale, with limits.
    
    - Worm mode to show B-factor by variable width.
    - Solvent molecules can be removed, specifically for MD frames that contain too many waters, which will choke PYMOL.
  • pdboverlay displays homologous PDB files using MAFFT, THESEUS and PYMOL.

    One of the most beautiful results of structural biology is the structural alignment of homologous proteins. pdboverlay performs this complex process in one easy step starting from PDB structures:

    • Write fasta sequences from PDB.
    • Align sequences with MAFFT to find homologous regions.
    • Structurally align homologous regions with THESEUS.
    • Display structurally-aligned PDBs using special PYMOL script.
  • pdbinsert fills gaps in PDB with MODELLER.

    Gaps in PDB structures cause terrible problems in MD simulations. The standard tool to patch gaps is MODELLER, which requires a ton of boilerplate. pdbinsert does all the dirty work with MODELLER in one fell stroke.

Tools to run MD Simulation

pdbremix provides a simplified cross-package interface to run a useful subset of molecular-dynamics simulations. Of course, this is not a replacement for the full functionality of these packages.

For beginners, it is particularly useful to see how a simulation is set-up from a PDB file to a trajectory, as the shell scripts and log files of all intermediate steps are saved to file. It is easier to modify a working process than to generate one from scratch.

Preparing Simulations from PDB

First let's grab a PDB file from the website:

> pdbfetch 1be9

Then we can do some standard cleanup so that the structure exists in a unique single conformation:

> pdbstrip 1be9.pdb

This next tool interrogates the structure for features that may affect MD simulation, highlighting steric clashes, chain-breaks (missing amino acids), disulfide bonds, incomplete and nonstandard amino acids:

> pdbcheck 1be9.pdb

Then we generate a topology file from the PDB file:

> pdb2sim 1be9.pdb sim AMBER11-GBSA

This will detect multiple chains, disulfide-bonds, fit hydrogen atoms to AMBER, and guess polar residue charged states. Masses, charges and bond spring parameters are generated from the AMBER99 force-field. pdb2sim will write a set of restart files with a common basename sim:

sim.top - the topology file
sim.crd - the coordinates file

The current choice of force-fields:

  1. AMBER11-GBSA
  2. AMBER11
  3. NAMD2.8
  4. GROMACS4.5

For AMBER11-GBSA, pdb2sim builds a topology file for implicit solvent. For the other choices, explicit solvent is used, where pdb2sim creates a box with 10 Å padding, and fills the box with waters and counterions.
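
To make the 10 Å padding concrete, the following sketch computes the bounds of a rectangular box around a set of atom coordinates. The function `padded_box` is hypothetical and the rectangular shape is an assumption; in practice the box is built by the chosen MD package, not by code like this.

```python
def padded_box(coords, padding=10.0):
    """Return (lower, upper, size) of an axis-aligned box around coords,
    extended by `padding` (in Angstroms) on every side."""
    xs, ys, zs = zip(*coords)
    lower = (min(xs) - padding, min(ys) - padding, min(zs) - padding)
    upper = (max(xs) + padding, max(ys) + padding, max(zs) + padding)
    size = tuple(hi - lo for lo, hi in zip(lower, upper))
    return lower, upper, size


# Hypothetical solute spanning 20 x 30 x 40 A.
lower, upper, size = padded_box([(0.0, 0.0, 0.0), (20.0, 30.0, 40.0)])
print(size)  # (40.0, 50.0, 60.0): each dimension grows by 2 x 10 A
```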

Positional constraints

Positional constraints are very important in setting up MD simulations. pdbremix simplifies the application of positional restraints by using the B-factor column of PDB files to denote positional constraints, which is what NAMD does.

To generate a PDB file for positional restraints from a set of restart files:

> sim2pdb -b sim sim.restraint.pdb

which will generate a PDB file where all backbone atoms have been selected. You can directly edit the B-factors in the PDB file. Another option -a is for all protein atoms:

> sim2pdb -a sim sim.restraint.pdb
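
If you prefer to edit the B-factors with a script rather than by hand, a sketch along these lines may help. It relies only on the standard fixed-width PDB layout (atom name in columns 13-16, B-factor in columns 61-66). The function name, the choice of N, CA, C and O as backbone atoms, and the value 1.0 used to flag a restrained atom are all assumptions for illustration, not documented pdbremix behaviour.

```python
BACKBONE_NAMES = ("N", "CA", "C", "O")  # assumed backbone selection


def set_backbone_restraints(pdb_lines, value=1.0):
    """Return a copy of pdb_lines with the B-factor column (columns 61-66)
    set to `value` for backbone atoms and 0.0 for every other atom."""
    result = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            atom_name = line[12:16].strip()  # atom name: columns 13-16
            bfactor = value if atom_name in BACKBONE_NAMES else 0.0
            line = line[:60] + "%6.2f" % bfactor + line[66:]
        result.append(line)
    return result


# Two made-up ATOM records written with the standard PDB column layout.
template = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
lines = [
    template % (1, " CA", "ALA", "A", 1, 1.0, 2.0, 3.0, 1.0, 25.0),
    template % (2, " CB", "ALA", "A", 1, 2.0, 3.0, 4.0, 1.0, 30.0),
]
for line in set_backbone_restraints(lines):
    print(line)
```

Only the B-factor field is rewritten; coordinates and every other column are left untouched, and the input list is not modified.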

Running simulations

pdbremix provides several tools to run MD simulations where the chosen package is detected by the extensions of the restart files.

For all packages, a robust set of simulation parameters is used, including a 1 fs time-step and no bond-constraints on protein atoms. In explicit solvent, periodic boundary conditions are applied with PME electrostatics.

The output restart files and trajectories are written to a common basename, and an optional -r flag loads positional restraints:

  1. Minimize your structure from sim restart files to min, using restraints defined in sim.restraint.pdb:

     > simmin -r sim.restraint.pdb sim min
    
  2. MD simulation with a Langevin thermometer at 300K for 5000 fs:

     > simtemp -r restraint.pdb min temp 300 5000
    
  3. For constant energy for 5000 fs:

     > simconst -r restraint.pdb min const 5000
    

This allows you to run equilibration protocols from the command-line. For instance, a pre-equilibration at 300K: initially a 10 ps heating of the solvent, followed by 10 ps of the whole system:

> sim2pdb -b restraint.pdb sim
> simmin -r restraint.pdb sim min
> simtemp -r restraint.pdb min heat1 10000 300
> simtemp heat1 heat2 10000 300

Trajectory analysis

pdbremix provides a tool to calculate RMSD and kinetic energy for trajectories, convenience tools for viewing trajectories in viewers, and some translation tools. To use these tools, the trajectory files must have the following naming structure:

  • AMBER:
    • md.top
    • md.trj
    • md.vel.trj
  • GROMACS:
    • md.top (and md.*itp)
    • md.gro
    • md.trr
  • NAMD:
    • md.psf
    • md.dcd
    • md.vel.dcd
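
The naming scheme above can be checked before running an analysis tool. The sketch below guesses the package from the files that share a basename. The function `guess_package` and the table of required extensions are hypothetical and only mirror the list above; treating the velocity files and the GROMACS .itp files as optional is an assumption.

```python
# Files that must exist for each package, taken from the naming list above.
REQUIRED_EXTENSIONS = {
    "AMBER": (".top", ".trj"),
    "GROMACS": (".top", ".gro", ".trr"),
    "NAMD": (".psf", ".dcd"),
}


def guess_package(basename, filenames):
    """Return the package whose required files are all present, else None."""
    present = set(filenames)
    for package in sorted(REQUIRED_EXTENSIONS):
        needed = [basename + ext for ext in REQUIRED_EXTENSIONS[package]]
        if all(name in present for name in needed):
            return package
    return None


print(guess_package("md", ["md.psf", "md.dcd", "md.vel.dcd"]))  # NAMD
print(guess_package("md", ["md.top", "md.gro", "md.trr"]))      # GROMACS
```

In a real directory you would pass `os.listdir(".")` as `filenames`.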

These are trajectory analysis tools:

  • trajstep displays basic parameters of a trajectory
  • trajvar calculates energy and RMSD of trajectory

As opening trajectories in standard viewers is a pain, use these tools to open them:

  • trajvmd displays trajectory in VMD *recommended*
  • trajchim displays trajectory in CHIMERA
  • trajpym displays trajectory in PYMOL *AMBER only*

And some package specific tools:

  • traj2amb converts NAMD/GROMACS to AMBER trajectories *without* solvent
  • grotrim trims GROMACS .trr trajectory files

Python interface to PDB structures

An important part of pdbremix is the design of a light API to interact with PDB structures. The data structures are designed to be easy to use with idiomatic Python to do things such as select atoms.

Other packages sometimes include a domain-specific language for atom selection, but ultimately this limits the ability for those libraries to interact with the Python ecosystem such as scipy, pandas, or numpy.

Vector geometry library

As in any structural biology library, pdbremix provides a full-featured vector geometry library v3:

from pdbremix import v3

v3 was designed to be function-based, which allows the library to switch between a pure Python version and a numpy-dependent version.

If you want just the python version:

import pdbremix.v3array as v3

Or the numpy version:

import pdbremix.v3numpy as v3

Vectors are created and copied by the vector function:

v = v3.vector() # the zero vector
z = v3.vector(1,2,3)
w = v3.vector(z) # a copy

Vectors are represented as arrays as they are subclassed from Python arrays or numpy arrays, and components are accessed as:

print v[0], v[1], v[2]

All vector functions return by value, with the one exception of set_vector, which changes components in place:

v3.set_vector(v, 2, 2, 2)

Here is a set of common vector operations:

mag(v)
scale(v, s)
dot(v1, v2)
cross(v1, v2)
norm(v)
parallel(v, axis)
pe
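
To show what a function-based vector library looks like in practice, here is a small pure-Python sketch of a few of the operations listed above. These are stand-alone stand-ins written for this example, not the v3 source: vectors are plain lists, and the semantics assumed for `norm` (return a unit vector) and `parallel` (return the component of v along axis) are guesses based on the names.

```python
import math


def vector(x=0.0, y=0.0, z=0.0):
    return [float(x), float(y), float(z)]


def dot(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))


def mag(v):
    return math.sqrt(dot(v, v))


def scale(v, s):
    return [s * c for c in v]


def cross(v1, v2):
    return [
        v1[1] * v2[2] - v1[2] * v2[1],
        v1[2] * v2[0] - v1[0] * v2[2],
        v1[0] * v2[1] - v1[1] * v2[0],
    ]


def norm(v):
    # Assumed meaning: return the unit vector pointing along v.
    length = mag(v)
    return [c / length for c in v]


def parallel(v, axis):
    # Assumed meaning: return the component of v that lies along axis.
    unit = norm(axis)
    return scale(unit, dot(v, unit))


print(mag(vector(3, 4, 0)))                         # 5.0
print(cross(vector(1, 0, 0), vector(0, 1, 0)))      # [0.0, 0.0, 1.0]
print(parallel(vector(3, 4, 0), vector(2, 0, 0)))   # [3.0, 0.0, 0.0]
```

Because every operation is a free function that returns a new value, the same calling code could run against a list-based or an array-based backend, which is the design choice described for v3.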
