PyRMSD
pyRMSD is a small Python package that aims to offer an integrative and efficient way of performing RMSD calculations of large sets of structures. It is specially tuned to do fast collective RMSD calculations, as pairwise RMSD matrices.
pyRMSD
The goal of pyRMSD is the fast (and easy!) calculation of collective RMSD operations, especially RMSD matrices of large ensembles of protein conformations. It also offers a symmetric distance matrix implementation with improved access speed and memory efficiency.
If you like it and you are using it for your scientific projects, please cite the pyRMSD paper:
Bioinformatics (2013) 29 (18): 2363-2364.
doi: 10.1093/bioinformatics/btt402
pyRMSD is distributed under the MIT license and is currently at version 4.0.
Summary
- 1 - Features
- 2 - Usage
- 3 - Building & Installation
- 4 - The custom building script
- 5 - Testing (Developers)
- 6 - Benchmarks (Developers)
- Future improvements
- Credits
##1 - Features

###Collective operations

pyRMSD currently has 5 basic operations:

1 - Pairwise RMSD calculation
2 - One vs. following (of a sequence of conformers).
3 - One vs. all the other conformations (of a sequence of conformers).
4 - Pairwise RMSD matrix
5 - Iterative superposition of a sequence.
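To make the relation between operations 2 and 4 more concrete, here is a minimal sketch in plain Python. It is not pyRMSD code: `one_vs_following`, `pairwise_condensed` and the toy distance are hypothetical names invented for this illustration, and scalars stand in for conformations. It only shows that concatenating the "one vs. following" results for every conformer yields all the unique pairs of a pairwise matrix.

```python
def one_vs_following(items, i, distance):
    """Distances from items[i] to every later item (i+1 .. N-1)."""
    return [distance(items[i], items[j]) for j in range(i + 1, len(items))]


def pairwise_condensed(items, distance):
    """Concatenate the one-vs-following rows: N*(N-1)/2 unique pairs."""
    condensed = []
    for i in range(len(items) - 1):
        condensed.extend(one_vs_following(items, i, distance))
    return condensed


# Toy stand-ins (assumptions): scalars instead of conformations,
# |a - b| instead of a real RMSD with superposition.
toy_items = [0.0, 1.0, 3.0, 6.0]


def toy_distance(a, b):
    return abs(a - b)


condensed = pairwise_condensed(toy_items, toy_distance)
print(condensed)  # 6 values for 4 items
```

With 4 items the result holds 4 * 3 / 2 = 6 values, one per unordered pair.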
All methods can use the same coordinates for fitting and RMSD calculation, or one set of coordinates for fitting (superposing) and a different set for calculating the RMSD (referred to in the code as 'calculation coordinates').
Currently pyRMSD implements a total of 3 superposition algorithms (Kabsch's, QTRFIT and QCP), which can have serial or parallel versions (OpenMP and, in one case, CUDA).
The available calculators so far are:
- KABSCH_SERIAL_CALCULATOR
- KABSCH_OMP_CALCULATOR
- QTRFIT_SERIAL_CALCULATOR
- QTRFIT_OMP_CALCULATOR
- QCP_SERIAL_CALCULATOR
- QCP_OMP_CALCULATOR
- QCP_CUDA_CALCULATOR (in CUDA capable machines*)
- QCP_CUDA_MEM_CALCULATOR (in CUDA capable machines*)
In addition it offers 2 other calculators that do not perform superposition (for cases in which the parts of interest of the system are already superposed):
- NOSUP_SERIAL_CALCULATOR
- NOSUP_OMP_CALCULATOR

This calculator will also center the coordinates, adding a little unnecessary overhead. This overhead will be totally diluted when calculating RMSD matrices though.
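As an illustration of what "RMSD without superposition, but with centering" means, here is a small numpy sketch. It is an assumption-laden illustration, not the pyRMSD implementation: the function names `centered` and `rmsd_no_superposition` are hypothetical, and the library's C code may differ in detail.

```python
import numpy as np


def centered(conformation):
    """Translate a conformation (M atoms x 3) so its centroid is at the origin."""
    return conformation - conformation.mean(axis=0)


def rmsd_no_superposition(conf_a, conf_b):
    """RMSD after centering both conformations, with no rotation applied."""
    diff = centered(conf_a) - centered(conf_b)
    return float(np.sqrt((diff ** 2).sum() / len(conf_a)))


conf_a = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
conf_b = conf_a + np.array([5.0, 5.0, 5.0])            # same shape, translated
conf_c = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])  # stretched along x

print(rmsd_no_superposition(conf_a, conf_b))  # translation is removed by centering
print(rmsd_no_superposition(conf_a, conf_c))
```

Centering removes pure translations, so the first pair gives an RMSD of 0, while the stretched pair gives 1.0.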
Finally, it also holds a hidden calculator, QCP_SERIAL_FLOAT_CALCULATOR, mainly used to test against QCP_CUDA_CALCULATOR in its float version.
Methods 1, 2 and 3 can be used to modify the input coordinates (the input coordinates will be superposed). The iterative superposition method will always have this behaviour as it would be senseless otherwise. Conversely, RMSD matrix will never modify input coordinates.
pyRMSD can also take fitting symmetries and rotational calculation symmetries into account. Documentation about this is on its way.
If you think you need new features to be added (or better examples) click here.

* The compute capability of the GPU must be equal to or higher than 1.1 (>1.2 if built with double precision support).

###Condensed matrix

pyRMSD also contains a data type written in C called CondensedMatrix. It represents a square symmetric matrix and saves half of the otherwise redundant memory. Besides, its read and write access outperforms other implementations such as pure Python list-based ones and even Cython implementations (see the benchmarks folder). This means that it will speed up, for free, any application that relies heavily on accessing a distance matrix, such as clustering algorithms. See the examples below for more insight into how to use it.

##2 - Usage

Some code snippets and explanations about them are shown below. Note that, as the code changes rapidly, these snippets can be outdated. I will put all my effort into preventing this, but if you find that the code examples are more problematic than helpful, please contact me. You will also find documentation for methods and variables in the code. Do not hesitate to ask for more documentation if you find it is missing.
###Getting coordinates

To use the module, the first step is to extract all the coordinates from a PDB file. Coordinates must be stored in numpy arrays, using the same layout that Prody uses:
Coordset: [Conformation 1, Conformation 2, ..., Conformation N]
Conformation: [Atom 1, Atom 2,..., Atom M]
Atom: [x,y,z]
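In numpy terms, the layout above corresponds to a three-dimensional array of shape (N conformations, M atoms, 3). The following sketch builds such an array by hand. The sizes and values are made up for illustration, and the name `coordsets` is only a convenient variable name here.

```python
import numpy as np

num_conformations, num_atoms = 4, 10

# Coordset: one entry per conformation, each holding M atoms with [x, y, z].
coordsets = np.zeros((num_conformations, num_atoms, 3))

# Set the coordinates of atom 2 in conformation 1 (indices are 0-based).
coordsets[1, 2] = [1.0, 2.0, 3.0]

first_conformation = coordsets[0]   # array of shape (num_atoms, 3)
x, y, z = coordsets[1, 2]           # a single atom: [x, y, z]

print(coordsets.shape)
print(first_conformation.shape)
```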
In order to do this, there is a convenience class called Reader in pyRMSD/utils/proteinReading.py. It reads a PDB file using the built-in reader. Despite this, we encourage the use of Prody if you need to do any kind of selection or manipulation.
from pyRMSD.utils.proteinReading import Reader
reader = Reader().readThisFile("my_trajectory.pdb").gettingOnlyCAs()
coordinates = reader.read()
num_of_atoms = reader.numberOfAtoms
num_of_frames = reader.numberOfFrames
See 'pyRMSD/pyRMSD/test/testPdbReader.py' for a simple usage example.

###Calculating the RMSD matrix

To calculate the RMSD matrix you can use a MatrixHandler, or directly use a calculator object to feed a CondensedMatrix.
Using MatrixHandler to get the RMSD pairwise matrix (given that we already have read the coordinates) will look like this:
from pyRMSD.matrixHandler import MatrixHandler
rmsd_matrix = MatrixHandler()\
.createMatrix(coordinates, 'QCP_OMP_CALCULATOR')
Calculating the matrix using directly the RMSDCalculator is a little bit more verbose:
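import pyRMSD.RMSDCalculator
# Assumed import path for CondensedMatrix (missing from the original snippet)
from pyRMSD.condensedMatrix import CondensedMatrix
# 'coordsets' is the coordinates array obtained in the previous section
calculator = pyRMSD.RMSDCalculator.RMSDCalculator(coordsets,
                                                  "QCP_SERIAL_CALCULATOR")
# Compute the condensed pairwise RMSD values and wrap them in a CondensedMatrix
rmsd = calculator.pairwiseRMSDMatrix()
rmsd_matrix = CondensedMatrix(rmsd)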
As the resulting matrix is symmetric and its diagonal is 0, the rmsd_matrix object will store only the upper triangle (condensed matrix), in the same way scipy.spatial.distance.pdist does.

###Available calculators

Programmatically, the available calculators can be queried with:
from pyRMSD.availableCalculators import availableCalculators
print availableCalculators()
###Matrix handlers

A MatrixHandler object will help you create the matrix and will also help you save matrix data to disk and load it back.
from pyRMSD.matrixHandler import MatrixHandler
# Create a matrix with the coordsets and using a calculator
mHandler = MatrixHandler()
matrix = mHandler.createMatrix( coordsets,\
"QCP_CUDA_CALCULATOR")
# Save the matrix to 'to_this_file.bin'
mHandler.saveMatrix("to_this_file")
# Load it from 'from_this_file.bin'
mHandler.loadMatrix("from_this_file")
# Get the inner CondensedMatrix instance
rmsd_matrix = mHandler.getMatrix()
###Accessing the RMSD matrix

You can access a matrix object contents like this:
rmsd_at_pos_2_3 = rmsd_matrix[2,3]
The row_length parameter will give you the... row length. Remember that the matrix is square and symmetric, so row_length == column_length, rmsd_matrix[i,j] == rmsd_matrix[j,i] and, as it is a distance matrix, rmsd_matrix[i,i] == 0.
One can also access the inner representation of the data (a numpy array) using the get_data() function, e.g.:
inner_data = rmsd_matrix.get_data()
The inner_data array will contain only the elements of the matrix upper triangle (diagonal not included), in row-major order.
For example, the matrix:
0 1 2 3 4
1 0 5 6 7
2 5 0 8 9
3 6 8 0 1
4 7 9 1 0
will be retrieved as:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
It is possible to use scipy.spatial.distance.squareform in order to recover the initial matrix layout.
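The row-major upper-triangle layout described above can also be indexed by hand. The following plain Python sketch uses hypothetical helper names (`condensed_index`, `get_element`, `to_square`) that are not part of pyRMSD. It maps a pair (i, j) to its position in the condensed data and rebuilds the square matrix of the example.

```python
def condensed_index(i, j, n):
    """Position of element (i, j), i != j, in the row-major upper triangle."""
    if i == j:
        raise ValueError("diagonal elements are not stored")
    if i > j:
        i, j = j, i  # symmetric matrix: (i, j) and (j, i) share one slot
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def get_element(condensed, i, j, n):
    """Read element (i, j); the diagonal of a distance matrix is 0."""
    return 0 if i == j else condensed[condensed_index(i, j, n)]


def to_square(condensed, n):
    """Rebuild the full n x n matrix as a list of lists."""
    return [[get_element(condensed, i, j, n) for j in range(n)]
            for i in range(n)]


inner_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]  # the example above
n = 5
square = to_square(inner_data, n)
for row in square:
    print(row)
```

For n = 5 only n * (n - 1) / 2 = 10 values are stored instead of the 25 entries of the full matrix.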
###Matrix statistics

The CondensedMatrix class also offers an efficient way to compute the most common statistical moments. Use the methods calculateMean, calculateVariance, calculateSkewness and calculateKurtosis to get the mean, variance, skewness and kurtosis (easy, isn't it :) ). You can also use calculateMax and calculateMin to get the maximum and minimum values of the matrix.
##3 - Building & Installation

###Before installation

Users only need to install Python version 2.6/2.7 (pyRMSD has only been tested with those, although it may work with other versions of the Python 2.X family). Numpy is also required. You surely already have it on your machine but, in case you don't, it can be found here. There you will find installers for almost every combination of platform and Python version you can think of.

Developers should remember that the Python and Numpy header files must be accessible, and that the Python installation must contain the Python shared library. This usually means that you have to configure it using ./configure --enable-shared before building Python (2.7 distributions usually already come with this library). Prody is not a dependency, but I encourage its use to handle coordinates, as it is a well-tested and powerful tool.
###Linux and MacOs
Those users have the following choices:
1) Using the 'setup.py' file inside the root folder by typing:
