Pose

A bare-metal Python library for building and manipulating protein and nucleic acid molecular structures

<img src="pose/Video1.gif" width="25%"/><img src="pose/Video2.gif" width="25%"/><img src="pose/Video3.gif" width="25%"/><img src="pose/Video4.gif" width="25%"/>

Video Tutorial

Watch the full walkthrough: Video Tutorial on YouTube

What is Pose?

Pose constructs a data structure for a protein or nucleic acid molecule that contains all the information needed to define the polymer. Primary information includes the XYZ Cartesian coordinates of each atom, the identity and charge of each atom, and the bond graph of the entire molecule. Secondary information includes the FASTA sequence, radius of gyration, potential energy, and the secondary structure assignment for each protein residue.
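To illustrate how a secondary quantity can be derived from the primary data, the sketch below computes a mass-weighted radius of gyration from plain lists of coordinates and masses. It is a standalone example and not Pose's implementation: the function name `radius_of_gyration` and the toy inputs are assumptions, and whether Pose weights atoms by mass is not stated here.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration, in the same length unit as coords."""
    total = sum(masses)
    # Centre of mass of the point set
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass
    msd = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
              for m, c in zip(masses, coords)) / total
    return math.sqrt(msd)

# Two equal masses placed 2 Å apart (hypothetical toy input)
coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masses = [12.0, 12.0]
rg = radius_of_gyration(coords, masses)
print(rg)
```

For real structures the same function would be fed every atom's coordinates and atomic mass.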

Using this data structure, Pose can build and manipulate polypeptides and nucleic acids: construct any polypeptide or nucleic acid from sequence, move dihedral and rotamer angles, mutate residues and base pairs, and measure bond lengths and angles. It is designed as a substrate for higher-level protocols such as simulated annealing, molecular dynamics, and machine learning-based molecular design.
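Moving a dihedral angle amounts to rotating every atom downstream of a bond around the axis defined by that bond. The following is a minimal sketch of that single-point rotation using Rodrigues' rotation formula in plain Python. It is not Pose's code; the function name `rotate_about_axis` and its arguments are hypothetical.

```python
import math

def rotate_about_axis(point, a, b, degrees):
    """Rotate `point` by `degrees` around the axis running from a to b."""
    axis = [b[i] - a[i] for i in range(3)]
    length = math.sqrt(sum(c * c for c in axis))
    k = [c / length for c in axis]              # unit vector along the axis
    v = [point[i] - a[i] for i in range(3)]     # point relative to the axis origin
    t = math.radians(degrees)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    # Rodrigues: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    rotated = [v[i] * math.cos(t)
               + k_cross_v[i] * math.sin(t)
               + k[i] * k_dot_v * (1.0 - math.cos(t))
               for i in range(3)]
    return [rotated[i] + a[i] for i in range(3)]

# Rotate a point on the x axis by 90 degrees around the z axis
moved = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90)
print(moved)
```

Applying the same rotation to all atoms on one side of the bond changes the dihedral while leaving bond lengths and angles untouched.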

Key features:

Designed to be extremely stable bare-metal Python, with zero external dependencies beyond NumPy
26 amino acids supported by default (20 canonical + 6 non-canonical: LYX, MSE, PYL, SEC, TRF, TSO); can be extended to 100+
Support for both L-amino acids and D-amino acids (mixed sequences fully supported)
5 DNA and RNA canonical nucleotides
Full bond graph with partial charges
Measure and rotate protein dihedral angles (φ/ψ/ω/χ)
Measure and rotate nucleic acid dihedral angles (α/β/γ/δ/ε/ζ/χ)
Measure and adjust the distance and angle between any atoms
PDB and mmCIF file import and export
Pythonic zero-based indexing throughout (unlike PDB's one-based convention)
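Since the library indexes from zero while PDB files number residues from one (or from an arbitrary starting number), a small conversion helper can avoid off-by-one mistakes. The sketch below is an assumption-laden illustration: `pdb_to_index` is a hypothetical helper, not part of Pose, and it assumes the chain is numbered consecutively without gaps or insertion codes.

```python
def pdb_to_index(resseq, first_resseq=1):
    """Convert a PDB residue number to a zero-based index.

    Assumes gap-free, consecutive numbering starting at `first_resseq`.
    """
    if resseq < first_resseq:
        raise ValueError("residue number precedes the first residue")
    return resseq - first_resseq

print(pdb_to_index(1))                     # first residue of a chain numbered from 1
print(pdb_to_index(27, first_resseq=25))   # chain whose numbering starts at 25
```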

Installation

Dependencies: Python >= 3, NumPy

For virtualenv:

pip install git+https://github.com/sarisabban/Pose

For anaconda:

conda create -n ENVIRONMENT python=3
conda activate ENVIRONMENT
pip3 install git+https://github.com/sarisabban/Pose

Quick Start

from pose import *

# Build a peptide
p = Pose()
p.Build('GAL')       # Gly-Ala-Leu (uppercase = L-amino acids)
p.GetInfo()          # Print structured summary

# Inspect properties
print('Sequence:', p.data['FASTA'])
print('Mass:', p.data['Mass'], 'Da')
print('Rg:', p.data['Rg'], 'Å')

# Rotate backbone angles (indices are zero-based)
p.RotateDihedral(1, -60, 'PHI')
p.RotateDihedral(1, -45, 'PSI')

# Mutate and export
p.Mutate(2, 'V')        # Change residue at index 2 (Leu) → Val
p.Export('peptide.pdb')

# Import a protein
p = PoseN()
p.Import('1YN3.pdb')
p.GetInfo()

# Import a nucleic acid
p = PoseN()
p.Import('1BNA.pdb')
p.GetInfo()

D-amino acids — use lowercase letters: Uppercase sequence letters build L-amino acids (natural form). Lowercase builds D-amino acids (mirror images). Mixed sequences are fully supported.

p.Build('ACEG')   # All L-amino acids
p.Build('aceg')   # All D-amino acids
p.Build('GAg')    # G=L-Gly, A=L-Ala, g=D-Gly
p.Build('AcEg')   # Mixed L/D sequence
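The case convention above can be checked before building. The sketch below splits a sequence string into residue letter and chirality using only the uppercase/lowercase rule described here. It is a standalone illustration; `split_chirality` is a hypothetical helper and not a Pose function.

```python
def split_chirality(sequence):
    """Return a list of (uppercase one-letter code, 'L' or 'D') pairs."""
    result = []
    for letter in sequence:
        if not letter.isalpha():
            raise ValueError(f"unexpected character in sequence: {letter!r}")
        # Uppercase means L-amino acid, lowercase means D-amino acid
        result.append((letter.upper(), 'L' if letter.isupper() else 'D'))
    return result

pairs = split_chirality('AcEg')
print(pairs)
```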

Importing a PDB file:

p = Pose()
p.Import('1TQG.pdb', chain='A')
p.ReBuild()     # Adds missing hydrogens, and calculates SASA, atomic partial charge, and amino acid secondary structures

You can run p.ReBuild() after Import() to add hydrogens to the structure. Be aware that this builds a new synthetic structure, so the original occupancy and temperature factor of each atom are lost (they are replaced with 1.0 and 0.0).
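If the original occupancy and temperature factors matter, one option is to read them from the file before rebuilding. The sketch below parses both fields from fixed-column PDB ATOM/HETATM records (occupancy in columns 55-60, temperature factor in columns 61-66). It works on raw PDB text and does not use Pose; the function name `read_occupancy_bfactor` and the sample record are illustrative assumptions.

```python
def read_occupancy_bfactor(pdb_lines):
    """Map (chain ID, residue number, atom name) -> (occupancy, B-factor)."""
    values = {}
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')):
            key = (line[21], int(line[22:26]), line[12:16].strip())
            values[key] = (float(line[54:60]), float(line[60:66]))
    return values

# Build one correctly aligned sample record (hypothetical atom)
sample = ('ATOM  {:5d} {:<4s}{:1s}{:>3s} {:1s}{:4d}{:1s}   '
          '{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}').format(
              1, ' CA', ' ', 'GLY', 'A', 1, ' ',
              1.458, 0.0, 0.0, 0.85, 23.5, 'C')

saved = read_occupancy_bfactor([sample])
print(saved)
```

The saved dictionary can then be consulted after rebuilding for any heavy atom that exists in both structures.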

API Reference

Building & I/O

| Method | Description |
|--------|-------------|
| `p.Import(filename='1YN3.pdb', chain=['A', 'B'], model=1)` | Import a structure from a PDB or mmCIF file and construct the `p.data` JSON object. `Import()` can import a protein, DNA, or RNA structure, as a single chain or a list of chains; `chain=None` imports all chains. It can also select which model to import from an ensemble of models. It cannot import a structure that is a mixture of proteins and nucleic acids in separate chains; import each macromolecule type as a separate pose instead |
| `p.Build('MSLESNRGI')`<br>`p.Build('ATCG', fmt='DNA')` | Build a polypeptide from a one-letter sequence. Uppercase = L-amino acids, lowercase = D-amino acids.<br>Build a nucleic acid; use `fmt` to choose DNA or RNA |
| `p.ReBuild()` | Rebuild the polypeptide or nucleic acid. Best used right after `Import()`. Use `D_AA=True` to rebuild entirely in D-amino acids. Adds missing hydrogens and calculates each atom's partial charge as well as each amino acid's secondary structure |
| `p.Mutate(1, 'V')` | Mutate a residue. Example: residue 1 → L-Valine. Use lowercase `'v'` for residue 1 → D-Valine |
| `p.Export('out.pdb')` | Write the full structure, including all chains, to a PDB or mmCIF file |

Measurements

| Method | Description |
|--------|-------------|
| `p.GetDistance(0, 'N', 5, 'CA')` | Get the distance in Å between any two atoms. Example: N atom of residue 0 to CA atom of residue 5 |
| `p.GetDihedral(2, 'PHI')` | Calculate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral angles. Example: the PHI angle of the 3rd protein residue (index 2). For a protein χ dihedral use `p.GetDihedral(4, 'chi', 1)`: 5th residue (index 4), CHI 1 angle |
| `p.GetAngle(0, 'N', 5, 'CA', 17, 'C')` | Get the angle between any three atoms in the whole structure. Example: the angle between N of residue 0, CA of residue 5, and C of residue 17 |
| `p.GetAtomBonds(0, 1)` | Confirm that two atoms are bonded and get their names as [atom 1 element name, atom 1 PDB name, atom 2 PDB name, atom 2 element name]. Use the atom indices. If the two atoms are not bonded, an error is raised |
| `p.GetIdentity(0, 'Atom')` | Identify the PDB name of an atom, an amino acid, or a nucleotide by its index. Examples: `p.GetIdentity(5, 'Atom')`, `p.GetIdentity(5, 'amino acid')`, or `p.GetIdentity(5, 'nucleotide')`. For atoms only, you can also return the partial charge using `p.GetIdentity(3, 'Atom', charge=True)` |
| `p.GetInfo()` | Print a formatted summary of the structure's information |
| `p.GetAtomCoord(3, 'N')` | Get the XYZ coordinates of an atom of a monomer (residue or nucleotide). Example: N atom of monomer index 3 |
| `p.GetAtomIdx(3, 'N')` | Get the atom index in `p.data['Coordinates']` from its name within a monomer. This is the opposite of `p.GetAtomCoord(3, 'N')` |
| `p.GetAtomList(PDB=True)` | Get a list of all atom element names for the entire structure. Use `PDB=True` for PDB-formatted names |
| `p.CalcMass()` | Calculate the molecular mass of the whole molecule (all chains) in Da (Daltons); updates `p.data['Mass']` |
| `p.CalcSize()` | Calculate the length of each chain in a structure; updates `p.data['Size']`. Get the length of a chain using `p.data['Size'][CHAIN]` |
| `p.CalcFASTA()` | Compile the FASTA sequence of each chain; updates `p.data['FASTA']`. Get the FASTA sequence of a chain using `p.data['FASTA'][CHAIN]` |
| `p.CalcRg()` | Calculate the radius of gyration of the whole molecule (all chains) in Å; updates `p.data['Rg']` |
| `p.CalcCharge()` | Calculate the Gasteiger-Marsili partial charges of all atoms; updates `p.data['Atoms'][index][2]` |
| `p.CalsDSSP()` | Calculate each amino acid's secondary structure assignment (proteins only); stores it in `p.data['Amino Acids'][i][4]` and updates `p.data['SS'][CHAIN]`, which holds the secondary structure sequence of each chain |
| `p.CalcSASA()` | Calculate the Solvent Accessible Surface Area (SASA) of each amino acid (proteins only); stores the value in `p.data['Amino Acids'][i][6]` |
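The distance, angle, and dihedral measurements listed in this section reduce to short vector formulas. The following standalone sketch computes them from raw XYZ tuples in plain Python. It is not Pose's implementation; the function names `distance`, `angle`, and `dihedral` are assumptions chosen for illustration.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _norm(a):
    return math.sqrt(_dot(a, a))

def distance(p1, p2):
    """Distance between two points."""
    return _norm(_sub(p2, p1))

def angle(p1, p2, p3):
    """Angle in degrees at p2, formed by p1-p2-p3."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p1, p2, p3, p4):
    """Signed dihedral in degrees around the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

d = distance((0, 0, 0), (3, 4, 0))
a = angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
t = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(d, a, t)
```

Using `atan2` for the dihedral keeps the sign of the angle and avoids the numerical problems of `acos` near 0° and 180°.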

Manipulation

| Method | Description |
|--------|-------------|
| `p.MovePose(5, [18, 10, 5], 6, [0, 0, 0])` | Rotate and/or translate the whole structure. Example: rotate 5° around axis [18, 10, 5] and move 6 Å towards point [0, 0, 0] |
| `p.AdjustDistance(0, 'N', 4, 'C', 17)` | Set the distance between any two atoms in Å. Example: set the distance between N in residue 0 and C in residue 4 to 17 Å. Order matters: `(0,'N',0,'CA',d)` ≠ `(0,'CA',0,'N',d)` |
| `p.AdjustAngle(1, 'N', 1, 'CA', 1, 'C', -2)` | Add/subtract degrees from a three-atom angle, with atom 2 being the pivot point. Example: subtract 2° from the N–CA–C angle of residue 1, with the CA atom being the pivot |
| `p.RotateDihedral(1, -60, 'PHI')` | Rotate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral
