Project description

This is an application to analyze base-pairing patterns in DNA/RNA 3D structures in order to find and classify tetrads and quadruplexes. ElTetrado assigns each tetrad to one of the ONZ classes (O, N, Z). It also assigns the tetrad's directionality (+/-), which is determined by the bonds between bases and their non-canonical interactions. The interactions follow the Leontis/Westhof classification (Leontis et al. 2001). The tetrad directionality, clockwise (+) or anticlockwise (-), is set by the Watson-Crick (W) edge of the first base in the tetrad facing the Hoogsteen (H) edge of the next nucleobase in the same tetrad. For more details, please refer to Zok et al. (2020) and Popenda et al. (2020).

Installation

This project uses Poetry for dependency management.

To install the package, run:

poetry install

Dependencies

The project is written in Python 3.12+ and requires mmcif, orjson, NumPy and rnapolis.

Visualization is created by an R 3.6+ script that uses the R4RNA library (Lai et al. 2012). This dependency is installed automatically if it is not present.

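By default, base pairs and stacking interactions are identified by RNApolis. Alternatively, annotations produced by an external tool can be supplied with --external-files (see --tool for the supported tools).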

Usage

ElTetrado is a command-line application that must be given --input followed by a path to a PDB or PDBx/mmCIF file.

By default, ElTetrado prints textual results to the standard output. A JSON version of the results can be obtained with the --output switch followed by the path where the file should be created.

ElTetrado can visualize the whole structure as well as each N4-helix, quadruplex and tetrad. Visualizations are generated only when --image is given a directory in which to save the PDF files. If --image is omitted, no images are produced. Canonical base pairs can be added to the visualization for context with --complete-2d. All color settings are located in the first several lines of the quadraw.R file, so you can change them without any knowledge of the R language.

usage: eltetrado [-h] [-i INPUT] [-o OUTPUT] [-m MODEL] [--no-reorder]
                 [--complete-2d] [--image DIR] [-e [EXTERNAL_FILES ...]]
                 [--tool {fr3d,dssr,rnaview,bpnet,maxit,barnaba,mc-annotate}]
                 [-v]

options:
  -h, --help            show this help message and exit
  -i, --input INPUT     path to input PDB or PDBx/mmCIF file
  -o, --output OUTPUT   (optional) path for output JSON file
  -m, --model MODEL     (optional) model number to process
  --no-reorder          chains of bi- and tetramolecular quadruplexes should
                        be reordered to be able to have them classified; when
                        this is set, chains will be processed in original
                        order, which for bi-/tetramolecular means that they
                        will likely be misclassified; use with care!
  --complete-2d         when set, the visualization will also show canonical
                        base pairs to provide context for the quadruplex
  --image DIR           directory where visualization files (PDF) will be
                        saved; if omitted, no images are generated
  -e, --external-files [EXTERNAL_FILES ...]
                        path(s) to external tool output file(s); if omitted
                        ElTetrado will compute interactions itself
  --tool {fr3d,dssr,rnaview,bpnet,maxit,barnaba,mc-annotate}
                        name of the external tool that produced the files
                        (auto-detected when not provided)
  -v, --version         show program's version number and exit

Chain reordering

ElTetrado keeps a global and unique 5’-3’ index for every nucleotide, independent of residue numbers. For example, if a structure has chain M with 60 nucleotides and chain N with 15 nucleotides, ElTetrado keeps indices from 0 to 74, which uniquely identify every nucleotide. Initially, ElTetrado assigns these indices according to the order of chains in the input file. Therefore, if M precedes N, nucleotides in M are indexed from 0 to 59 and those in N from 60 to 74. Otherwise, nucleotides in N are indexed from 0 to 14 and those in M from 15 to 74.
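The indexing rule in the previous paragraph can be sketched as a running offset over chain lengths. This is an illustration, not ElTetrado's internal code. The function name is hypothetical, and only chain order and chain lengths are modeled.

```python
from itertools import accumulate


def global_index_ranges(chain_order: list[str], chain_lengths: dict[str, int]) -> dict[str, range]:
    """Map each chain to the 0-based global 5'-3' indices its nucleotides receive
    when chains are concatenated in chain_order (hypothetical helper)."""
    # Offset of each chain's first nucleotide: 0, len(1st), len(1st) + len(2nd), ...
    starts = [0, *accumulate(chain_lengths[chain] for chain in chain_order)]
    return {
        chain: range(starts[i], starts[i] + chain_lengths[chain])
        for i, chain in enumerate(chain_order)
    }
```

With lengths {"M": 60, "N": 15}, the order ["M", "N"] gives M range(0, 60) and N range(60, 75), that is, indices 60 to 74. Swapping the order gives N range(0, 15) and M range(15, 75).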

When --no-reorder is present, this initial assignment is used. Otherwise, ElTetrado exhaustively checks all permutations of the chain order. Each permutation induces a recalculation of the global 5’-3’ index, which in turn may change the ONZ classification of the tetrads.

ElTetrado keeps a table of tetrad classification scores according to these rules:

Type preference: O > N > Z
Direction preference: + > -

The table assigns low values to preferred classes, i.e. O+ scores 0, O- scores 1, and so on up to Z-, which scores 5. For every permutation of the chain order, ElTetrado computes the sum of the scores of the tetrad classifications induced by the corresponding 5’-3’ indexing. The permutation with the minimum sum is selected.
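The scoring and selection described above can be sketched as follows. The ONZ classification itself is not reproduced here. Instead, a caller-supplied classify function (a hypothetical stand-in) returns the class of every tetrad for a given chain order. The tie-breaking behaviour is also an assumption.

```python
from itertools import permutations
from typing import Callable, Sequence

# Type preference O > N > Z first, then direction + > -; lower scores are preferred.
SCORES: dict[str, int] = {
    f"{onz}{direction}": 2 * i + j
    for i, onz in enumerate("ONZ")
    for j, direction in enumerate("+-")
}  # {'O+': 0, 'O-': 1, 'N+': 2, 'N-': 3, 'Z+': 4, 'Z-': 5}


def best_chain_order(
    chains: Sequence[str],
    classify: Callable[[tuple[str, ...]], list[str]],
) -> tuple[str, ...]:
    """Try every chain order and keep the one with the lowest total tetrad score.

    classify is a hypothetical stand-in: given a chain order, it returns the
    ONZ class (e.g. "O+") of every tetrad under the induced 5'-3' indexing.
    """
    def total(order: tuple[str, ...]) -> int:
        return sum(SCORES[onz] for onz in classify(order))

    # min() keeps the first minimum and permutations() yields the input order first,
    # so ties fall back to the original chain order (an assumed tie-break).
    return min(permutations(chains), key=total)
```

Because n chains have n! orders, exhaustive search is only practical for the small number of chains found in bi- and tetramolecular quadruplexes.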

Examples

2HY9: Human telomere DNA quadruplex structure in K+ solution hybrid-1 form

$ curl ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/mmCIF/hy/2hy9.cif.gz | gzip -d > 2hy9.cif
$ ./eltetrado --input 2hy9.cif --output 2hy9.json

Chain order: 1
n4-helix with 3 tetrads
  Oh* V 9a -(pll) quadruplex with 3 tetrads
    1.DG4 1.DG22 1.DG18 1.DG10 cWH cWH cWH cWH O- Vb planarity=0.06  
      direction=hybrid rise=3.15 twist=28.48
    1.DG5 1.DG23 1.DG17 1.DG11 cHW cHW cHW cHW O+ Va planarity=0.05  
      direction=hybrid rise=3.08 twist=29.27
    1.DG6 1.DG24 1.DG16 1.DG12 cHW cHW cHW cHW O+ Va planarity=0.05  

    Tracts:
      1.DG4, 1.DG5, 1.DG6
      1.DG22, 1.DG23, 1.DG24
      1.DG18, 1.DG17, 1.DG16
      1.DG10, 1.DG11, 1.DG12

    Loops:
      propeller- 1.DT7, 1.DT8, 1.DA9
      lateral- 1.DT13, 1.DT14, 1.DA15
      lateral+ 1.DT19, 1.DT20, 1.DA21

AAAGGGTTAGGGTTAGGGTTAGGGAA
...(([...{)]...[[}...)]]..
...([{...)((...))(...)]}..
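The last two lines above appear to list the tetrad base pairs in dot-bracket notation under the sequence. Several bracket types are used so that crossing pairs can be written. Each guanine in a tetrad pairs with two neighbours, so its two pairs land on separate lines. A minimal parser, written as a hypothetical helper independent of ElTetrado, recovers the 1-based position pairs. It tracks one stack per bracket type.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 1-based (i, j) pairs from a dot-bracket string using (), [], {} and <>."""
    closing_to_opening = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks: dict[str, list[int]] = {opening: [] for opening in closing_to_opening.values()}
    pairs: list[tuple[int, int]] = []
    for position, char in enumerate(structure, start=1):
        if char in stacks:
            stacks[char].append(position)
        elif char in closing_to_opening:
            stack = stacks[closing_to_opening[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {position}")
            pairs.append((stack.pop(), position))
        # '.' and any other symbol mark unpaired positions
    unmatched = [pos for stack in stacks.values() for pos in stack]
    if unmatched:
        raise ValueError(f"unmatched opening bracket(s) at {unmatched}")
    return sorted(pairs)
```

Applied to the first bracket line, the parser yields (4, 22), (5, 11), (6, 12), (10, 18), (16, 24) and (17, 23). All of these are G-G pairs within the three tetrads listed above.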

Output JSON (2hy9.json):

{
  "metals": [],
  "nucleotides": [
    {
      "index": 1,
      "chain": "1",
      "number": 1,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DA1",
      "shortName": "A",
      "chi": 22.30828283085781,
      "glycosidicBond": "syn"
    },
    {
      "index": 2,
      "chain": "1",
      "number": 2,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DA2",
      "shortName": "A",
      "chi": -123.05454402191421,
      "glycosidicBond": "anti"
    },
    {
      "index": 3,
      "chain": "1",
      "number": 3,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DA3",
      "shortName": "A",
      "chi": -94.96579955603106,
      "glycosidicBond": "anti"
    },
    {
      "index": 4,
      "chain": "1",
      "number": 4,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG4",
      "shortName": "G",
      "chi": 79.28363721639316,
      "glycosidicBond": "syn"
    },
    {
      "index": 5,
      "chain": "1",
      "number": 5,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG5",
      "shortName": "G",
      "chi": -126.01709201555563,
      "glycosidicBond": "anti"
    },
    {
      "index": 6,
      "chain": "1",
      "number": 6,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG6",
      "shortName": "G",
      "chi": -127.26656202302102,
      "glycosidicBond": "anti"
    },
    {
      "index": 7,
      "chain": "1",
      "number": 7,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DT7",
      "shortName": "T",
      "chi": -63.10830751967371,
      "glycosidicBond": "anti"
    },
    {
      "index": 8,
      "chain": "1",
      "number": 8,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DT8",
      "shortName": "T",
      "chi": -138.79520345559828,
      "glycosidicBond": "anti"
    },
    {
      "index": 9,
      "chain": "1",
      "number": 9,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DA9",
      "shortName": "A",
      "chi": -148.83990757408878,
      "glycosidicBond": "anti"
    },
    {
      "index": 10,
      "chain": "1",
      "number": 10,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG10",
      "shortName": "G",
      "chi": 58.77875250191579,
      "glycosidicBond": "syn"
    },
    {
      "index": 11,
      "chain": "1",
      "number": 11,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG11",
      "shortName": "G",
      "chi": -123.85746807924986,
      "glycosidicBond": "anti"
    },
    {
      "index": 12,
      "chain": "1",
      "number": 12,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG12",
      "shortName": "G",
      "chi": -84.36679807284759,
      "glycosidicBond": "anti"
    },
    {
      "index": 13,
      "chain": "1",
      "number": 13,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DT13",
      "shortName": "T",
      "chi": -30.819029132834157,
      "glycosidicBond": "anti"
    },
    {
      "index": 14,
      "chain": "1",
      "number": 14,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DT14",
      "shortName": "T",
      "chi": -168.51776782812965,
      "glycosidicBond": "anti"
    },
    {
      "index": 15,
      "chain": "1",
      "number": 15,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DA15",
      "shortName": "A",
      "chi": -105.72881577106517,
      "glycosidicBond": "anti"
    },
    {
      "index": 16,
      "chain": "1",
      "number": 16,
      "icode": null,
      "molecule": "DNA",
      "fullName": "1.DG16",
      "shortName":
