About
Installation
- conda python environment
- Singularity image
Usage
About the name
Additional data
Software used
License
How to cite
Funding
Contact

About

AnnapuRNA is a knowledge-based scoring function designed to evaluate RNA-ligand complex structures, generated by any computational docking method.

Installation

conda python environment

The recommended way to install and run AnnapuRNA is in a conda environment under 64-bit Linux (extensively tested on Ubuntu).

Install miniconda. Please refer to the conda manual and install the conda version that matches your operating system. Please use the Python 2 version (miniconda2).
Clone the AnnapuRNA repository: git clone --depth=1 git@github.com:filipsPL/annapurna.git or fetch a zip package.
Go to the AnnapuRNA directory (typically cd annapurna under Linux) and restore the conda environment from the yml file: conda env create -f conda-environment.yml (the complete AnnapuRNA conda environment needs ~1.5 GB of free disk space).

Tests

To validate the installation and run tests, please execute annapurna-tests.sh.

Uninstallation

(if you no longer need AnnapuRNA)

Remove the directory with the AnnapuRNA code.
Remove the conda environment: conda remove --name annapurna --all.
To verify that the environment was removed, run conda info --envs in your terminal window.

Tested environments

AnnapuRNA was extensively tested under Linux with Ubuntu versions 16.04, 18.04, and 20.04, using the latest miniconda2 installer (Miniconda2-py27_4.8.3-Linux-x86_64.sh).

Singularity image

A Singularity image with the fast version of AnnapuRNA (containing the fast kNN and RF scoring functions) is available in the Sylabs cloud: cloud.sylabs.io.

To fetch the latest image directly, run:

singularity pull library://filips/default/annapurna:latest

Usage

Quick start

Sample input files from molecular docking are located in tests/testFiles/: 1AJU.pdb (the RNA structure) and ARG.sdf (poses from docking).

conda activate annapurna

mkdir testresults
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby

Output files:

Table with scores: testresults/output.kNN_modern.csv (scores for all poses) and testresults/output.kNN_modern.grouped.csv (best score for each compound from the input file). The AnnapuRNA score is in the last column ("score"). The lower the value, the better.
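Because a lower score is better, ranking the poses comes down to sorting the table on its "score" column. The following is a minimal sketch rather than part of AnnapuRNA: the function name, the sample rows, the other column names, and the comma delimiter are all assumptions, so check the delimiter of your own output file before relying on it. It is written for Python 3 and is meant to be run outside the Python 2 annapurna environment.

```python
import csv
import io


def rank_by_score(csv_text, delimiter=","):
    """Return the table rows as dicts, lowest (best) score first."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = list(reader)
    # The README states that the AnnapuRNA score is in the column named "score".
    return sorted(rows, key=lambda row: float(row["score"]))


# Hypothetical table: only the "score" column name is taken from the README.
sample = "title,pose,score\nARG,1,-12.5\nARG,2,-30.1\nARG,3,-7.0\n"
ranked = rank_by_score(sample)
print(ranked[0]["pose"], ranked[0]["score"])
```

For a real run, read the file contents first, for example with open("testresults/output.kNN_modern.csv").read(), and pass the delimiter that the file actually uses.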

Singularity image

Using AnnapuRNA in a Singularity container is the same as using the standalone console version. Please note that the container includes only the fast scoring functions, i.e., kNN and RF. For DL scoring functions, please use the regular version.

singularity exec annapurna.sif annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby

AnnapuRNA in action

# commands used in the screencast
conda activate annapurna
./annapurna.py --help
mkdir testresults
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby
cd testresults
ls -la
column -t output.kNN_modern.grouped.csv
column -t output.kNN_modern.csv | less

AnnapuRNA in Jupyter Notebook

To see or run AnnapuRNA in a Jupyter notebook, refer to the sample notebook (please note that this is a notebook with a bash kernel).

Usage

Input files

RNA

The PDB format is mandatory, with nucleotide letters assigned to atoms, e.g.:

ATOM     64  H1    G A  17      -5.322  17.506   1.537  1.00  0.00           H
ATOM     65  H21   G A  17      -5.499  17.205   3.712  1.00  0.00           H
ATOM     66  H22   G A  17      -4.319  17.828   4.843  1.00  0.00           H
ATOM     67  P     C A  18       3.269  18.622   4.974  1.00  0.00           P
ATOM     68  OP1   C A  18       3.196  20.073   5.282  1.00  0.00           O
ATOM     69  OP2   C A  18       4.574  17.923   5.091  1.00  0.00           O
ATOM     70  O5'   C A  18       2.219  17.861   5.902  1.00  0.00           O

PDB files fetched from the Protein Data Bank should be fine.
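For files that do not come from the Protein Data Bank, a quick look at the residue-name field can show whether nucleotide letters are assigned. The sketch below is an illustration, not AnnapuRNA's own validation: the function name and the allowed set (A, C, G, U) are assumptions, and the third sample record is a hypothetical malformed line. It relies only on the fixed-column layout of PDB ATOM records, where columns 18-20 hold the residue name.

```python
def unexpected_residue_names(pdb_lines, allowed=("A", "C", "G", "U")):
    """Collect residue names from ATOM records that are not in `allowed`."""
    unexpected = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # Columns 18-20 (0-based slice 17:20) hold the residue name.
            res_name = line[17:20].strip()
            if res_name not in allowed:
                unexpected.add(res_name)
    return sorted(unexpected)


records = [
    "ATOM     66  H22   G A  17      -4.319  17.828   4.843  1.00  0.00           H",
    "ATOM     67  P     C A  18       3.269  18.622   4.974  1.00  0.00           P",
    # Hypothetical record with a three-letter residue name instead of a letter.
    "ATOM     68  OP1 GUA A  19       3.196  20.073   5.282  1.00  0.00           O",
]
print(unexpected_residue_names(records))
```

An empty list means that every ATOM record carries one of the allowed letters. Modified nucleotides would be reported as well, so treat the result as a hint rather than a verdict.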

Ligand poses

AnnapuRNA accepts many file formats, such as sdf, mol2, mol, pdb, or any other format understood by Open Babel. It has been extensively tested on sdf files.

Remarks:

If your input file contains more than one compound (i.e., chemical compound with a unique structure), please make sure that each compound has a unique name/title.
Please make sure that the ligands have the desired protonation state.
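To review the names in an sdf file before scoring, it can help to list each title together with its number of poses. The sketch below is an illustration under assumptions: the function and helper names are hypothetical, the sample records are placeholders rather than valid molecules, and it uses only the sdf conventions that the title is the first line of a record and that records end with a $$$$ line. It cannot tell whether two records with the same title are the same compound, so the final check is left to you.

```python
from collections import Counter


def count_poses_per_title(sdf_text):
    """Count sdf records per title (the first line of each record)."""
    counts = Counter()
    lines = sdf_text.splitlines()
    start = 0  # index of the first line of the current record
    for index, line in enumerate(lines):
        if line.strip() == "$$$$":  # record terminator in sdf files
            if index > start:
                counts[lines[start].strip()] += 1
            start = index + 1
    return counts


def fake_record(title):
    """Build a placeholder record (hypothetical, not a real molecule)."""
    return [title, "  placeholder", "",
            "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$"]


sample = "\n".join(fake_record("ARG") + fake_record("ARG")
                   + fake_record("LYS") + fake_record(""))
counts = count_poses_per_title(sample)
for title, n_poses in sorted(counts.items()):
    print(repr(title), n_poses)
```

An empty title, or a title whose pose count is higher than expected, suggests that distinct compounds may share a name.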

Scoring models

Warning: to use the Deep Learning models (i.e., 'DL_basic' and 'DL_modern'), you must run an H2O engine in another terminal window by issuing the command ./start_h2o.sh.

AnnapuRNA was benchmarked on four different models: 'DL_basic', 'DL_modern', 'kNN_basic', and 'kNN_modern'.

kNN_modern should be a good first shot:

./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby

Please note that you can score with more than one model in a single run:

./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_basic -m kNN_modern -o testresults/output --groupby

or even all available models:

./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m ALL -o testresults/output --groupby

Note the optional argument --merge, which merges predictions from multiple models into a single file.
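When AnnapuRNA is called from a script, the repeated -m flag is easy to assemble programmatically. The helper below is a hypothetical convenience wrapper, not part of AnnapuRNA. It uses only the flags shown in this README (-r, -l, -m, -o, --groupby, --merge) and only builds the argument list, leaving the actual call to the reader.

```python
def build_annapurna_command(rna, ligands, models, output_prefix,
                            groupby=True, merge=False):
    """Assemble the argument list for annapurna.py (hypothetical helper)."""
    command = ["./annapurna.py", "-r", rna, "-l", ligands]
    for model in models:
        command += ["-m", model]  # -m is repeated once per model
    command += ["-o", output_prefix]
    if groupby:
        command.append("--groupby")
    if merge:
        command.append("--merge")
    return command


command = build_annapurna_command(
    "tests/testFiles/1AJU.pdb",
    "tests/testFiles/ARG.sdf",
    ["kNN_basic", "kNN_modern"],
    "testresults/output",
    merge=True,
)
print(" ".join(command))
# To run it from the AnnapuRNA directory with the environment active:
# import subprocess; subprocess.run(command, check=True)
```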

In addition to those four models, we provide two models of interactions: NB_modern (Naive Bayes) and RF_modern (Random Forests), both trained on the 2016 data set (please note that their performance has not been thoroughly tested).

Clustering

The clustering of poses is optional and is based on the RMSD distance matrix. We implemented three clustering algorithms that take the RMSD distance matrix as input: the "AutoDock-like" method (AD, as implemented in AutoDock/AutoDock Vina), the "SimRNA-like" method (SR, as implemented in the ROSETTA/SimRNA programs), and the Affinity Propagation method (AP).

There are three switches defining clustering parameters:

choosing a clustering method:

--clustering_method {False,AD,SR,AP}
                      Clustering method. AD = AutoDock-like; SR = SimRNA-
                      like; AP = Affinity Propagation.

defining how many of the top-scoring poses are taken for clustering (1 = all poses, 0.5 = the best 50% of poses, etc.):

--cluster_fraction CLUSTERINGFRACTION
                      Docking poses clustering. Select this fraction of top
                      scoring poses. 0-1. 0 = do not cluster results

for the AD (AutoDock-like) and SR (SimRNA-like) clustering methods, defining a clustering cutoff (2 Å should be a reasonable starting point):

--cluster_cutoff CLUSTERINGCUTOFF
                      Docking poses clustering. Use this RMSD cutoff for
                      clustering. 0 = do not use the RMSD cutoff
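To make the interplay of the fraction and the cutoff concrete, here is a minimal sketch of a greedy, score-ordered clustering in the spirit of the AutoDock-like method. It is not AnnapuRNA's implementation: the function name, the seed-based assignment rule, and the sample values are assumptions, and the special meaning of 0 for both switches is not handled. Poses are visited from best to worst score, and each one joins the first cluster whose seed lies within the RMSD cutoff or otherwise starts a new cluster.

```python
def greedy_rmsd_clusters(scores, rmsd, fraction=1.0, cutoff=2.0):
    """Cluster poses greedily on a precomputed RMSD matrix.

    scores: pose scores, lower is better.
    rmsd:   square matrix (list of lists) of pairwise RMSD values.
    Returns clusters as lists of pose indices; the first index of each
    cluster is its seed, i.e. its best-scoring member.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Keep only the requested fraction of the best-scoring poses.
    keep = order[: max(1, int(fraction * len(order)))]
    clusters = []
    for pose in keep:
        for cluster in clusters:
            if rmsd[pose][cluster[0]] <= cutoff:  # compare with the seed
                cluster.append(pose)
                break
        else:
            clusters.append([pose])  # no seed close enough: new cluster
    return clusters


# Hypothetical scores and RMSD values for four poses.
scores = [-5.0, -9.0, -7.0, -1.0]
rmsd = [
    [0.0, 1.0, 6.0, 5.0],
    [1.0, 0.0, 5.5, 6.5],
    [6.0, 5.5, 0.0, 1.5],
    [5.0, 6.5, 1.5, 0.0],
]
print(greedy_rmsd_clusters(scores, rmsd, fraction=1.0, cutoff=2.0))
print(greedy_rmsd_clusters(scores, rmsd, fraction=0.5, cutoff=2.0))
```

With all poses kept, the two best-scoring poses (1 and 2) become seeds and poses 0 and 3 join them. With a fraction of 0.5, only those two seeds are considered at all.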

For examples, go to the Usage examples section.

For fine-tuning the Affinity Propagation method, go to the Program fine-tuning section.

Other options

-o OUTPUTFILENAME - define the output file name core, e.g., -o testresults/output will generate results in the testresults directory, with names starting with output.
-s, --skip_statistics - if, for
