Annapurna
AnnapuRNA: a scoring function for predicting RNA-small molecule interactions.

- About
- Installation
- Usage
- About the name
- Additional data
- Software used
- License
- How to cite
- Funding
- Contact
About
AnnapuRNA is a knowledge-based scoring function designed to evaluate RNA-ligand complex structures, generated by any computational docking method.

Installation
conda python environment
The recommended way to install and run AnnapuRNA is via a conda environment under 64-bit Linux (extensively tested on Ubuntu).
- Install Miniconda. Please refer to the conda manual and install the conda version appropriate for your operating system. Please use the Python 2 version (Miniconda2).
- Clone the AnnapuRNA repository:
git clone --depth=1 git@github.com:filipsPL/annapurna.git
or fetch a zip package.
- Go to the AnnapuRNA directory (typically cd annapurna under Linux) and restore the conda environment from the yml file:
conda env create -f conda-environment.yml
(the complete AnnapuRNA conda environment needs ~1.5 GB of free disk space).
Tests
To validate the installation and run tests, please execute annapurna-tests.sh.
Uninstallation
(if you no longer need AnnapuRNA)
- Remove the directory with the AnnapuRNA code.
- Remove the conda environment:
conda remove --name annapurna --all
- To verify that the environment was removed, run the following in your terminal window:
conda info --envs
Tested environments
AnnapuRNA was extensively tested under Linux with Ubuntu versions 16.04, 18.04, and 20.04, using the latest Miniconda2 (Miniconda2-py27_4.8.3-Linux-x86_64.sh).
Singularity image
A Singularity image with the fast version of AnnapuRNA (containing the fast kNN and RF scoring functions) is available in the Sylabs Cloud: cloud.sylabs.io.
To fetch the latest image directly, run:
singularity pull library://filips/default/annapurna:latest
Usage
Quick start
Sample input files from molecular docking are located in tests/testFiles/: 1AJU.pdb (the RNA structure) and ARG.sdf (poses from docking).
conda activate annapurna
mkdir testresults
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby
Output files:
- Tables with scores:
testresults/output.kNN_modern.csv (scores for all poses) and testresults/output.kNN_modern.grouped.csv (the best score for each compound in the input file). The AnnapuRNA score is in the last column ("score"); the lower the value, the better.
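To illustrate how the grouped table relates to the full one, here is a minimal Python 3 sketch that keeps the lowest (best) score for each compound. Only the "score" column is documented above; the compound-name column ("name") and the tab delimiter are assumptions, so check the header of your own output file before reusing it. The numbers are toy values, not real AnnapuRNA results.

```python
import csv
import io

def best_per_compound(table_text, name_column="name", score_column="score",
                      delimiter="\t"):
    """Return {compound: best score}, where lower AnnapuRNA scores are better.

    `name_column` and `delimiter` are assumptions, not documented settings.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(table_text), delimiter=delimiter):
        name, score = row[name_column], float(row[score_column])
        # Keep the pose with the lowest score seen so far for this compound
        if name not in best or score < best[name]:
            best[name] = score
    return best

# Toy table: two poses of ARG and one pose of a hypothetical LIG2
toy_table = "name\tpose\tscore\nARG\t1\t-120.5\nARG\t2\t-150.2\nLIG2\t1\t-90.0\n"
print(best_per_compound(toy_table))  # {'ARG': -150.2, 'LIG2': -90.0}
```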
Singularity image
Using AnnapuRNA in the Singularity container is the same as using the standalone console version. Please note that the container implements only the fast scoring functions, i.e., kNN and RF. For the DL scoring functions, please use the regular version.
singularity exec annapurna.sif annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby
AnnapuRNA in action
# commands used in the screen cast
conda activate annapurna
./annapurna.py --help
mkdir testresults
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby
cd testresults
ls -la
column -t output.kNN_modern.grouped.csv
column -t output.kNN_modern.csv | less
AnnapuRNA in Jupyter Notebook
To view or run AnnapuRNA in Jupyter Notebook, refer to the sample notebook (please note that it uses a bash kernel).
Usage
Input files
RNA
The RNA structure must be in PDB format, with nucleotide letters (residue names) assigned to atoms, e.g.:
ATOM     64  H1    G A  17      -5.322  17.506   1.537  1.00  0.00           H
ATOM     65  H21   G A  17      -5.499  17.205   3.712  1.00  0.00           H
ATOM     66  H22   G A  17      -4.319  17.828   4.843  1.00  0.00           H
ATOM     67  P     C A  18       3.269  18.622   4.974  1.00  0.00           P
ATOM     68  OP1   C A  18       3.196  20.073   5.282  1.00  0.00           O
ATOM     69  OP2   C A  18       4.574  17.923   5.091  1.00  0.00           O
ATOM     70  O5'   C A  18       2.219  17.861   5.902  1.00  0.00           O
PDB files fetched from the Protein Data Bank should work as they are.
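Because PDB is a fixed-column format, the residue (nucleotide) name sits in columns 18-20 of each ATOM record. The hypothetical helper below is a minimal Python 3 sketch, not part of AnnapuRNA, that lists residue names other than the four standard nucleotides, which may help spot files that need a closer look before scoring; how AnnapuRNA itself treats modified residues is not covered here.

```python
STANDARD_RNA = {"A", "C", "G", "U"}

def nonstandard_residue_names(pdb_text):
    """Return residue names in ATOM records that are not A, C, G or U."""
    found = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            found.add(line[17:20].strip())  # resName: columns 18-20 (1-based)
    return found - STANDARD_RNA

sample = (
    "ATOM     64  H1    G A  17      -5.322  17.506   1.537  1.00  0.00           H\n"
    "ATOM     67  P     C A  18       3.269  18.622   4.974  1.00  0.00           P\n"
)
print(nonstandard_residue_names(sample))  # set()
```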
Ligand poses
AnnapuRNA accepts many file formats, such as sdf, mol2, mol, pdb, or any other format understood by OpenBabel. It has been extensively tested on sdf files.
Remarks:
- If your input file contains more than one compound (i.e., chemical compounds with different structures), please make sure that each compound has a unique name/title.
- Please make sure that the ligands have the desired protonation state.
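The unique-name remark can be checked with a short script. The sketch below is a hypothetical Python 3 helper, not part of AnnapuRNA. Since poses of the same compound normally share a title, it flags titles attached to records with different element sequences, using the element order in the V2000 atom block as a rough proxy for "different compound". V3000 records and compounds that differ only in connectivity are not detected.

```python
from collections import defaultdict

def element_signature(record):
    """Element symbols of one V2000 molfile record, in atom order."""
    n_atoms = int(record[3][0:3])  # counts line: number of atoms in columns 1-3
    # Atom block follows the counts line; the element symbol is in columns 32-34
    return tuple(line[31:34].strip() for line in record[4:4 + n_atoms])

def conflicting_titles(sdf_text):
    """Titles shared by records whose element sequences differ."""
    signatures = defaultdict(set)
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":  # end of one SD record
            if record:
                signatures[record[0].strip()].add(element_signature(record))
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # last record lacks a closing $$$$
        signatures[record[0].strip()].add(element_signature(record))
    return sorted(title for title, sigs in signatures.items() if len(sigs) > 1)

def toy_record(title, elements):
    """Build a minimal V2000 record (all atoms at the origin) for demonstration."""
    atoms = [f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {el:<3}" for el in elements]
    counts = f"{len(elements):3d}  0  0  0  0  0  0  0  0  0999 V2000"
    return "\n".join([title, "  toy", "", counts, *atoms, "M  END", "$$$$"])

# Two poses of ARG (same atoms) and two different compounds both named LIG
toy_sdf = "\n".join([toy_record("ARG", "CN"), toy_record("ARG", "CN"),
                     toy_record("LIG", "CO"), toy_record("LIG", "CS")])
print(conflicting_titles(toy_sdf))  # ['LIG']
```

On a real file, pass the file contents as a string; a non-empty result lists titles that may need renaming, while an empty list only means this heuristic found no clash.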
Scoring models
:warning: Please note that to use the Deep Learning models (i.e., 'DL_basic' and 'DL_modern'), you need to run an H2O engine in another window by issuing the command ./start_h2o.sh.
AnnapuRNA was benchmarked on four different models: 'DL_basic', 'DL_modern', 'kNN_basic', and 'kNN_modern'.
kNN_modern should be a good first choice:
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_modern -o testresults/output --groupby
Please note that you can score with more than one model in a single run:
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m kNN_basic -m kNN_modern -o testresults/output --groupby
or even all available models:
./annapurna.py -r tests/testFiles/1AJU.pdb -l tests/testFiles/ARG.sdf -m ALL -o testresults/output --groupby
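When scoring many ligand files from a script, it may help to assemble the command programmatically. The Python 3 sketch below uses only the flags shown in this README (-r, -l, -m, -o, --groupby); the helper name annapurna_command is hypothetical. Repeating -m once per model reproduces the multi-model call shown earlier.

```python
import shlex

def annapurna_command(receptor, ligands, models, output, groupby=True):
    """Build an annapurna.py argument list; -m is repeated once per model."""
    cmd = ["./annapurna.py", "-r", receptor, "-l", ligands]
    for model in models:
        cmd += ["-m", model]
    cmd += ["-o", output]
    if groupby:
        cmd.append("--groupby")
    return cmd

cmd = annapurna_command("tests/testFiles/1AJU.pdb", "tests/testFiles/ARG.sdf",
                        ["kNN_basic", "kNN_modern"], "testresults/output")
print(shlex.join(cmd))
# To actually run it (inside the activated annapurna environment):
# import subprocess; subprocess.run(cmd, check=True)
```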
Please note the optional argument --merge, which merges predictions from multiple models into a single file.
In addition to these four models, we provide two further interaction models: NB_modern (Naive Bayes) and RF_modern (Random Forests), both trained on the 2016 data set (please note, however, that their performance was not thoroughly tested).
Clustering
Clustering of poses is optional and is based on the RMSD distance matrix. We implemented three clustering algorithms that take the RMSD distance matrix as input: the "AutoDock-like" method (AD), as implemented in AutoDock/AutoDock Vina; the "SimRNA-like" method (SR), as implemented in the ROSETTA/SimRNA programs; and the Affinity Propagation method (AP).
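As a rough illustration of the two cutoff-based methods, the Python 3 sketch below computes an in-place RMSD matrix (no superposition, identical atom order assumed, since docking poses share the receptor frame) and clusters it AutoDock-style and ROSETTA/SimRNA-style. It follows the general descriptions of those methods, not AnnapuRNA's own code; the function names are hypothetical, and AnnapuRNA's tie-breaking or any symmetry correction may differ. Affinity Propagation is not sketched.

```python
import math

def pose_rmsd(a, b):
    """In-place RMSD (no superposition) between two poses of the same ligand.

    Each pose is a list of (x, y, z) tuples with atoms in the same order.
    """
    if len(a) != len(b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(a, b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(a))

def rmsd_matrix(poses):
    """Symmetric matrix of pairwise RMSDs between poses."""
    n = len(poses)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = pose_rmsd(poses[i], poses[j])
    return dist

def autodock_like(dist, scores, cutoff):
    """Visit poses best (lowest) score first; a pose joins the first cluster
    whose seed is within `cutoff`, otherwise it seeds a new cluster."""
    clusters = []  # lists of pose indices; element 0 is the cluster seed
    for i in sorted(range(len(scores)), key=lambda k: scores[k]):
        for cluster in clusters:
            if dist[cluster[0]][i] <= cutoff:
                cluster.append(i)
                break
        else:  # no seed close enough
            clusters.append([i])
    return clusters

def simrna_like(dist, cutoff):
    """Repeatedly pick the unassigned pose with the most unassigned neighbours
    within `cutoff` as a centre, then remove it together with its neighbours."""
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        neighbours = {i: [j for j in remaining if j != i and dist[i][j] <= cutoff]
                      for i in remaining}
        # Most neighbours wins; ties go to the lowest pose index
        centre = min(remaining, key=lambda i: (-len(neighbours[i]), i))
        members = [centre] + sorted(neighbours[centre])
        clusters.append(members)
        remaining -= set(members)
    return clusters

# Toy distance matrix (Å): poses 0/1 and 2/3 form two tight groups
toy_dist = [[0.0, 1.0, 5.0, 6.0],
            [1.0, 0.0, 5.5, 6.5],
            [5.0, 5.5, 0.0, 1.5],
            [6.0, 6.5, 1.5, 0.0]]
toy_scores = [-20.0, -30.0, -10.0, -40.0]
print(autodock_like(toy_dist, toy_scores, cutoff=2.0))  # [[3, 2], [1, 0]]
print(simrna_like(toy_dist, cutoff=2.0))                # [[0, 1], [2, 3]]
```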
There are three switches defining clustering parameters:
- Choose a clustering method:
--clustering_method {False,AD,SR,AP}
Clustering method. AD = AutoDock-like; SR = SimRNA-
like; AP = Affinity Propagation.
- Define how many of the top-scoring poses are taken for clustering (1 = all poses, 0.5 = the best 50% of poses, etc.):
--cluster_fraction CLUSTERINGFRACTION
Docking poses clustering. Select this fraction of top
scoring poses. 0-1. 0 = do not cluster results
- For the AD (AutoDock-like) and SR (SimRNA-like) clustering methods, define a clustering cutoff. 2 Å should be a reasonable starting point.
--cluster_cutoff CLUSTERINGCUTOFF
Docking poses clustering. Use this RMSD cutoff for
clustering. 0 = do not use the RMSD cutoff
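The fraction switch can be pictured with a minimal Python 3 sketch (a hypothetical helper, not AnnapuRNA's code) that ranks poses by score, lowest first, and keeps the requested share of them. Rounding up with ceil and always keeping at least one pose are assumptions; AnnapuRNA's own rounding may differ.

```python
import math

def poses_for_clustering(scores, fraction):
    """Return indices of the best-scoring poses to pass on to clustering.

    Lower AnnapuRNA scores are better. fraction=1 keeps all poses, 0.5 the best
    half; fraction=0 means no clustering, so nothing is selected.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    keep = max(1, math.ceil(fraction * len(scores)))  # rounding is an assumption
    return ranked[:keep]

toy_scores = [-10.0, -50.0, -30.0, -20.0]
print(poses_for_clustering(toy_scores, 0.5))  # [1, 2]
print(poses_for_clustering(toy_scores, 1))    # [1, 2, 3, 0]
```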
For examples, go to the Usage examples section.
For fine-tuning the Affinity Propagation method, go to the Program fine-tuning section.
Other options
-o OUTPUTFILENAME: define the core of the output file names, e.g., -o testresults/output will generate results in the testresults directory, with names starting with output.
-s, --skip_statistics: if, for
