Pharokka
fast phage annotation program
pharokka
<p align="center"> <img src="img/pharokka_logo.png" alt="pharokka Logo" height=400> </p>
Extra special thanks to Ghais Houtak for making Pharokka's logo.
Fast Phage Annotation Tool
pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes.
If you are looking for rapid standardised annotation of bacterial genomes, please use Bakta. Prokka, which inspired the creation & naming of pharokka, is another good option, but Bakta is Prokka's worthy successor.
phold
If you like pharokka, you will probably love phold. phold uses structural homology to improve phage annotation. Benchmarking is ongoing but phold strongly outperforms pharokka in terms of annotation, particularly for less characterised phages such as those from metagenomic datasets.
pharokka still has features phold lacks for now (identifying tRNA, tmRNA, CRISPR repeats, and INPHARED taxonomy search), so it is recommended to run phold after running pharokka.
phold takes the Genbank output of Pharokka as input. Therefore, if you have already annotated your phage(s) with Pharokka, you can easily update the annotation with more functional predictions with phold.
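As a minimal sketch of that hand-off, the script below prints (and, if phold is installed, runs) a phold command on a pharokka GenBank file. The file name pharokka.gbk, the directory names, and the `phold run -i/-o/-t` invocation are assumptions not stated in this README; check phold's own documentation for its database setup and current options.

```shell
#!/bin/sh
# Sketch only: chain phold onto an existing pharokka run.
# ASSUMPTIONS (not from this README): pharokka writes its GenBank output to
# <output directory>/pharokka.gbk, and phold is invoked as `phold run -i ... -o ... -t ...`.
PHAROKKA_OUT="${PHAROKKA_OUT:-pharokka_output}"   # hypothetical pharokka output directory
PHOLD_OUT="${PHOLD_OUT:-phold_output}"            # hypothetical phold output directory
THREADS="${THREADS:-8}"
GBK="${PHAROKKA_OUT}/pharokka.gbk"

run_phold() {
    # Print the command first so the step is visible in logs.
    echo "phold run -i ${GBK} -o ${PHOLD_OUT} -t ${THREADS}"
    # Execute only when phold is installed and the GenBank file exists.
    if command -v phold >/dev/null 2>&1 && [ -f "${GBK}" ]; then
        phold run -i "${GBK}" -o "${PHOLD_OUT}" -t "${THREADS}"
    fi
}

run_phold
```

Printing the command before the guarded execution keeps the script harmless on machines where phold or the input file is missing.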
Google Colab Notebook
If you don't want to install pharokka or phold locally, you can run pharokka and phold (and phynteny), or only pharokka, without any code using the Google Colab notebook.
- phynteny uses phage synteny (the conserved gene order across phages) to assign hypothetical phage proteins to a PHROG category - it might help you add extra PHROG category annotations to hypothetical genes remaining after you run phold.
- Pharokka, Phold and Phynteny are complementary tools and, when used together, they substantially increase the annotation rate of your phage genome.
- The plot below shows the annotation rate of different tools across 4 benchmarked datasets ((a) INPHARED 1419, (b) Cook, (c) Crass and (d) Tara - see the Phold preprint for more information).
- The final Phynteny plots combine the benefits of annotation with Pharokka (with HMM, the second violin) followed by Phold (with structures, the fourth violin) followed by Phynteny.
Phold plot Wasm App
- We recommend running the web app, which uses WebAssembly (Wasm) to generate genomic maps in your browser - no data ever leaves your machine!
- Please go to https://gbouras13.github.io/phold-plot-wasm-app/ to use it
- Note: while this was designed for Phold, it also works for Pharokka output!
- You will need to first run Pharokka and upload the GenBank file via the button.
- This was built during the WebAssembly workshop at ABACBS2025 - for more, you can find the website here
Table of Contents
- pharokka
- phold
- Google Colab Notebook
- Phold plot Wasm App
- Table of Contents
- Quick Start
- Documentation
- Paper
- Pharokka with Galaxy Europe Webserver
- Brief Overview
- Installation
- Database Installation
- Beginner Conda Installation
- Usage
- Version Log
- System
- Time
- GenBank submission
- Benchmarking v1.5.0
- Benchmarking v1.4.0
- Original Benchmarking (v1.1.0)
- Bugs and Suggestions
- Citation
Quick Start
The easiest way to install pharokka is via conda:
conda install -c bioconda pharokka
Followed by database download and installation:
install_databases.py -o <path/to/database_dir>
And finally annotation:
pharokka.py -i <phage fasta file> -o <output directory> -d <path/to/database_dir> -t <threads>
As of pharokka v1.4.0, if you want extremely rapid PHROG annotations, use --fast:
pharokka.py -i <phage fasta file> -o <output directory> -d <path/to/database_dir> -t <threads> --fast
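The three Quick Start steps can be sketched as one small script. The commands and flags are the ones shown above; the directory names, input file name and thread count are placeholder assumptions to replace with your own values.

```shell
#!/bin/sh
# Sketch of the Quick Start steps as one script.
# Paths and thread count below are hypothetical placeholders.
DB_DIR="${DB_DIR:-pharokka_db}"          # hypothetical database directory
INPUT="${INPUT:-phage.fasta}"            # hypothetical input FASTA file
OUT_DIR="${OUT_DIR:-pharokka_output}"    # hypothetical output directory
THREADS="${THREADS:-8}"

# Build the annotation command line; pass "fast" as $1 to append --fast.
pharokka_cmd() {
    cmd="pharokka.py -i ${INPUT} -o ${OUT_DIR} -d ${DB_DIR} -t ${THREADS}"
    if [ "${1:-}" = "fast" ]; then
        cmd="${cmd} --fast"
    fi
    echo "${cmd}"
}

# Show what would be run (dry run).
echo "install_databases.py -o ${DB_DIR}"
pharokka_cmd
pharokka_cmd fast

# Run for real only when pharokka is on PATH and the input file exists.
if command -v pharokka.py >/dev/null 2>&1 && [ -f "${INPUT}" ]; then
    # Download the databases only if the directory is not there yet.
    [ -d "${DB_DIR}" ] || install_databases.py -o "${DB_DIR}"
    pharokka.py -i "${INPUT}" -o "${OUT_DIR}" -d "${DB_DIR}" -t "${THREADS}"
fi
```

Override the placeholders from the environment, for example `INPUT=my_phage.fasta THREADS=4 sh quickstart.sh` (the script name here is hypothetical).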
Documentation
Check out the full documentation at https://pharokka.readthedocs.io.
Paper
pharokka has been published in Bioinformatics:
George Bouras, Roshan Nepal, Ghais Houtak, Alkis James Psaltis, Peter-John Wormald, Sarah Vreugde, Pharokka: a fast scalable bacteriophage annotation tool, Bioinformatics, Volume 39, Issue 1, January 2023, btac776, https://doi.org/10.1093/bioinformatics/btac776.
If you use pharokka, please see the full Citation section for a list of all programs pharokka uses, in order to fully recognise the creators of these tools for their work.
Pharokka with Galaxy Europe Webserver
Thanks to some amazing assistance from Paul Zierep, you can run pharokka using the Galaxy Europe webserver. There is no plotting functionality at the moment.
So if you can't get pharokka to install on your machine for whatever reason or want a GUI to annotate your phage(s), please give it a go there.
Brief Overview
<p align="center"> <img src="img/pharokka_workflow.png" alt="pharokka Workflow" height=600> </p>
pharokka uses PHANOTATE, the only gene prediction program tailored to bacteriophages, as the default program for gene prediction. Prodigal implemented with pyrodigal and Prodigal-gv implemented with pyrodigal-gv are also available as alternatives. Following this, functional annotations are assigned by matching each predicted coding sequence (CDS) to the PHROGs, CARD and VFDB databases using MMseqs2. As of v1.4.0, pharokka will also match each CDS to the PHROGs database with more sensitive Hidden Markov Models, implemented with PyHMMER. Pharokka's main output is a GFF file suitable for use in downstream pangenomic pipelines like Roary. pharokka also generates a cds_functions.tsv file, which includes counts of CDSs, tRNAs, tmRNAs, CRISPRs and functions assigned to CDSs according to the PHROGs database. See the full usage and check out the full documentation for more details.
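Because the main output is a GFF file, a quick sanity check is to count the feature types it contains. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, feature type in column 3); the function name count_gff_features and the toy records are made up for illustration and are not real pharokka output.

```shell
#!/bin/sh
# Count feature types (column 3) in a GFF3 file.
# count_gff_features is a hypothetical helper, not part of pharokka.
count_gff_features() {
    awk -F '\t' '
        /^##FASTA/ { exit }          # stop at an embedded FASTA section, if any
        /^#/ || NF < 9 { next }      # skip comments, directives and short lines
        { n[$3]++ }                  # tally the feature type
        END { for (t in n) printf "%s\t%d\n", t, n[t] }
    ' "$1" | sort
}

# Toy input, invented purely for illustration.
toy_gff="$(mktemp)"
printf '##gff-version 3\n' > "$toy_gff"
printf 'contig_1\ttoy\tCDS\t1\t300\t.\t+\t0\tID=toy_CDS_0001\n' >> "$toy_gff"
printf 'contig_1\ttoy\tCDS\t350\t800\t.\t-\t0\tID=toy_CDS_0002\n' >> "$toy_gff"
printf 'contig_1\ttoy\ttRNA\t900\t975\t.\t+\t.\tID=toy_tRNA_0001\n' >> "$toy_gff"

count_gff_features "$toy_gff"
rm -f "$toy_gff"
```

To use it on a real run, point the function at the GFF file in your pharokka output directory.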
Pharokka v 1.9.0 Update (12 January 2026)
- Adds pyrodigal-rv (see https://github.com/LanderDC/pyrodigal-rv) dependency as a gene predictor option that may be useful if you are annotating RNA phages (also RNA viruses generally perhaps, although YMMV)
  - Use -g pyrodigal-rv to use this
- Fixes bug with incorrect translation table being passed when using -g prodigal and meta mode (usually for single phages, where they are too small to have a Prodigal model trained for them) - see https://github.com/gbouras13/pharokka/issues/409
  - We
