phold - Phage Annotation using Protein Structures
<p align="center"> <img src="img/phold_logo.png" alt="phold Logo" height=250> </p>
phold is a sensitive annotation tool for bacteriophage genomes and metagenomes using protein structural homology.
To learn more about phold, please read our manuscript:
https://academic.oup.com/nar/article/54/1/gkaf1448/8415830
Bouras G., Grigson S.R., Mirdita M., Heinzinger M., Papudeshi B.,
Mallawaarachchi V., Green R., Kim S.R., Mihalia V., Psaltis A.J.,
Wormald P-J., Vreugde S., Steinegger M., Edwards R.A.
Protein Structure Informed Bacteriophage Genome Annotation with Phold
Nucleic Acids Research, Volume 54, Issue 1, 13 January 2026
https://doi.org/10.1093/nar/gkaf1448
phold uses the ProstT5 protein language model to rapidly translate protein amino acid sequences to the 3Di token alphabet used by Foldseek. Foldseek is then used to search these against a database of over 1.36 million phage protein structures mostly predicted using Colabfold.
Alternatively, instead of using ProstT5, you can supply protein structures that you have pre-computed for your phage(s) by passing the parameters --structures and --structure_dir to phold compare.
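As a rough illustration of the pre-computed structure route, the sketch below checks that a structure directory is non-empty and then prints the phold compare command it would run. The directory name my_structures, the output directory name and the .pdb/.cif file extensions are assumptions made for this example; check phold compare --help for the exact requirements on structure file names and formats.

```shell
#!/bin/sh
# Sketch: use pre-computed structures with phold compare instead of ProstT5.
# STRUCT_DIR and the output directory name are hypothetical; the .pdb/.cif
# extensions are an assumption about what your structure predictor produced.

INPUT="tests/test_data/NC_043029.gbk"
STRUCT_DIR="my_structures"

# Count structure files directly inside directory $1 (0 if it does not exist).
count_structures() {
    find "$1" -maxdepth 1 -type f \( -name '*.pdb' -o -name '*.cif' \) 2>/dev/null \
        | wc -l | tr -d ' '
}

n=$(count_structures "$STRUCT_DIR")
if [ "$n" -gt 0 ]; then
    echo "Found $n structure files; command to run:"
    echo "phold compare -i $INPUT --structures --structure_dir $STRUCT_DIR -o test_output_phold_structures -t 8"
else
    echo "No .pdb or .cif files found in $STRUCT_DIR"
fi
```

The script only prints the command, so it is safe to run before phold is installed; paste the printed line into your shell once the check passes.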
phold strongly outperforms sequence-based homology phage annotation tools like Pharokka, particularly for less characterised phages such as those from metagenomic datasets.
If you have already annotated your phage(s) with Pharokka, phold takes the Genbank output of Pharokka as an input option, so you can easily update the annotation with more functional predictions!
Tutorial
Check out the phold tutorial at https://phold.readthedocs.io/en/latest/tutorial/.
Google Colab Notebooks
If you don't want to install phold locally, you can run it without any code using one of the following Google Colab notebooks:
- To run pharokka + phold + phynteny use this link
  - phynteny uses phage synteny (the conserved gene order across phages) to assign hypothetical phage proteins to a PHROG category - it might help you add extra PHROG category annotations to hypothetical genes remaining after you run phold.
- Pharokka, Phold and Phynteny are complementary tools and when used together, they substantially increase the annotation rate of your phage genome
- The below plot shows the annotation rate of different tools across 4 benchmarked datasets ((a) INPHARED 1419, (b) Cook, (c) Crass and (d) Tara - see the Phold preprint for more information)
- The final Phynteny plots combine the benefits of annotation with Pharokka (with HMM, the second violin) followed by Phold (with structures, the fourth violin) followed by Phynteny
Phold plot Wasm App
- We recommend running the web app to generate phold plot genomic maps using WebAssembly (Wasm) in your browser - no data ever leaves your machine!
- Please go to https://gbouras13.github.io/phold-plot-wasm-app/ to use it
- You will need to first run Phold and upload the GenBank file via the button
- This was built during the WebAssembly workshop at ABACBS2025 - for more, you can find the website here
Recent Updates
v1.2.0 Update (8 January 2026)
- Improved ProstT5 3Di prediction throughput for phold run, phold predict and phold proteins-predict due to smarter batching implementations
- Addition of phold autotune subcommand to detect an appropriate --batch_size for your hardware
- You can also use --autotune with phold run, phold predict and phold proteins-predict to automatically detect and use the optimal --batch_size (only recommended for large datasets with thousands of proteins)
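Since --autotune is only recommended for large datasets, one possible approach is to count the CDS features in the input GenBank file and add the flag only above some threshold. The sketch below does that; the 1000-CDS cut-off and the helper names are illustrative choices for this example, not phold defaults.

```shell
#!/bin/sh
# Sketch: add --autotune only for large inputs. The 1000-CDS threshold is an
# arbitrary stand-in for "thousands of proteins", not a phold default.

# Count CDS features in a GenBank file: feature keys start in column 6,
# i.e. after exactly five leading spaces.
count_cds() {
    grep -c '^     CDS ' "$1"
}

# Print "--autotune" if file $1 has at least $2 CDS features, else nothing.
autotune_flag() {
    if [ "$(count_cds "$1")" -ge "$2" ]; then
        echo "--autotune"
    fi
}

INPUT="tests/test_data/NC_043029.gbk"
if [ -f "$INPUT" ]; then
    # Print the command rather than running it, so the sketch has no side effects.
    echo "phold run -i $INPUT -o test_output_phold -t 8 $(autotune_flag "$INPUT" 1000)"
fi
```

For a single phage genome the flag is left off, which matches the recommendation above.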
Table of Contents
- phold - Phage Annotation using Protein Structures
- Tutorial
- Google Colab Notebooks
- Phold plot Wasm App
- Recent Updates
- Table of Contents
- Documentation
- Installation
- Quick Start
- Output
- Usage
- Plotting
- Citation
Documentation
Check out the full documentation at https://phold.readthedocs.io.
Installation
For more details (particularly if you are using a non-NVIDIA GPU), check out the installation documentation.
The best way to install phold is using conda via miniforge, as this will install Foldseek (the only non-Python dependency) along with the Python dependencies.
To install phold using conda:
conda create -n pholdENV -c conda-forge -c bioconda phold
To utilise phold with GPU, a GPU compatible version of pytorch must be installed. By default conda will install a CPU-only version.
If you have an NVIDIA GPU, please try:
conda create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
If you have a Mac running an Apple Silicon chip (M1/M2/M3/M4), phold should be able to use the GPU. Please try:
conda create -n pholdENV python==3.13
conda activate pholdENV
conda install pytorch::pytorch torchvision torchaudio -c pytorch
conda install -c conda-forge -c bioconda phold
If you have a different non-NVIDIA GPU, or have trouble with pytorch, see this link for more instructions. If you have an older version of CUDA installed, then you might find this link useful.
Once phold is installed, to download and install the database run:
phold install -t 8
If you have an NVIDIA GPU and can take advantage of Foldseek's GPU acceleration, instead run
phold install -t 8 --foldseek_gpu
- Note: You will need at least 8GB of free space (the phold databases including ProstT5 are just over 8GB uncompressed).
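Before downloading, it can be worth confirming that the target filesystem has enough room. The following is a minimal sketch using POSIX df; the helper name and the 9 GB figure (a little headroom over the size quoted above) are assumptions for this example, and the directory checked should be wherever the database will actually be stored on your system.

```shell
#!/bin/sh
# Sketch: check for free disk space before running `phold install`.
# has_free_space and REQUIRED_KB are illustrative, not part of phold.

# Return 0 if the filesystem holding directory $1 has at least $2 KB free.
has_free_space() {
    # df -P gives one stable line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

REQUIRED_KB=$((9 * 1024 * 1024))  # 9 GB, slightly above the ~8 GB database size

if has_free_space . "$REQUIRED_KB"; then
    echo "Enough free space here: run 'phold install -t 8'"
else
    echo "Less than 9 GB free here: free up space before installing"
fi
```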
Quick Start
- phold takes a GenBank format file output from pharokka or from NCBI Genbank as its input by default.
- If you are running phold on a local workstation with a GPU available, using phold run is recommended. It runs both phold predict and phold compare.
phold run -i tests/test_data/NC_043029.gbk -o test_output_phold -t 8
- If you have an NVIDIA GPU available, add --foldseek_gpu
- If you do not have any GPU available, add --cpu.
- phold run will run in a reasonable time for small datasets with CPU only (e.g. <5 minutes for a 50kbp phage). With GPU it should complete in under 1 minute.
- phold predict will complete much faster if a GPU is available, and is necessary for large metagenomic datasets to run in a reasonable time.
- In a cluster environment where GPUs are scarce, for large datasets it may be most efficient to run phold in 2 steps for optimal resource usage.
- Predict the 3Di sequences with ProstT5 using phold predict. This is massively accelerated if a GPU is available.
phold predict -i tests/test_data/NC_043029.gbk -o test_predictions
- Compare the 3Di sequences to the phold structure database with Foldseek using phold compare. This does not utilise a GPU.
phold compare -i tests/test_data/NC_043029.gbk --predictions_dir test_predictions -o test_output_phold -t 8
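The Quick Start choices above can be collected into a small wrapper. The sketch below is one way to do it: it maps a hardware description to the extra flag suggested for phold run, and chains the two-step workflow so that phold compare only starts if phold predict succeeded. The function names and the DRY_RUN switch are inventions of this example, not phold options; with the default DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of a wrapper around the Quick Start commands.
# hw_flag, run_cmd, one_step, two_step and DRY_RUN are illustrative names.

INPUT="tests/test_data/NC_043029.gbk"
THREADS=8

# Map a hardware description to the extra flag suggested for `phold run`.
hw_flag() {
    case "$1" in
        nvidia) echo "--foldseek_gpu" ;;  # NVIDIA GPU available
        none)   echo "--cpu" ;;           # no GPU at all
        *)      : ;;                      # any other GPU: no extra flag
    esac
}

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-step workflow: phold run performs both prediction and comparison.
one_step() {
    # $(hw_flag ...) is deliberately unquoted so an empty result adds no argument.
    run_cmd phold run -i "$INPUT" -o test_output_phold -t "$THREADS" $(hw_flag "$1")
}

# Two-step workflow: predict (ideally on a GPU node), then compare.
two_step() {
    run_cmd phold predict -i "$INPUT" -o test_predictions &&
    run_cmd phold compare -i "$INPUT" --predictions_dir test_predictions \
        -o test_output_phold -t "$THREADS"
}

one_step none
two_step
```

On a cluster, the two commands in two_step would more likely go into separate job scripts so that only the predict step requests a GPU; they are kept together here for brevity. Set DRY_RUN=0 to execute the commands instead of printing them.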
Output
- The primary outputs are:
  - phold_3di.fasta containing the 3Di sequences for each CDS
  - phold_per_cds_predictions.tsv containing detailed annotation information on every CDS
  - phold_all_cds_functions.tsv containing counts per contig of CDS in each PHROGs category, VFDB, CARD, ACRDB and Defensefinder databases (similar to the pharokka_cds_functions.tsv from Pharokka)
  - phold.gbk, which contains a GenBank format file including these annotations, and keeps any other genomic features (tRNA, CRISPR repeats, tmRNAs) included from the pharokka Genbank input file if provided
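A quick sanity check after a run is to confirm that these files exist and are non-empty. The sketch below assumes the default file names listed above and an output directory called test_output_phold; the helper names are made up for this example.

```shell
#!/bin/sh
# Sketch: confirm that a phold output directory contains the primary outputs.
# missing_phold_outputs and count_3di_records are illustrative helpers.

# Print the name of each expected file that is missing or empty in directory $1;
# return non-zero if any are.
missing_phold_outputs() {
    rc=0
    for f in phold_3di.fasta phold_per_cds_predictions.tsv \
             phold_all_cds_functions.tsv phold.gbk; do
        if [ ! -s "$1/$f" ]; then   # -s: file exists and is non-empty
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Count FASTA records in file $1 (header lines start with ">").
count_3di_records() {
    grep -c '^>' "$1"
}

OUT_DIR="test_output_phold"
if missing=$(missing_phold_outputs "$OUT_DIR"); then
    echo "All primary outputs present;" \
         "$(count_3di_records "$OUT_DIR/phold_3di.fasta") 3Di sequences"
else
    echo "Missing or empty in $OUT_DIR:" $missing
fi
```

The record count from phold_3di.fasta gives a rough cross-check against the number of CDS you expected to be annotated.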
Usage
Usage: phold [OPTIONS] COMMAND [ARGS]...
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
Commands:
autotune Determines optimal batch size for 3Di prediction with
citation Print the citation(s) for this tool
compare Runs Foldseek vs phold db
createdb Creates foldseek DB from AA FASTA and 3Di FASTA input...
insta
