
Sphae

Phage annotations and predictions. A spae is a prediction or foretelling. We'll foretell what your phage is doing!


Sphae

Phage toolkit to detect phage candidates for phage therapy

<p align="center"> <img src="logo/sphae.png#gh-light-mode-only" width="300"> <img src="logo/sphaedark.png#gh-dark-mode-only" width="300"> </p>

Overview

The steps that sphae takes are shown here:

<p align="center"> <img src="logo/sphae_steps.png#gh-light-mode-only" width="300"> </p>

This Snakemake workflow was built using Snaketool (https://doi.org/10.1371/journal.pcbi.1010705) to assemble and annotate phage sequences. Currently, this tool is being developed for phage genomes. The steps include:

  • Quality control that removes adapter sequences, low-quality reads, and host contamination (optional)
  • Assembly
  • Contig quality checks: read coverage, viral or not, completeness, and assembly graph components
  • Phage genome annotation

Cite Sphae: https://doi.org/10.1093/bioadv/vbaf004

If you are new to bioinformatics or running command line tools, here is a great tutorial to follow: https://github.com/AnitaTarasenko/sphae/wiki/Sphae-tutorial

Install

Pip install

#creating a new environment
conda create -y -n sphae python=3.13
conda activate sphae
#install sphae 
pip install sphae

Conda install

Setting up a new conda environment and installing sphae from Bioconda:

conda create -n sphae python=3.13
conda activate sphae
conda install -c conda-forge -c bioconda sphae

Container Install

There are two versions of the container:

  1. Sphae v1.5.2: includes databases, so the container is about 32 GB.

    Steps to download and run this container:

     TMPDIR=<where your tmpdir lives>
     IMAGEDIR=<where you want the image to live>
     
     singularity pull --tmpdir $TMPDIR --dir $IMAGEDIR docker://npbhavya/sphae:latest
     singularity exec sphae_latest.sif sphae --help
     singularity exec sphae_latest.sif sphae run --help
    
    
     singularity exec -B <path/to/inputfiles>:/input,<path/to/output>:/output sphae_latest.sif sphae run --input /input --output /output
    
  2. Sphae v1.5.2-noDB: this version doesn't come with databases, so the first step is to download the databases locally and save them to one directory <path/to/databases>.

    Here are the commands to download the sphae container:

    TMPDIR=<where your tmpdir lives>
    IMAGEDIR=<where you want the image to live>
    
    singularity pull --tmpdir $TMPDIR --dir $IMAGEDIR docker://npbhavya/sphae:v1.5.2-noDB
    singularity exec sphae_v1.5.2-noDB.sif sphae --help
    singularity exec sphae_v1.5.2-noDB.sif sphae run --help
    
    # <path/to/databases> set to sphae/workflow/databases if sphae install is run 
    singularity exec -B <path/to/databases>:/database,<path/to/inputfiles>:/input,<path/to/output>:/output sphae_v1.5.2-noDB.sif sphae run --input /input --output /output
    

Source install

#clone sphae repository
git clone https://github.com/linsalrob/sphae.git

#move to sphae folder
cd sphae

#install sphae
pip install -e .

#confirm the workflow is installed by running the below command 
sphae --help

Installing databases

Run the command below:

#Installs the database to default directory, `sphae/workflow/databases`
sphae install

#Install database to specific directory
sphae install --db_dir <directory> 

By default, the databases are installed to the directory sphae/workflow/databases.

This workflow requires the following databases and models:

  • Pfam35.0 database to run viral_verify for contig classification.
  • CheckV database to test for phage completeness
  • Pharokka databases
  • Phynteny models
  • Phold databases
  • Medaka models
  • PhageTermvirome-4.3 install

This step requires ~23 GB of storage. If these databases are already installed, skip this step and instead set the environment variables below to point to where the databases are installed.

#Note: change the file paths below to where your databases are installed.
#For instance, if sphae was installed using conda, the databases will by default be saved to /home/username/miniforge3/envs/sphae/lib/python3.11/site-packages/sphae/workflow/databases

export VVDB=sphae/workflow/databases/Pfam35.0/Pfam-A.hmm.gz
export CHECKVDB=sphae/workflow/databases/checkv-db-v1.5
export PHAROKKADB=sphae/workflow/databases/pharokka_db
export PHYNTENYDB=sphae/workflow/databases/models
export PHOLDDB=sphae/workflow/databases/phold
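Before launching a run, it can be worth verifying that these variables actually point at existing paths. Below is a minimal pre-flight sketch (a hypothetical helper, not part of sphae; only the variable names exported above are taken from the README):

```python
import os

# Hypothetical pre-flight check, not part of sphae: verify that each database
# environment variable is set and points to an existing path.
DB_VARS = ["VVDB", "CHECKVDB", "PHAROKKADB", "PHYNTENYDB", "PHOLDDB"]

def check_db_env(env=os.environ):
    """Return a list of problems: unset variables or paths that do not exist."""
    problems = []
    for var in DB_VARS:
        path = env.get(var)
        if not path:
            problems.append(var + " is not set")
        elif not os.path.exists(path):
            problems.append(var + " points to a missing path: " + path)
    return problems

if __name__ == "__main__":
    for problem in check_db_env():
        print(problem)
```

An empty result means all five variables are set and resolvable; anything printed should be fixed before starting the workflow.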

Running the workflow

Sphae is developed to be modular:

  • sphae run will run QC, assembly and annotation
  • sphae annotate will run only annotation steps

Commands to run

Only one command needs to be submitted to run all the above steps: QC, assembly, and annotation.

#For Illumina reads, place both the forward and reverse reads in one directory
#Make sure the fastq reads are saved as {sample_name}_R1.fastq and {sample_name}_R2.fastq, or gzipped with extensions {sample_name}_R1.fastq.gz and {sample_name}_R2.fastq.gz
sphae run --input tests/data/illumina-subset --output example -k
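The read-naming convention above can be checked programmatically before a run. Here is a small sketch (a hypothetical helper, not part of sphae) that pairs files following the {sample_name}_R1.fastq / {sample_name}_R2.fastq convention, with or without .gz:

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of sphae: pair Illumina read files using the
# convention {sample_name}_R1.fastq / {sample_name}_R2.fastq (optionally .gz).
READ_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>R[12])\.fastq(?:\.gz)?$")

def paired_samples(filenames):
    """Return sample names that have both an R1 and an R2 file."""
    mates = defaultdict(set)
    for name in filenames:
        m = READ_RE.match(name)
        if m:
            mates[m.group("sample")].add(m.group("mate"))
    return sorted(s for s, found in mates.items() if found == {"R1", "R2"})
```

Any file that does not match the pattern, or any sample missing its mate, is simply dropped from the result, which makes mis-named inputs easy to spot.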

#For nanopore reads, place the reads in a directory, one file per sample
sphae run --input tests/data/nanopore-subset --sequencing longread --output example -k

#For newer ONT sequencing data where polishing is not required, run the command
sphae run --input tests/data/nanopore-subset --sequencing longread --output example -k --no_medaka

#To run either of the commands on a cluster, add --profile slurm to the command. There is a little bit of setup to do here.
#Set up a ~/.config/snakemake/slurm/config.yaml file - https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#advanced-resource-specifications
#I may have set this workflow to run only on Slurm right now; I will make it more generic soon.
sphae run --input tests/data/nanopore-subset --sequencing longread --output example --profile slurm -k --threads 16
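For the Slurm setup mentioned above, a minimal ~/.config/snakemake/slurm/config.yaml might look like the sketch below. The account, partition, and resource values are placeholders, not values from this README; check the linked snakemake-executor-plugin-slurm documentation for the options your cluster needs:

```yaml
# Sketch of a Snakemake Slurm profile; all values are placeholders.
executor: slurm
jobs: 50                      # maximum number of concurrent Slurm jobs
default-resources:
  slurm_account: "myaccount"  # your Slurm account
  slurm_partition: "general"  # your cluster's partition
  mem_mb: 16000
  runtime: 720                # minutes
```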

Command to run only the annotation steps and phylogenetic trees. This step reruns:

  • Pharokka, Phold, Phynteny
  • Phylogenetic trees with the terminase large subunit and portal protein

#the genomes directory has the already assembled complete genomes
#run the export commands above to set the database paths
sphae annotate --genome <genomes directory> --output example -k

Output

Output for sphae run is saved to the example/RESULTS directory. This directory contains:

  • Genome annotations in GenBank format (Phynteny output)
  • Genome in fasta format (either the reoriented to terminase output from Pharokka, or assembled viral contigs)
  • Circular visualization in png format (Pharokka output)
  • Genome summary file
  • trees folder (note: this folder might be meaningful only if you have tailed phages)
    • all_portal.nwk: tree using all proteins annotated as "portal protein"
    • all_terL.nwk: tree using all proteins annotated as "terminase large subunit"
  • PhageTerm results saved to a directory, <sample name>_phageterm (only for paired-end sequencing)

The genome summary file includes the following information:

  • Sample name
  • Length of the genome
  • Coding density
  • Whether the assembled contig is circular or not (from CheckV)
  • Completeness (calculated from CheckV)
  • Contamination (calculated from CheckV)
  • Taxonomy accession ID (Pharokka output, searches the genome against INPHARED database using mash)
  • Taxa mash includes the number of matching hashes of the assembled genome to the accession ID/taxa name. The higher the number of matching hashes, the more likely the genome is related to the predicted taxon
  • Gene searches:
    • Whether integrase is found (search for integrase gene in annotations)
    • Whether antimicrobial resistance (AMR) genes were found (Phold and Pharokka search against AMR databases)
    • Whether any virulence factors were found (Pharokka search against virulence gene database)
    • Whether any CRISPR spacers were found (Pharokka search using MinCED)
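Since the point of these checks is selecting candidates for phage therapy, the summary fields above lend themselves to a simple triage rule: reject genomes with integrase, AMR, or virulence hits, and require reasonable completeness. A sketch, assuming the fields have already been parsed into a dict (the key names and the completeness threshold are illustrative assumptions, not sphae's actual summary format):

```python
# Illustrative triage rule; key names and threshold are assumptions,
# not sphae's actual summary format.
RED_FLAGS = ("integrase_found", "amr_genes_found", "virulence_genes_found")

def is_therapy_candidate(summary, min_completeness=90.0):
    """Reject genomes with integrase/AMR/virulence hits or low completeness."""
    if any(summary.get(flag) for flag in RED_FLAGS):
        return False
    return summary.get("completeness", 0.0) >= min_completeness
```

Thresholds and which flags are disqualifying are a judgment call for your application; the sketch only shows how the summary checks combine.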

Output for sphae annotate is saved to the example/final-annotate directory. This directory contains:

  • Genome annotations in GenBank format (Phynteny output)
  • Genome in fasta format (either the reoriented to terminase output from Pharokka, or assembled viral contigs)
  • Circular visualization in png format (Pharokka output)
  • Genome summary file
  • trees folder (note: this folder might be meaningful only if you have tailed phages)
    • all_portal.nwk: tree using all proteins annotated as "portal protein"
    • all_terL.nwk: tree using all proteins annotated as "terminase large subunit"

The genome summary file includes the following information:

  • Sample name
  • Taxa mash includes the number of matching hashes of the assembled genome to the accession ID/taxa name