Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies
hybracter is an automated long-read-first bacterial genome assembly tool implemented in Snakemake using Snaketool.
Quick Start
Conda
hybracter is available to install with pip or conda.
You will need conda or mamba available so hybracter can install all the required dependencies.
Therefore, it is recommended to install hybracter into a conda environment as follows.
conda create -n hybracterENV -c bioconda -c conda-forge hybracter
conda activate hybracterENV
hybracter --help
hybracter install
Miniforge is highly recommended. Please see the documentation for more details on how to install Miniforge.
When you run hybracter for the first time, all the required dependencies will be installed, so it will take longer than usual (usually a few minutes). Every subsequent run will be a lot faster, as the dependencies will already be installed.
If you intend to run hybracter offline (e.g. on HPC nodes with no access to the internet), I highly recommend running hybracter test-hybrid and/or hybracter test-long on a node with internet access so hybracter can download the required dependencies. It should take 5-10 minutes. If the machine or node you will run hybracter on has internet access, you can skip this step.
hybracter test-hybrid --threads 8
hybracter test-long --threads 8
- Note: if you are installing Hybracter on a Mac, please use --mac; this will install Medaka v1.8 (not v2, which is not available for macOS). Alternatively, if you want Medaka v2, try the container option with Docker.
Container
Alternatively, a Docker/Singularity Linux container image is available for Hybracter (starting from v0.7.1) at quay.io/gbouras13/hybracter. This is likely to be useful for running Hybracter in HPC environments.
- Note: the container image comes with the database and all environments installed, so there is no need to run hybracter install or hybracter test-hybrid/hybracter test-long, or to specify a database directory with -d.
To install and run v0.11.0 with Singularity
IMAGE_DIR="<the directory you want the .sif file to be in >"
singularity pull --dir $IMAGE_DIR docker://quay.io/gbouras13/hybracter:0.11.0
containerImage="$IMAGE_DIR/hybracter_0.11.0.sif"
# example command with test fastqs
singularity exec $containerImage hybracter hybrid-single -l test_data/Fastqs/test_long_reads.fastq.gz \
-1 test_data/Fastqs/test_short_reads_R1.fastq.gz -2 test_data/Fastqs/test_short_reads_R2.fastq.gz \
-o output_test_singularity -t 4 --auto
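If you have several isolates, the single-sample command above can be wrapped in a loop. The sketch below is only an illustration: it assumes a hypothetical file layout (`<sample>_long.fastq.gz`, `<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz` in one directory) and a hypothetical helper name, `run_all_samples`. It reuses only the `hybrid-single` options shown above, and uses Singularity's `--bind` so the read and output directories are visible inside the container.

```shell
#!/usr/bin/env bash
# Sketch: run hybracter hybrid-single inside the container for every isolate in a directory.
# ASSUMPTION (hypothetical layout): <reads_dir>/<sample>_long.fastq.gz,
# <sample>_R1.fastq.gz and <sample>_R2.fastq.gz for each isolate.
set -euo pipefail

# Usage: run_all_samples <image.sif> <reads_dir> <out_dir> [threads]
run_all_samples() {
  local image="$1" threads="${4:-4}"
  local reads_dir out_dir long sample
  mkdir -p "$3"
  # Absolute paths so the --bind mounts and the hybracter arguments agree
  reads_dir="$(realpath "$2")"
  out_dir="$(realpath "$3")"

  for long in "$reads_dir"/*_long.fastq.gz; do
    [ -e "$long" ] || continue   # skip if the glob matched nothing
    sample="$(basename "$long" _long.fastq.gz)"
    # Each isolate gets its own output directory under out_dir
    singularity exec --bind "$reads_dir,$out_dir" "$image" \
      hybracter hybrid-single \
      -l "$long" \
      -1 "$reads_dir/${sample}_R1.fastq.gz" \
      -2 "$reads_dir/${sample}_R2.fastq.gz" \
      -o "$out_dir/$sample" -t "$threads" --auto
  done
}
```

For example, `run_all_samples "$containerImage" /path/to/reads /path/to/output 8` would assemble the isolates one after another, writing one output directory per isolate. A serial loop keeps the sketch simple; on a cluster you would more likely submit one job per isolate.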
To install and run v0.11.0 with Docker (recommended on a Mac, as the image includes Medaka v2)
docker pull quay.io/gbouras13/hybracter:0.11.0
docker run quay.io/gbouras13/hybracter:0.11.0 hybracter -h
# -v mounts directories from your local filesystem to the docker container
docker run --rm -v /path/to/my/test/fastqs:/data -v /path/to/where/i/want/the/output:/output quay.io/gbouras13/hybracter:0.11.0 hybracter hybrid-single \
-l /data/test_long_reads.fastq.gz \
-1 /data/test_short_reads_R1.fastq.gz \
-2 /data/test_short_reads_R2.fastq.gz \
-o /output/output_test_docker -t 4 --auto
Google Colab Notebooks
If you don't want to install hybracter locally, you can run it without writing any code using the Colab notebook at https://colab.research.google.com/github/gbouras13/hybracter/blob/main/run_hybracter.ipynb
This is only recommended if you have one or a few samples to assemble (each sample takes a while, probably an hour or two, because of Google Colab's limited resources). If you have more samples, a local install as described above is suggested.
Documentation
Documentation for hybracter is available here.
Manuscript
hybracter has recently been published in Microbial Genomics:
- George Bouras, Ghais Houtak, Ryan R Wick, Vijini Mallawaarachchi, Michael J. Roach, Bhavya Papudeshi, Louise M Judd, Anna E Sheppard, Robert A Edwards, Sarah Vreugde - Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies. (2024) Microbial Genomics doi: https://doi.org/10.1099/mgen.0.001244.
Description
hybracter is designed for assembling bacterial isolate genomes using a long-read-first assembly approach.
It scales massively using the embarrassingly parallel power of HPC and Snakemake profiles. It is designed for applications where you have isolates with Oxford Nanopore Technologies (ONT) long reads and optionally matched paired-end short reads for polishing.
hybracter is designed to straddle the fine line between being as feature-rich as possible, giving you all the information you need to decide on the best assembly, and being a one-line automated program. In other words, as awesome as Unicycler, but updated for 2023. Perfect for lazy people like myself.
hybracter is largely based on Ryan Wick's magnificent tutorial and associated paper. hybracter differs in that it adds steps for targeted plasmid assembly with plassembler, contig reorientation with dnaapler, and extra polishing and statistical summaries.
Note: if you have PacBio HiFi reads, as of 2023, you can run hybracter long with --no_medaka to turn off polishing and --flyeModel pacbio-hifi. You can also probably just run Flye or Dragonflye (or, of course, Trycycler) and reorient the contigs with dnaapler without polishing. See Ryan Wick's blogpost for more details.
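The two PacBio routes in the note above can be sketched as shell functions. Only `--no_medaka` and `--flyeModel pacbio-hifi` come from the note; the `long-single` subcommand (used here as the single-sample counterpart of `hybracter long`), `--auto`, and the Flye and dnaapler options are assumptions, so confirm each with the tool's `--help` before relying on them. The function names `hifi_hybracter` and `hifi_flye_dnaapler` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: two routes for a PacBio HiFi isolate. Function names are hypothetical.
set -euo pipefail

# Route 1: hybracter in long-read-only mode with Medaka polishing turned off.
# ASSUMPTION: `long-single` accepts --auto, --no_medaka and --flyeModel
# (confirm with `hybracter long-single --help`).
hifi_hybracter() {
  local reads="$1" out="$2" threads="${3:-8}"
  hybracter long-single -l "$reads" -o "$out" -t "$threads" \
    --auto --no_medaka --flyeModel pacbio-hifi
}

# Route 2: plain Flye assembly, then reorient the contigs with dnaapler; no polishing.
# ASSUMPTION: `dnaapler all` takes -i/-o/-p/-t (confirm with `dnaapler all --help`).
hifi_flye_dnaapler() {
  local reads="$1" out="$2" threads="${3:-8}" prefix="${4:-sample}"
  flye --pacbio-hifi "$reads" --out-dir "$out/flye" --threads "$threads"
  # Flye writes its final contigs to assembly.fasta inside --out-dir
  dnaapler all -i "$out/flye/assembly.fasta" -o "$out/dnaapler" -p "$prefix" -t "$threads"
}
```

Route 1 keeps hybracter's automation and summaries; route 2 is the leaner option the note describes, at the cost of assembling and reorienting by hand.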
Pipeline
<p align="center"> <img src="img/hybracter.png" alt="Hybracter" height=600> </p>
- A. Reads are quality controlled with Filtlong, [Porechop](https://github.com/rrwick/P
