
dnaapler

Dnaapler is a simple tool that reorients complete circular microbial genomes.

Quick Start

# creates empty conda environment
conda create -n dnaapler_env

# activates conda environment
conda activate dnaapler_env

# installs dnaapler
conda install -c bioconda dnaapler

# runs dnaapler all 
dnaapler all -i input_mixed_contigs.fasta -o output_directory_path -p my_bacteria_name -t 8

# runs dnaapler all with a gfa file from e.g. Flye, Unicycler or Autocycler
dnaapler all -i assembly.gfa -o output_directory_path -p my_bacteria_name -t 8
  • If you have a macOS machine with Apple Silicon (M1/M2/M3/M4) and are having installation issues, please try:
conda create --platform osx-64 -n dnaapler_env dnaapler

conda activate dnaapler_env

dnaapler all -i input_mixed_contigs.fasta -o output_directory_path -p my_bacteria_name -t 8

Paper

Dnaapler has been published in the Journal of Open Source Software (JOSS). If you use Dnaapler in your work, please cite it as follows:

George Bouras, Susanna R. Grigson, Bhavya Papudeshi, Vijini Mallawaarachchi, Michael J. Roach (2024). Dnaapler: A tool to reorient circular microbial genomes. Journal of Open Source Software, 9(93), 5968, https://doi.org/10.21105/joss.05968

Additionally, please consider citing the dependencies where relevant:

Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10. doi: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.

Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988.

Larralde, M., (2022). Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. Journal of Open Source Software, 7(72), 4296, https://doi.org/10.21105/joss.04296.

Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119.

v1 and other recent changes

1.3.0

  • Thanks @mbhall88 for extending the functionality of --ignore

  • If your input FASTA or GFA is mixed (e.g. has chromosome and plasmids), you can also use dnaapler all, with the option to ignore some contigs with the --ignore parameter. The --ignore parameter accepts one of:

    1. A file path containing contig names to ignore (one per line)
    2. A comma-separated list of contig names (e.g., chr1,chr2,chr3)
    3. - to read contig names from stdin (one per line)

1.2.0

  • Thanks to the one and only @rrwick, Dnaapler now supports the GFA format as input. This was done to ensure support for Ryan's new bacterial genome assembly tool Autocycler, the successor to Trycycler, but may also be useful if you have GFA files from e.g. Unicycler, Flye, SPAdes or other assemblers.
    • If you run dnaapler with GFA input, you will get a GFA output as well.
    • If you run dnaapler with GFA input, only circular contigs will be reoriented.
  • Relaxes the MMseqs2 dependency to >=13.45111

1.1.0

  • Adds support for reorienting contigs where the gene of interest spans the contig ends - fixes this issue. Thanks @marade @oschwengers.
    • Specifically, this is done by rotating each contig in the input by half the genome length, then running MMseqs2 for both the original and rotated contigs. The MMseqs2 hit with the highest bitscore across the original and rotated contigs will be chosen as the top hit to rotate by, therefore enabling detection of hits that are only partial on the original contig because they span the contig ends.
  • This has only been implemented for dnaapler all (this should be the command used by 99% of users).
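
The half-length rotation described above can be sketched in a few lines of Python. This is an illustration only, not dnaapler's actual code: the function names and the (start, bitscore) hit tuples are hypothetical, and the MMseqs2 search itself is assumed to have been run separately on both the original and the rotated contig.

```python
from typing import List, Optional, Tuple

Hit = Tuple[int, float]  # hypothetical (0-based start, bitscore) pair


def rotate(seq: str, offset: int) -> str:
    """Rotate a circular sequence so that index `offset` becomes index 0."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def best_hit(original_hits: List[Hit], rotated_hits: List[Hit],
             genome_length: int) -> Optional[Hit]:
    """Return the highest-bitscore hit, with its start in original coordinates."""
    half = genome_length // 2
    # Store (bitscore, start) so that max() compares on bitscore first.
    candidates = [(bits, start) for start, bits in original_hits]
    # The rotated contig begins at original index `half`, so a start found
    # there maps back to the original contig by adding that offset.
    candidates += [(bits, (start + half) % genome_length)
                   for start, bits in rotated_hits]
    if not candidates:
        return None
    bits, start = max(candidates)
    return start, bits


# A gene spanning the ends gives a weak partial hit on the original contig
# and a strong full-length hit on the rotated one; both map to index 6 here.
print(rotate("ABCDEFGH", 4))
print(best_hit([(6, 40.0)], [(2, 95.0)], 8))
```

Because the rotated contig places the original contig ends in its middle, a gene that was split across the ends becomes contiguous there and can score as a full-length hit.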

v1.0

  • BREAKING CHANGE - dnaapler now uses MMseqs2 v13.45111 rather than BLAST. You will need to install MMseqs2 if you upgrade (if you use conda, it should be handled for you). The CLI is identical.
  • There are 2 reasons for this:
    1. Users reported problems installing BLAST on macOS with Apple Silicon (see e.g. here). MMseqs2 works on all platforms and is diligently maintained.
    2. MMseqs2 is much faster than BLAST (what took BLAST a few minutes takes MMseqs2 seconds). We probably should have written dnaapler with MMseqs2 to begin with. MMseqs2 v13.45111 was chosen to ensure interoperability with pharokka.
  • The alignment results may not be identical to dnaapler v0.8.1 (i.e. they might find different top hits), but the actual reorientation is likely to be identical (at least in my tests). Please reach out or make an issue if you notice any discrepancies.

For example - on my machine (Ubuntu 20.04, Intel i9 13th gen 13900 CPU with 32 threads), for a Staphylococcus aureus genome with 1 small plasmid, dnaapler -i staph.fasta -o staph_dnaapler -t 8 took ~129 seconds wallclock with v0.8.1 using BLAST, while it took ~3 seconds wallclock with v1.0.0 using MMseqs2.

Google Colab Notebooks

If you don't want to install dnaapler locally, you can run dnaapler all without any code using the Google Colab notebook.

Description

<p align="center"> <img src="paper/Dnaapler_figure.png" alt="Dnaapler Figure"> </p>

dnaapler is a simple Python program that takes a single nucleotide input sequence (in FASTA or GFA format), finds the desired start gene using MMseqs2 against an amino acid sequence database, checks that the start codon of this gene is found, and if so, then reorients the chromosome to begin with this gene on the forward strand.
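
As a rough illustration of the final reorientation step, the sketch below rotates a circular sequence so that a located gene begins at the first base on the forward strand. It is a minimal sketch, not dnaapler's implementation: the function names and the 0-based, half-open coordinate convention are assumptions, and the MMseqs2 search and the start codon check are outside its scope.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq: str, gene_start: int, gene_end: int, strand: str) -> str:
    """Rotate a circular sequence so the gene starts at index 0, forward strand.

    gene_start and gene_end are forward-strand coordinates, 0-based and
    half-open [gene_start, gene_end); strand is "+" or "-". (Assumed
    conventions for this sketch.)
    """
    if strand == "+":
        return seq[gene_start:] + seq[:gene_start]
    # On the reverse strand the gene's first base sits at forward index
    # gene_end - 1, which is index len(seq) - gene_end after reverse
    # complementing the whole contig.
    rc = reverse_complement(seq)
    new_start = len(seq) - gene_end
    return rc[new_start:] + rc[:new_start]


# Forward-strand gene ATGCCC at [3, 9)
print(reorient("TTTATGCCC", 3, 9, "+"))
# Reverse-strand gene ATGCCC, stored on the forward strand as GGGCAT at [2, 8)
print(reorient("TTGGGCATAA", 2, 8, "-"))
```

In both cases the returned sequence has the same length as the input and begins with the gene's first codon.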

It was originally designed to replicate the reorientation functionality of Unicycler with dnaA, but for long-read-first assembled chromosomes. We have extended it to work with plasmids (dnaapler plasmid) and phages (dnaapler phage), or for any input FASTA or GFA desired with dnaapler custom, dnaapler mystery or dnaapler nearest.

For bacterial chromosomes, dnaapler chromosome should ensure the chromosome breakpoint never interrupts genes or mobile genetic elements like prophages. It is intended to be used with good-quality completed bacterial genomes, generated with methods such as Autocycler, Dragonflye or my own pipeline hybracter.

Additionally, you can also reorient multiple bacterial chromosomes/plasmids/phages at once using the dnaapler bulk subcommand.

If your input FASTA or GFA is mixed (e.g. has chromosome and plasmids), you can also use dnaapler all, with the option to ignore some contigs with the --ignore parameter. The --ignore parameter accepts either:

  • A file path containing contig names to ignore (one per line)
  • A comma-separated list of contig names (e.g., chr1,chr2,chr3)
  • - to read contig names from stdin (one per line)
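
One way to picture how a single argument can cover all three forms is the Python sketch below. It is hypothetical rather than dnaapler's own parsing code: the function name parse_ignore is invented for illustration, and the order of checks (stdin marker first, then an existing file, then a comma-separated list) is an assumption.

```python
import sys
from pathlib import Path
from typing import Iterable, Optional, Set


def parse_ignore(value: str, stdin: Optional[Iterable[str]] = None) -> Set[str]:
    """Resolve an --ignore style value into a set of contig names.

    Assumed precedence: "-" means stdin, an existing file path is read
    line by line, and anything else is split on commas.
    """
    if value == "-":
        # Iterating a text stream yields one line at a time.
        lines = list(stdin if stdin is not None else sys.stdin)
    elif Path(value).is_file():
        lines = Path(value).read_text().splitlines()
    else:
        lines = value.split(",")
    # Drop surrounding whitespace, newlines and empty entries.
    return {name.strip() for name in lines if name.strip()}


print(sorted(parse_ignore("chr1,chr2,chr3")))
print(sorted(parse_ignore("-", stdin=["plasmid_1\n", "plasmid_2\n"])))
```

The optional stdin argument exists only so the sketch can be exercised without piping input; a real command-line tool would read sys.stdin directly.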

As of v1, in practice, dnaapler all is the only command you will likely need, as it contains all the functionality of bulk, chromosome, plasmid and phage, but with much more flexibility and user-friendliness.

When provided with a GFA file, dnaapler will process only circular sequences – those with a single circularising link and no additional links – while leaving all other sequences unchanged. The output format will match the input: FASTA input produces FASTA output, and GFA input produces GFA output.
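
The circularity rule can be illustrated with a small GFA parser. The sketch below is an assumption-laden illustration, not dnaapler's code: it reads only S (segment) and L (link) lines of GFA version 1, and it treats a segment as circular when its only link joins it to itself in the same orientation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def circular_segments(gfa_text: str) -> List[str]:
    """Return names of segments whose only link is a self-link."""
    segments: List[str] = []
    links: Dict[str, List[Tuple[str, str, str, str]]] = defaultdict(list)
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments.append(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            src, src_orient, dst, dst_orient = fields[1:5]
            link = (src, src_orient, dst, dst_orient)
            # Record the link against every segment it touches.
            links[src].append(link)
            if dst != src:
                links[dst].append(link)
    circular = []
    for name in segments:
        seg_links = links.get(name, [])
        if len(seg_links) != 1:
            continue  # unlinked segment, or extra links beyond the self-link
        src, src_orient, dst, dst_orient = seg_links[0]
        if src == dst and src_orient == dst_orient:
            circular.append(name)
    return circular


example = "\n".join([
    "S\tchromosome\tACGTACGT",
    "L\tchromosome\t+\tchromosome\t+\t0M",
    "S\tlinear_contig\tACGT",
    "S\ta\tACGT",
    "S\tb\tACGT",
    "L\ta\t+\tb\t+\t0M",
])
print(circular_segments(example))
```

Under these assumptions only chromosome qualifies: linear_contig has no links, and a and b are joined to each other rather than to themselves, so all three would be left unchanged.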

Documentation

The full documentation for dnaapler can
