Dragonflye
:dragon: :fly: Assemble bacterial isolate genomes from Nanopore reads
NOTE: Dragonflye is under active development, and any feedback will be very useful.
A Quick Note
If you've worked with bacterial sequences, in all likelihood you have used one of Torsten Seemann's tools. One such tool is Shovill, which takes the bacterial genome assembly process and makes it quick and painless. Shovill was developed for paired-end Illumina reads, and there is a fork, shovill-se, which supports single-end reads.
Given the widespread usage of Shovill, and Torsten basically laying much of the groundwork, I decided to use Shovill as a framework for Dragonflye. Dragonflye can be considered a fork of Shovill that supports assembling Oxford Nanopore sequences. By going this route users will not have to relearn parameters, and will already be familiar with the outputs.
At this point, you might be wondering: so Robert, you just hacked Shovill to work with ONT reads, why not just call it 'shovill-ont'?
That's because when I asked if there was interest in a "Shovill" for ONT reads, Curtis Kapsak (@kapsakcj) responded:
Curtis Kapsak (@kapsakcj): if wrapping flye, perhaps call it dragonflye (a very fast flye)?
And honestly, how could I not go with that?! It's an amazing play on words that I'm willing to bet Torsten would be proud of!
So to sum it up, thank you Torsten for Shovill and providing a framework for Dragonflye.
Introduction
Dragonflye is a pipeline that aims to make assembling Oxford Nanopore reads quick and easy. Still working on the quick part, but I think the easy part is there. Dragonflye currently supports Flye, Miniasm and Raven assemblers, and Racon and Medaka polishers.
Main Steps
- Estimate genome size and read length from reads (unless --gsize provided) (kmc)
- Filter reads by length (default --minreadlen 1000) (Nanoq)
- Reduce FASTQ files to a sensible depth (default --depth 150) (rasusa)
- Remove adapters (requires --trim be given) (Porechop)
- Assemble with Flye, Miniasm, or Raven
- Polish assembly with Racon and/or Medaka
- Polish assembly with short reads via Polypolish and/or Pilon
- Remove contigs that are too short, too low coverage, or pure homopolymers
- Produce final FASTA with nicer names and parsable annotations
- Reorient contigs from final FASTA using dnaapler
- Output parsable assembly statistics (assembly-scan)
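As a rough illustration of how the steps above map onto command-line flags, here is a dry-run sketch that only prints a command rather than running it. The flag names are taken from the usage text in this README; the file names are hypothetical placeholders, and the particular combination of options is just one possibility, not a recommended recipe.

```shell
#!/bin/sh
# Dry-run sketch: build a Dragonflye command that touches the optional steps
# and print it instead of executing it. All file names are hypothetical.
READS="my-ont.fastq.gz"          # Nanopore reads (placeholder)
R1="my-illumina_R1.fastq.gz"     # short reads for polishing (placeholder)
R2="my-illumina_R2.fastq.gz"     # short reads for polishing (placeholder)

build_cmd() {
    # --minreadlen and --depth are shown with their documented defaults;
    # --trim turns on adapter removal; --R1/--R2 enable short-read polishing.
    printf '%s ' dragonflye \
        --reads "$READS" --outdir dragonflye --gsize 5000000 \
        --minreadlen 1000 --depth 150 --trim \
        --assembler flye --racon 1 \
        --R1 "$R1" --R2 "$R2" --polypolish 1
    printf '\n'
}

build_cmd
```

Once the printed command looks right, it can be copied and run in an environment where Dragonflye is installed.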
Quick Start
dragonflye --reads my-ont.fastq.gz --outdir dragonflye --gsize 5000000
... LOG TEXT ...
[dragonflye] Final assembly contigs: /home/robert_petit/repos/dragonflye/temp/dragonflye/contigs.fa
[dragonflye] It contains 3 (min=4864) contigs totalling 4939840 bp.
[dragonflye] Dragonfly fossils have been found with wingspans up to two feet (61cm)!
[dragonflye] Done.
ls dragonflye/
contigs.fa contigs.gfa dragonflye.log flye-info.txt flye.fasta
head -n4 dragonflye/contigs.fa
>contig00001 len=2753792 origname=Utg1024_LN:i:2753792_RC:i:486_XO:i:0 polish=none sw=dragonflye-raven/1.2.0 date=20231031
TTCTATTTATCAGTATCATTACTTTTATATTATCGATAATTAATCCGAACATATCATTAA
TCAAGTTATTATTCGAAGTGGTTTTGCTGCATTTGGAACAGTCGGGTTAAGTATGAACCT
TACCACAGAAGATAATAATGGTATTACTAAAATAATTATTATATTCGTTATGCTTTGCGG
head -n4 dragonflye/contigs.reoriented.fa
>contig00001 len=2753792 origname=Utg1024_LN:i:2753792_RC:i:486_XO:i:0 polish=none sw=dragonflye-raven/1.2.0 date=20231031 rotated=True
ATGTCGGAAAAAGAAATTTGGGAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTG
TAAGTTACTCAACTTTCCTAAAAGATGACGAGGCTTTACACGATTAAAGATGGTGAAGCT
ATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATT
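The key=value annotations in the contig headers are meant to be easy to parse. A minimal sketch of pulling a single value out of a header with standard tools is below. The header is copied from the output above; get_tag is a hypothetical helper name, not something Dragonflye provides.

```shell
# Header line copied from the Quick Start output.
header='>contig00001 len=2753792 origname=Utg1024_LN:i:2753792_RC:i:486_XO:i:0 polish=none sw=dragonflye-raven/1.2.0 date=20231031'

# Hypothetical helper: print the value of one key=value annotation.
#   $1 = FASTA header line, $2 = key to look up
get_tag() {
    # Put each space-separated field on its own line, then keep only the
    # field that starts with "key=" and strip that prefix.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

get_tag "$header" len      # 2753792
get_tag "$header" polish   # none
```

The same approach should work on a whole assembly by feeding it each line that starts with ">".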
Installation
Dragonflye is available from Bioconda. Dragonflye depends on a lot of programs, so it can take conda a
while to solve the environment. Because of this, I personally install it with Mamba,
which is much faster.
# With conda
conda create -n dragonflye -c conda-forge -c bioconda dragonflye
# With Mamba (much quicker)
mamba create -n dragonflye -c conda-forge -c bioconda dragonflye
Usage
Dragonflye - A very fast flye
SYNOPSIS
De novo assembly pipeline for bacterial isolates with Nanopore reads
USAGE
dragonflye [options] --outdir DIR --reads READS.fastq.gz
GENERAL
--help This help
--version Print version and exit
--check Check dependencies are installed
--seed N Random seed to use (default: 42)
INPUT
--reads XXX Input Nanopore FASTQ (default: '')
--depth N Sub-sample --reads to this depth. Disable with --depth 0 (default: 150)
--minreadlen N Minimum read length. Disable with --minreadlen 0 (default: 1000)
--gsize XXX Estimated genome size eg. 3.2M <blank=AUTODETECT> (default: '')
OUTPUT
--outdir XXX Output folder (default: '')
--prefix XXX Prefix to use for final assembly FASTA (default: 'contigs')
--force Force overwrite of existing output folder (default: OFF)
--minlen N Minimum contig length <0=AUTO> (default: 500)
--mincov n.nn Minimum contig coverage <0=AUTO> (default: 2)
--namefmt XXX Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
--keepfiles Keep intermediate files (default: OFF)
RESOURCES
--tmpdir XXX Fast temporary directory (default: '')
--cpus N Number of CPUs to use (0=ALL) (default: 8)
--ram n.nn Try to keep RAM usage below this many GB (default: 16)
ASSEMBLER
--assembler XXX Assembler: raven miniasm flye (default: 'flye')
--opts XXX Extra assembler options in quotes eg. flye: '--iterations' (default: '')
--nanohq For Flye, use '--nano-hq' instead of --nano-raw (default: OFF)
POLISHER
--racon N Number of polishing rounds to conduct with Racon (default: 1)
--medaka N Number of polishing rounds to conduct with Medaka (requires --model) (default: 0)
--model XXX The model to be used by Medaka, (Assumes 1 polishing round, if --medaka not used) (default: '')
--list_models List the models available to Medaka (default: OFF)
SHORT-READ POLISHER
--polypolish N Number of polishing rounds to conduct with Polypolish (requires --R1 and --R2) (default: 1)
--polypolish_careful Polypolish will ignore any reads with multiple alignments (default: OFF)
--pilon N Number of polishing rounds to conduct with Pilon (requires --R1 and --R2) (default: 0)
--R1 XXX Read 1 FASTQ to use for polishing (default: '')
--R2 XXX Read 2 FASTQ to use for polishing (default: '')
REORIENT
--noreorient Disable contig reorientation using dnaapler (default: OFF)
--dnaapler_mode XXX The mode of reorientation to execute (default: 'all')
--dnaapler_opts XXX Extra dnaapler options in quotes eg. '--evalue 1e-5' (default: '')
MODULES
--trim Enable adaptor trimming (default: OFF)
--trimopts XXX Extra porechop options in quotes eg. '--adapter_threshold 80' (default: '')
--nofilter Disable read length filtering (default: OFF)
--nopolish Disable assembly polishing (default: OFF)
HOMEPAGE
https://github.com/rpetit3/dragonflye - Robert A Petit III
--depth
Giving an assembler too much data is a bad thing. There comes a point where you are no longer adding new information (as the genome is a fixed size), and only adding more noise (sequencing errors). Because of this Dragonflye will downsample your FASTQ files to a specific depth (defaults to 150x). It estimates depth by dividing read yield by genome size.
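The depth estimate described here is simple arithmetic, and a small sketch may make it concrete. The helper names fastq_yield and estimate_depth are hypothetical, the FASTQ is a tiny made-up example, and this is not how Dragonflye itself computes the values internally.

```shell
# Hypothetical helper: total bases in an uncompressed FASTQ read from stdin.
# Sequence lines are every 4th line, starting at line 2.
fastq_yield() {
    awk 'NR % 4 == 2 { total += length($0) } END { printf "%.0f\n", total }'
}

# Hypothetical helper: depth = read yield / genome size (integer division).
#   $1 = read yield in bp, $2 = genome size in bp
estimate_depth() {
    echo $(( $1 / $2 ))
}

# Tiny made-up FASTQ with two reads of 8 and 12 bases.
yield=$(printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' | fastq_yield)
echo "yield: $yield bp"               # yield: 20 bp

# 1.5 Gbp of reads over a 5 Mbp genome.
estimate_depth 1500000000 5000000     # 300
```

At an estimated 300x, reaching the default 150x would mean keeping roughly half of the bases.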
--gsize
The genome size is needed to estimate depth and for the assembly stage. If you don't provide --gsize,
it will be estimated via k-mer frequencies using kmc. It doesn't need to be a perfect estimate,
just in the right ballpark. If you know the genome size, it is usually better than the estimate,
and will save some time.
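The usage text gives 3.2M as an example value for --gsize. To show what such a shorthand stands for in base pairs, here is a small sketch. The helper gsize_to_bp is hypothetical, and the K/M/G suffix handling is an assumption for illustration, not a description of Dragonflye's own parser.

```shell
# Hypothetical helper: expand a size such as 3.2M into base pairs.
# Assumes K, M and G mean 10^3, 10^6 and 10^9; plain numbers pass through.
gsize_to_bp() {
    printf '%s\n' "$1" | awk '
        {
            size = $0
            mult = 1
            suffix = toupper(substr(size, length(size), 1))
            if (suffix == "K") mult = 1000
            else if (suffix == "M") mult = 1000000
            else if (suffix == "G") mult = 1000000000
            # Drop the suffix letter before multiplying.
            if (mult > 1) size = substr(size, 1, length(size) - 1)
            printf "%.0f\n", size * mult
        }'
}

gsize_to_bp 3.2M      # 3200000
gsize_to_bp 5000000   # 5000000
```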
--keepfiles
This will keep all the intermediate files in --outdir so you can explore and debug.
--cpus
The usage text above lists a default of 8 CPUs. Use --cpus 0 to have it attempt to use all available CPU cores.
--ram
Dragonflye will do its best to keep memory usage below this value, but it is not guaranteed. If you are on an HPC cluster, you should request more memory from your job submission engine than this value.
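One way to leave that headroom is sketched below. It assumes a SLURM cluster (other schedulers use different directives), the 4 GB margin is an arbitrary illustration rather than a tested recommendation, and the input file name is a placeholder. The snippet only writes and prints a job script; it does not submit anything.

```shell
# Assumed values: 16 and 8 match the --ram and --cpus defaults in the usage
# text; the 4 GB headroom is an arbitrary number chosen for illustration.
RAM_GB=16
HEADROOM_GB=4
CPUS=8
REQUEST_GB=$(( RAM_GB + HEADROOM_GB ))

# Write a SLURM batch script that requests more memory than --ram.
cat > dragonflye-job.sh <<EOF
#!/bin/bash
#SBATCH --cpus-per-task=${CPUS}
#SBATCH --mem=${REQUEST_GB}G
dragonflye --reads my-ont.fastq.gz --outdir dragonflye --cpus ${CPUS} --ram ${RAM_GB}
EOF

cat dragonflye-job.sh
```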
--assembler
By default it will use Flye.
--opts
If you want to provide some assembler-specific parameters you can use the --opts
parameter. Make sure you quote the parameters so they get passed as a single string
eg. For --assembler flye you might use --opts "--iterations 4 --plasmids".
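To see why the quotes matter, here is a small shell sketch that counts how many arguments a command receives. count_args is a hypothetical stand-in for any program; it is not part of Dragonflye.

```shell
# Hypothetical stand-in for a program: print how many arguments it received.
count_args() {
    echo "$#"
}

# Quoted: the assembler options arrive as one string after --opts.
count_args --opts "--iterations 4 --plasmids"   # 2

# Unquoted: the shell splits them into separate arguments.
count_args --opts --iterations 4 --plasmids     # 4
```

In the unquoted form, the program would see --iterations and --plasmids as its own options instead of as the value of --opts.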
--racon & --medaka
These two parameters adjust how
