Dorado


Dorado is a high-performance, easy-to-use, open source analysis engine for Oxford Nanopore reads.

Detailed information about Dorado and its features is available in the Dorado Documentation.

Features

One executable with sensible defaults, automatic hardware detection and configuration.
Runs on Apple silicon (M series) and Nvidia GPUs including multi-GPU with linear scaling (see Platforms).
Modified basecalling.
Duplex basecalling (watch the following video for an introduction to Duplex).
Simplex barcode classification.
Support for aligned read output in SAM/BAM.
Initial support for poly(A) tail estimation.
Support for single-read error correction.
POD5 support for highest basecalling performance (documentation).
Based on libtorch, the C++ API for PyTorch.
Multiple custom optimisations in CUDA and Metal for maximising inference performance.

If you encounter any problems building or running Dorado, please report an issue.

Installation

First, download the relevant installer for your platform:

Once the relevant .tar.gz or .zip archive is downloaded, extract the archive to your desired location.

You can then call Dorado using the full path, for example:

/path/to/dorado-x.y.z-linux-x64/bin/dorado basecaller hac pod5s/ > calls.bam

Or you can add the bin path to your $PATH environment variable, and run with the dorado command instead, for example:

dorado basecaller hac pod5s/ > calls.bam
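As a minimal sketch of the second approach, the snippet below prepends the bin directory to PATH for the current shell session and then checks whether dorado resolves. DORADO_BIN is a variable name introduced here for illustration, and the x.y.z path is the same placeholder used above; substitute the directory you actually extracted.

```shell
# Placeholder install location; replace with where you extracted the archive.
DORADO_BIN="/path/to/dorado-x.y.z-linux-x64/bin"

# Prepend the bin directory for the current shell session only.
export PATH="$DORADO_BIN:$PATH"

# Report whether the dorado executable can now be resolved.
if command -v dorado >/dev/null 2>&1; then
    echo "dorado found at: $(command -v dorado)"
else
    echo "dorado not found; check that $DORADO_BIN is the extracted bin directory"
fi
```

To make the change persistent, the export line would typically go in a shell startup file such as ~/.bashrc or ~/.zshrc.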

Please visit the Dorado documentation for more information on getting started.

See DEV.md for details about building Dorado for development.

Platforms

Dorado is heavily optimised for Nvidia A100 and H100 GPUs and will deliver maximal performance on systems with these GPUs.

Dorado has been tested extensively and is supported on the following systems:

| Platform | GPU/CPU | Minimum Software Requirements |
| --- | --- | --- |
| Linux x86_64 | (G)V100, A100, H100 | CUDA Driver ≥525.105 |
| Linux arm64 | Jetson Orin, Jetson Thor, DGX Spark* | Linux for Tegra ≥36.4.3 (JetPack ≥6.2) |
| Windows x86_64 | (G)V100, A100, H100 | CUDA Driver ≥529.19 |
| Apple | Apple Silicon (M series) | macOS ≥14 |

*DGX Spark supports all Dorado commands except Dorado correct. Support for Dorado correct will be added in a future release.

Linux x64 or Windows systems not listed above but which have Nvidia GPUs with ≥8 GB VRAM and architecture from Pascal onwards (except P100/GP100) have not been widely tested but are expected to work. When basecalling with Apple devices, we recommend systems with ≥16 GB of unified memory.

If you encounter problems with running on your system, please report an issue.

AWS benchmarks on Nvidia GPUs are available for Dorado 0.3.0. Please note that Dorado's basecalling speed is continuously improving, so these benchmarks may not reflect performance with the latest release.

Performance tips

Dorado will automatically detect your GPU's free memory and select an appropriate batch size.
Dorado will automatically run in multi-GPU cuda:all mode. If you have a heterogeneous collection of GPUs, select the faster GPUs using the --device flag (e.g., --device cuda:0,2). Not doing this will have a detrimental impact on performance.
On Windows systems with Nvidia GPUs, open Nvidia Control Panel, navigate into “Manage 3D settings” and then set “CUDA - Sysmem Fallback Policy” to “Prefer No Sysmem Fallback”. This will provide a significant performance improvement.
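The device-selection tip might look like this in a script. This is a sketch that assumes GPUs 0 and 2 are the faster cards on the host (the indices are taken from the example in the tip, not detected); DEVICE is an illustrative variable name.

```shell
# Assumption: devices 0 and 2 are the faster GPUs on this machine.
DEVICE="cuda:0,2"

if command -v dorado >/dev/null 2>&1 && [ -d pod5s ]; then
    # Restrict basecalling to the selected GPUs instead of the default cuda:all.
    dorado basecaller hac pod5s/ --device "$DEVICE" > calls.bam
else
    # Dry run: show the command when dorado or the input directory is missing.
    echo "would run: dorado basecaller hac pod5s/ --device $DEVICE > calls.bam"
fi
```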

Running

The following are helpful commands for getting started with Dorado. To see all options and their defaults, run dorado -h and dorado <subcommand> -h.

Simplex basecalling

To run Dorado basecalling using the automatically downloaded hac model on a directory of POD5 files or a single POD5 file, run:

dorado basecaller hac pod5s/ > calls.bam

To basecall a single file, simply replace the directory pod5s/ with a path to your data file.

For more details on simplex basecalling, including how to use the --resume-from feature, see the Dorado documentation.
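A sketch of resuming an interrupted run with --resume-from follows. The file names incomplete.bam and calls.bam are placeholders chosen for illustration, and the sketch assumes the partial output is passed to --resume-from while the completed output is written to a different file; check dorado basecaller -h for the exact behaviour in your version.

```shell
# Hypothetical file names: partial output of the interrupted run, and the new output.
INCOMPLETE="incomplete.bam"
OUTPUT="calls.bam"

if [ "$INCOMPLETE" = "$OUTPUT" ]; then
    # Redirecting into the file being resumed from would truncate it first.
    echo "error: output must differ from the --resume-from file" >&2
elif command -v dorado >/dev/null 2>&1 && [ -f "$INCOMPLETE" ]; then
    dorado basecaller hac pod5s/ --resume-from "$INCOMPLETE" > "$OUTPUT"
else
    # Dry run: show the command when dorado or the partial file is missing.
    echo "would run: dorado basecaller hac pod5s/ --resume-from $INCOMPLETE > $OUTPUT"
fi
```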

DNA adapter and primer trimming

Dorado can detect and remove any adapter and/or primer sequences from the beginning and end of DNA reads. Note that if you intend to demultiplex the reads at some later time, trimming primers will likely result in some portions of the flanking regions of the barcodes being removed, which could prevent demultiplexing from working properly. For details see the dorado documentation on read trimming.

Modified basecalling

Beyond the traditional A, T, C, and G basecalling, Dorado can also detect modified bases such as 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and N<sup>6</sup>-methyladenosine (6mA). These modified bases play crucial roles in epigenetic regulation.

For full details please read the documentation on modified basecalling.

To call modifications, extend the models argument with a comma-separated list of modifications:

dorado basecaller hac,5mCG_5hmCG,6mA pod5s/ > calls.bam

In the example above, basecalling is performed with the detection of both 5mC/5hmC in CG contexts and 6mA in all contexts. See the Dorado documentation for details on modified basecalling context.
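The models argument is the model name followed by the chosen modifications, joined with commas. A small sketch of assembling it in a script, where MODEL, MODS and MODEL_ARG are illustrative variable names and the modification names are the ones from the example above:

```shell
MODEL="hac"
# Space-separated modifications; they must be compatible with the chosen model.
MODS="5mCG_5hmCG 6mA"

# Append each modification to the model name, separated by commas.
MODEL_ARG="$MODEL"
for mod in $MODS; do
    MODEL_ARG="$MODEL_ARG,$mod"
done

# Prints: dorado basecaller hac,5mCG_5hmCG,6mA pod5s/ > calls.bam
echo "dorado basecaller $MODEL_ARG pod5s/ > calls.bam"
```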

Refer to the models list table's Compatible Modifications column to see available modifications.

Modified basecalling is also supported with Duplex basecalling, where it produces hemi-methylation calls.

Duplex

To run Duplex basecalling, run the command:

dorado duplex sup pod5s/ > duplex.bam

For more details please head to the Dorado duplex basecalling documentation.

Alignment

Dorado supports aligning existing basecalls or producing aligned output directly, internally using minimap2.

To align existing basecalls, run:

dorado aligner <index> <reads> > aligned.bam

where <index> is the reference to align to, in FASTQ, FASTA or .mmi format, and <reads> is a folder or file in any HTS format.

To produce aligned output directly while basecalling (simplex or duplex), add the --reference option:

dorado basecaller <model> <reads> --reference <index> > calls.bam

For more details please check out the Dorado aligner documentation.
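The two routes can be put side by side as follows. This is a sketch only: reference.fasta and calls.bam are placeholder file names, and choosing a route by whether a basecall file already exists is an assumption made for illustration.

```shell
REF="reference.fasta"   # hypothetical reference file
READS="calls.bam"       # hypothetical existing, unaligned basecalls

if command -v dorado >/dev/null 2>&1 && [ -f "$REF" ]; then
    if [ -f "$READS" ]; then
        # Route 1: align basecalls that already exist.
        dorado aligner "$REF" "$READS" > aligned.bam
    else
        # Route 2: basecall and align in a single step.
        dorado basecaller hac pod5s/ --reference "$REF" > aligned.bam
    fi
else
    echo "dorado or $REF not available; nothing was run"
fi
```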

Sequencing Summary

The Dorado summary command outputs a tab-separated file with read level sequencing information from the BAM file generated during basecalling. To create a summary, run:

dorado summary <bam> > summary.tsv
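Because the summary is plain tab-separated text, standard tools can process it. The sketch below counts reads; it assumes the first line is a header row and each following line describes one read, which this README does not state explicitly. count_reads is a helper name introduced here.

```shell
# Count data rows in a summary file, skipping the first line (assumed header).
count_reads() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Only report when a summary file is actually present.
if [ -f summary.tsv ]; then
    echo "reads in summary: $(count_reads summary.tsv)"
fi
```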

Barcode Classification

Dorado supports barcode classification for existing basecalls as well as producing classified basecalls directly. Further details can be found at the Dorado barcoding documentation.

Poly(A) tail estimation

Dorado has initial support for estimating poly(A) tail lengths for cDNA (PCS and PCB kits) and RNA, and can be configured for use with custom primer sequences, interrupted tails, and plasmids. Note that Oxford Nanopore cDNA reads are sequenced in two different orientations and Dorado poly(A) tail length estimation handles both (A and T homopolymers). This feature can be enabled by passing --estimate-poly-a to the basecaller command. For more details check out the dorado poly(A) estimation documentation.

Read Error Correction

Dorado supports single-read error correction with the integration of the HERRO algorithm. HERRO uses all-vs-all alignment followed by haplotype-aware correction using a deep learning model to achieve higher single-read accuracies. The corrected reads are primarily useful for generating de novo assemblies of diploid organi
