Unicycler

hybrid assembly pipeline for bacterial genomes

<p align="center"><img src="misc/logo.png" alt="Unicycler" width="600"></p>

Unicycler is an assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets, where it functions as a SPAdes-optimiser. It can also assemble long-read-only sets (PacBio or Nanopore), where it runs a miniasm+Racon pipeline. For the best possible assemblies, give it both Illumina reads and long reads, and it will conduct a short-read-first hybrid assembly.

Read more about Unicycler here:

Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017.

And read about how we use it to complete bacterial genomes here:

Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom 2017.

2022 update

Unicycler was initially made in 2016, back when long reads could be sparse and very noisy. For example, our early Oxford Nanopore sequencing runs might generate only 15× read depth for a single bacterial isolate, and most of the reads had a lot of errors. So Unicycler was designed to use low-depth and low-accuracy long reads to scaffold a short-read assembly graph to completion, an approach I call short-read-first hybrid assembly. Assuming the short-read assembly graph is in good shape, Unicycler does this quite well!

However, things have changed in the last six years. Nanopore sequencing yield is now much higher, making >100× depth easy to obtain, even on multiplexed runs. Read accuracy has also improved and continues to get better each year. High-depth and high-accuracy long reads make long-read-first hybrid assembly (long-read assembly followed by short-read polishing) a viable approach that's often preferable to Unicycler. I have developed Trycycler and Polypolish in the pursuit of ideal long-read-first assemblies.

Unicycler is not completely out-of-date, as it is still (in my opinion) the best tool for short-read-first hybrid assembly of bacterial genomes. But I think it should only be used for hybrid assembly when long-read-first is not an option – i.e. when long-read depth is low. I also think that Unicycler is good for short-read-only bacterial genomes, as it produces cleaner assembly graphs than SPAdes alone. So while Unicycler doesn't get a lot of my time and attention these days, I don't yet consider it to be abandonware.

For some up-to-date bacterial genome assembly tips, check out Trycycler's wiki.

Introduction

As input, Unicycler takes one of the following:

  • Illumina reads from a bacterial isolate (ideally paired-end, but unpaired works too)
  • A set of long reads (either PacBio or Nanopore) from a bacterial isolate
  • Illumina reads and long reads from the same isolate (best case)

Reasons to use Unicycler:

  • It circularises replicons without the need for a separate tool like Circlator.
  • It handles plasmid-rich genomes.
  • It can use long reads of any depth and quality in hybrid assembly. 20× or more may be required to complete a genome, but Unicycler can make nearly-complete genomes with far fewer long reads.
  • It produces an assembly graph in addition to a contigs FASTA file, viewable in Bandage.
  • It filters out low-depth contigs, giving clean assemblies even when the read set has low-level contamination.
  • It has low misassembly rates.
  • It can cope with highly repetitive genomes, such as Shigella.
  • It's easy to use: runs with just one command and usually doesn't require tinkering with parameters.

Reasons to not use Unicycler:

  • You're assembling a eukaryotic genome or a metagenome (Unicycler is designed exclusively for bacterial isolates).
  • Your Illumina reads and long reads are from different isolates (Unicycler struggles with sample heterogeneity).
  • You're impatient (Unicycler is thorough but not especially fast).
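The depth figures quoted in this section (such as 20× for long reads) are total sequenced bases divided by genome size. As a rough illustration, the sketch below estimates depth from a FASTQ file. It is not part of Unicycler: the function names are hypothetical, it assumes standard four-line FASTQ records, and the genome size is a value you would need to supply for your own isolate.

```python
import gzip


def total_bases(fastq_path):
    """Sum sequence lengths in a FASTQ file (plain text or gzipped).

    Assumes four-line records (header, sequence, '+', qualities).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def estimated_depth(fastq_path, genome_size):
    """Approximate read depth: total read bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases(fastq_path) / genome_size
```

For example, `estimated_depth("long_reads.fastq.gz", 5000000)` would give the approximate long-read depth for an assumed 5 Mbp genome.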

Requirements

  • Linux or macOS
  • Python 3.4 or later
  • C++ compiler with C++14 support:
    • GCC 4.9.1 or later
    • Clang 3.5 or later
    • ICC also works (though I don't know the minimum required version number)
  • setuptools (only required for installation of Unicycler)
  • For short-read or hybrid assembly:
    • SPAdes v3.14.0 or later (spades.py)
  • For long-read or hybrid assembly:
  • For rotating circular contigs:
    • BLAST+ (makeblastdb and tblastn)

Unicycler expects external tools to be available in $PATH. If they aren't, you can specify their location using Unicycler options (e.g. --spades_path).
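Before a run, it can help to confirm that the external tools are actually discoverable. The sketch below is one way to do that with Python's standard library; it is not part of Unicycler. The tool list only includes the executables named above, and the helper names are hypothetical.

```python
import shutil
import sys

# Executables named in the requirements list above.
REQUIRED_TOOLS = ["spades.py", "makeblastdb", "tblastn"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found in $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_new_enough(minimum=(3, 4)):
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found in $PATH: " + ", ".join(missing))
    if not python_is_new_enough():
        print("Python 3.4 or later is required")
```

Any tool reported as missing can either be added to $PATH or passed to Unicycler with the matching path option (e.g. --spades_path).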

Bandage isn't required to run Unicycler, but it is very helpful for manually investigating assemblies (the graph images in this README were made with Bandage).

Installation

Install from source

These instructions install the most up-to-date version of Unicycler:

git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
python3 setup.py install

Notes:

  • If the last command complains about permissions, you may need to run it with sudo.
  • If you want a particular version of Unicycler, download the source from the releases page instead of cloning from GitHub.
  • Install just for your user: python3 setup.py install --user
    • If you get a strange 'can't combine user with prefix' error, read this.
  • Install to a specific location: python3 setup.py install --prefix=$HOME/.local
  • Install with pip (local copy): pip3 install path/to/Unicycler
  • Install with pip (from GitHub): pip3 install git+https://github.com/rrwick/Unicycler.git
  • Install with specific Makefile options: python3 setup.py install --makeargs "CXX=icpc"

Build and run without installation

This approach compiles Unicycler code, but doesn't copy executables anywhere:

git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
make

Now, instead of running unicycler, you run path/to/unicycler-runner.py.

Quick usage

Illumina-only assembly:

unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -o output_dir

Long-read-only assembly:

unicycler -l long_reads.fastq.gz -o output_dir

Hybrid assembly:

unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads.fastq.gz -o output_dir

If you don't have any reads of your own, take a look in the sample_data directory for links to some small read sets.
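If you want to launch these runs from a script, note that the three commands above differ only in which read files are passed. Here is a minimal sketch that builds the argument list from the -1, -2, -l and -o options shown above. The function names and the `executable` parameter are hypothetical; `executable` could point at path/to/unicycler-runner.py when running without installation.

```python
import subprocess


def build_unicycler_command(out_dir, short_1=None, short_2=None,
                            long_reads=None, executable="unicycler"):
    """Return the argument list for an Illumina-only, long-read-only or hybrid run."""
    if (short_1 is None) != (short_2 is None):
        raise ValueError("paired short reads need both short_1 and short_2")
    if short_1 is None and long_reads is None:
        raise ValueError("provide short reads, long reads or both")
    command = [executable]
    if short_1 is not None:
        command += ["-1", short_1, "-2", short_2]
    if long_reads is not None:
        command += ["-l", long_reads]
    command += ["-o", out_dir]
    return command


def run_unicycler(**kwargs):
    """Run Unicycler and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_unicycler_command(**kwargs), check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.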

Background

Assembly graphs

To understand what Unicycler is doing, you need to know about assembly graphs. For a thorough introduction, I'd suggest this tutorial or the [Velvet paper](http://genome.cshlp.org/content/genome/
