Unicycler
hybrid assembly pipeline for bacterial genomes
Unicycler is an assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets, where it functions as a SPAdes-optimiser. It can also assemble long-read-only sets (PacBio or Nanopore), where it runs a miniasm+Racon pipeline. For the best possible assemblies, give it both Illumina reads and long reads, and it will conduct a short-read-first hybrid assembly.
Read more about Unicycler here:
And read about how we use it to complete bacterial genomes here:
Table of contents
- 2022 update
- Introduction
- Requirements
- Installation
- Quick usage
- Background
- Method: Illumina-only assembly
- Method: long-read-only assembly
- Method: hybrid assembly
- Conservative, normal and bold
- Options and usage
- Output files
- Tips
- Acknowledgements
- License
2022 update
Unicycler was initially made in 2016, back when long reads could be sparse and very noisy. For example, our early Oxford Nanopore sequencing runs might generate only 15× read depth for a single bacterial isolate, and most of the reads had a lot of errors. So Unicycler was designed to use low-depth and low-accuracy long reads to scaffold a short-read assembly graph to completion, an approach I call short-read-first hybrid assembly. Assuming the short-read assembly graph is in good shape, Unicycler does this quite well!
However, things have changed in the last six years. Nanopore sequencing yield is now much higher, making >100× depth easy to obtain, even on multiplexed runs. Read accuracy has also improved and continues to get better each year. High-depth and high-accuracy long reads make long-read-first hybrid assembly (long-read assembly followed by short-read polishing) a viable approach that's often preferable to Unicycler. I have developed Trycycler and Polypolish in the pursuit of ideal long-read-first assemblies.
Unicycler is not completely out-of-date, as it is still (in my opinion) the best tool for short-read-first hybrid assembly of bacterial genomes. But I think it should only be used for hybrid assembly when long-read-first is not an option, i.e. when long-read depth is low. I also think that Unicycler is good for short-read-only assembly of bacterial genomes, as it produces cleaner assembly graphs than SPAdes alone. So while Unicycler doesn't get a lot of my time and attention these days, I don't yet consider it to be abandonware.
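Whether your long-read depth counts as "low" is easy to check before assembling: depth is total read bases divided by genome size. Below is a minimal shell sketch of that arithmetic. `read_depth` is a hypothetical helper (not part of Unicycler or Trycycler), and it assumes standard four-line FASTQ records (no line-wrapped sequences) and a genome-size estimate that you supply yourself.

```shell
# Hypothetical helper: estimate read depth as total read bases / genome size.
# Assumes four-line FASTQ records (the sequence is on line 2 of every record).
read_depth() {
    fastq=$1          # FASTQ file, plain or gzipped (.gz)
    genome_size=$2    # estimated genome size in bases
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # $reader is intentionally unquoted so "gzip -dc" splits into command + flag
    $reader "$fastq" | awk -v gs="$genome_size" '
        NR % 4 == 2 { bases += length($0) }
        END { printf "%.1f\n", bases / gs }'
}
```

For example, `read_depth long_reads.fastq.gz 5000000` estimates long-read depth against a 5 Mbp genome. A result around 15 is the sparse scenario Unicycler was built for, while 100 or more is the kind of depth that makes long-read-first assembly practical.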
For some up-to-date bacterial genome assembly tips, check out these parts of Trycycler's wiki:
- Should I use Unicycler or Trycycler to assemble my bacterial genome?
- Guide to bacterial genome assembly
Introduction
As input, Unicycler takes one of the following:
- Illumina reads from a bacterial isolate (ideally paired-end, but unpaired works too)
- A set of long reads (either PacBio or Nanopore) from a bacterial isolate
- Illumina reads and long reads from the same isolate (best case)
Reasons to use Unicycler:
- It circularises replicons without the need for a separate tool like Circlator.
- It handles plasmid-rich genomes.
- It can use long reads of any depth and quality in hybrid assembly. 20× or more may be required to complete a genome, but Unicycler can make nearly complete genomes with far fewer long reads.
- It produces an assembly graph in addition to a contigs FASTA file, viewable in Bandage.
- It filters out low-depth contigs, giving clean assemblies even when the read set has low-level contamination.
- It has low misassembly rates.
- It can cope with highly repetitive genomes, such as Shigella.
- It's easy to use: runs with just one command and usually doesn't require tinkering with parameters.
Reasons to not use Unicycler:
- You're assembling a eukaryotic genome or a metagenome (Unicycler is designed exclusively for bacterial isolates).
- Your Illumina reads and long reads are from different isolates (Unicycler struggles with sample heterogeneity).
- You're impatient (Unicycler is thorough but not especially fast).
Requirements
- Linux or macOS
- Python 3.4 or later
- C++ compiler with C++14 support
- setuptools (only required for installation of Unicycler)
- For short-read or hybrid assembly:
  - SPAdes v3.14.0 or later (spades.py)
- For long-read or hybrid assembly:
  - Racon (racon)
- For rotating circular contigs:
  - BLAST+ (makeblastdb and tblastn)
Unicycler expects external tools to be available in $PATH. If they aren't, you can specify their location using Unicycler options (e.g. --spades_path).
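To see which of these executables your shell will find, you can query $PATH with the POSIX `command -v` builtin. The function below is a hypothetical convenience check, not something Unicycler provides; it simply looks up the executable names listed above. Any tool it reports as missing needs either a $PATH entry or its location passed via the corresponding Unicycler option, such as --spades_path.

```shell
# Hypothetical helper (not part of Unicycler): report which external tools
# are visible on $PATH. Returns the number of missing tools (0 = all found).
check_unicycler_tools() {
    missing=0
    for tool in spades.py racon makeblastdb tblastn; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            echo "found:   $tool ($tool_path)"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

Not every tool is needed for every mode (see the list above), so a missing racon is fine for Illumina-only assembly, and a missing spades.py is fine for long-read-only assembly.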
Bandage isn't required to run Unicycler, but it is very helpful for manually investigating assemblies (the graph images in this README were made with Bandage).
Installation
Install from source
These instructions install the most up-to-date version of Unicycler:
git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
python3 setup.py install
Notes:
- If the last command complains about permissions, you may need to run it with sudo.
- If you want a particular version of Unicycler, download the source from the releases page instead of cloning from GitHub.
- Install just for your user:
  python3 setup.py install --user
  - If you get a strange 'can't combine user with prefix' error, read this.
- Install to a specific location:
  python3 setup.py install --prefix=$HOME/.local
- Install with pip (local copy):
  pip3 install path/to/Unicycler
- Install with pip (from GitHub):
  pip3 install git+https://github.com/rrwick/Unicycler.git
- Install with specific Makefile options:
  python3 setup.py install --makeargs "CXX=icpc"
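After a --user install, the unicycler executable lands in the bin directory under Python's per-user base, which is not always on $PATH. A minimal check, assuming a standard CPython where `site.getuserbase()` returns that per-user base; `dir_on_path` is a hypothetical helper name, not part of Unicycler.

```shell
# Return success if directory $1 is exactly one of the entries in $PATH.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# A --user install puts executables in <user base>/bin (often ~/.local/bin on Linux).
if user_base=$(python3 -c 'import site; print(site.getuserbase())' 2>/dev/null); then
    user_bin="$user_base/bin"
    if dir_on_path "$user_bin"; then
        echo "$user_bin is already on \$PATH"
    else
        echo "Add this line to your shell profile:"
        echo "  export PATH=\"$user_bin:\$PATH\""
    fi
else
    echo "could not run python3 to find the user base directory" >&2
fi
```

Wrapping both sides in colons makes the match exact, so /opt is not mistaken for /opt/tools.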
Build and run without installation
This approach compiles Unicycler code, but doesn't copy executables anywhere:
git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
make
Now, instead of running unicycler, use path/to/unicycler-runner.py.
Quick usage
Illumina-only assembly:
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -o output_dir
Long-read-only assembly:
unicycler -l long_reads.fastq.gz -o output_dir
Hybrid assembly:
unicycler -1 short_reads_1.fastq.gz -2 short_reads_2.fastq.gz -l long_reads.fastq.gz -o output_dir
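The three commands above differ only in which read options are supplied; Unicycler chooses Illumina-only, long-read-only or hybrid assembly from the inputs it's given. If you script assemblies over many isolates, a small wrapper can build the argument list from whatever files you have. This is a hypothetical sketch, not part of Unicycler: `run_unicycler` and `DRY_RUN` are made-up names, and it only handles the paired-end -1/-2 short-read options shown above (not unpaired reads).

```shell
# Hypothetical wrapper (not part of Unicycler): run unicycler with whichever
# read sets you have. Pass "" for a missing input; set DRY_RUN=1 to print the
# command instead of running it. Only paired-end short reads are handled.
run_unicycler() {
    short1=$1; short2=$2; long=$3; out=$4
    set --  # clear the positional parameters and reuse them as the argument list
    if [ -n "$short1" ] || [ -n "$short2" ]; then
        if [ -z "$short1" ] || [ -z "$short2" ]; then
            echo "error: paired-end reads need both -1 and -2 files" >&2
            return 1
        fi
        set -- "$@" -1 "$short1" -2 "$short2"
    fi
    if [ -n "$long" ]; then
        set -- "$@" -l "$long"
    fi
    if [ "$#" -eq 0 ]; then
        echo "error: supply short reads, long reads, or both" >&2
        return 1
    fi
    set -- "$@" -o "$out"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "unicycler $*"
    else
        unicycler "$@"
    fi
}
```

For example, `DRY_RUN=1 run_unicycler "" "" long_reads.fastq.gz output_dir` prints the long-read-only command shown above. Building the arguments with `set --` rather than a string keeps file paths containing spaces intact when the command actually runs.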
If you don't have any reads of your own, take a look in the sample_data directory for links to some small read sets.
Background
Assembly graphs
To understand what Unicycler is doing, you need to know about assembly graphs. For a thorough introduction, I'd suggest this tutorial or the [Velvet paper](http://genome.cshlp.org/content/genome/
