MpGAP
Multi-platform genome assembly pipeline for Illumina, Nanopore and PacBio reads
About
MpGAP is built with Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. It is an easy-to-use pipeline that adopts well-known software for de novo genome assembly of Illumina, PacBio and Oxford Nanopore sequencing data through Illumina-only, long-reads-only or hybrid modes.
This pipeline wraps up the following software:
| | Source |
| :- | :- |
| Assemblers | Hifiasm, Canu, Flye, Raven, Shasta, wtdbg2, Haslr, Unicycler, Spades, Shovill, Megahit |
| Polishers | Nanopolish, Medaka, gcpp, Polypolish and Pilon |
| Quality check | Quast, BUSCO and MultiQC |
Release notes
Are you curious about changes between releases? See the changelog.
- I strongly, vividly, mightily recommend using the latest version hosted in the master branch, which is nextflow's default.
- The latest version will always have support and bug fixes, and generally maintains the same processes that were in previous versions (I mainly add things instead of removing them).
- But, if you really want to execute an earlier release, please see the instructions for that.
- Versions below 3.0 are no longer supported.
Further reading
This pipeline has two complementary pipelines (also written in nextflow) for NGS preprocessing and prokaryotic genome annotation that can give the user a complete workflow for bacterial genomics analyses.
Feedback
In this pipeline we always try to create a workflow and execution dynamics that are as generic as possible and suited to as many use cases as possible.
Therefore, feedback is very welcome. If you believe that your use case is not covered by the pipeline, have ideas for enhancements or have found a bug, please do not hesitate to open an issue to discuss it.
Installation
- Install Nextflow:

      curl -s https://get.nextflow.io | bash

- Give it a try:

      nextflow run fmalmeida/mpgap --help

- Download required tools

  - for docker

          # for docker
          docker pull fmalmeida/mpgap:v3.2

          # run
          nextflow run fmalmeida/mpgap -profile docker [options]

  - for singularity

          # for singularity --> prepare env variables
          # remember to properly set NXF_SINGULARITY_LIBRARYDIR
          # read more at https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub
          export NXF_SINGULARITY_LIBRARYDIR=<path in your machine>    # Set a path to your singularity storage dir
          export NXF_SINGULARITY_CACHEDIR=<path in your machine>      # Set a path to your singularity cache dir
          export SINGULARITY_CACHEDIR=<path in your machine>          # Set a path to your singularity cache dir
          # TODO: ADD Information about TMPDIR

          # run
          nextflow run fmalmeida/mpgap -profile singularity [options]

  - for conda

          # for conda
          # it is better to create envs with mamba for faster solving
          wget https://github.com/fmalmeida/mpgap/raw/master/environment.yml
          conda env create -f environment.yml   # advice: use mamba

          # must be executed from the base environment
          # This tells nextflow to load the available mpgap environment when required
          nextflow run fmalmeida/mpgap -profile conda [options]

    :dart: Please make sure to also download its busco databases. See the explanation

- Start running your analysis

      nextflow run fmalmeida/mpgap -profile <docker/singularity/conda>
:fire: Please read the documentation below on selecting between conda, docker or singularity profiles, since the tools will be made available differently depending on the profile desired.
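The Singularity variables named in the installation steps can be set in one go. The following is only a minimal sketch, assuming that one base directory holds both the storage and the cache directories; the `setup_singularity_dirs` helper and the `library`/`cache` subdirectory names are hypothetical choices, not something the pipeline requires.

```shell
# Create the Singularity directories under a base path and export the
# variables listed in the installation steps.
# usage: setup_singularity_dirs <base-dir>   (hypothetical helper)
setup_singularity_dirs() {
    [ -n "${1:-}" ] || { echo "usage: setup_singularity_dirs <base-dir>" >&2; return 1; }
    mkdir -p "$1/library" "$1/cache" || return 1
    export NXF_SINGULARITY_LIBRARYDIR="$1/library"   # singularity storage dir
    export NXF_SINGULARITY_CACHEDIR="$1/cache"       # singularity cache dir
    export SINGULARITY_CACHEDIR="$1/cache"           # singularity cache dir
}
```

After sourcing the snippet, something like `setup_singularity_dirs "$HOME/singularity"` followed by `nextflow run fmalmeida/mpgap -profile singularity [options]` in the same shell should pick up the exported paths.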
Quickstart
A few test datasets have been made available so that users can quickly try out the features available in the pipeline:
# short-reads
nextflow run fmalmeida/mpgap -profile test,sreads,<docker/singularity>
# long-reads
nextflow run fmalmeida/mpgap -profile test,lreads,<ont/pacbio>,<docker/singularity>
# hybrid
nextflow run fmalmeida/mpgap -profile test,hybrid,<ont/pacbio>,<docker/singularity>
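The three quickstart commands differ only in the comma-separated value given to `-profile`. As an illustration, the sketch below composes that value and rejects combinations the quickstart does not list. The `mpgap_test_profile` function is a hypothetical helper, not part of the pipeline, and it prints the command instead of running it.

```shell
# Compose the -profile value for the bundled test datasets.
# usage: mpgap_test_profile <sreads|lreads|hybrid> <docker|singularity> [ont|pacbio]
mpgap_test_profile() {
    mode="${1:-}"; engine="${2:-}"; tech="${3:-}"

    # Only the engines shown in the quickstart are accepted.
    case "$engine" in
        docker|singularity) ;;
        *) echo "error: engine must be docker or singularity" >&2; return 1 ;;
    esac

    case "$mode" in
        sreads)
            echo "test,sreads,$engine" ;;
        lreads|hybrid)
            # Long-read and hybrid tests also need the sequencing technology.
            case "$tech" in
                ont|pacbio) echo "test,$mode,$tech,$engine" ;;
                *) echo "error: $mode needs ont or pacbio" >&2; return 1 ;;
            esac ;;
        *)
            echo "error: unknown mode '$mode'" >&2; return 1 ;;
    esac
}

# Print (rather than run) the command for a hybrid ONT test with docker.
echo "nextflow run fmalmeida/mpgap -profile $(mpgap_test_profile hybrid docker ont)"
```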
Documentation
<a href="https://mpgap.readthedocs.io/en/latest/index.html"><strong>Complete online documentation. »</strong></a>
Selecting between profiles
Nextflow profiles are a set of "sensible defaults" for the resource requirements of each of the steps in the workflow, that can be enabled with the command line flag -profile. You can learn more about nextflow profiles at:
- https://nf-co.re/usage/configuration#basic-configuration-profiles
- https://www.nextflow.io/docs/latest/config.html#config-profiles
The pipeline has "standard profiles" set to run the workflows with either conda, docker or singularity using the local executor, which is nextflow's default and runs the pipeline processes on the computer where Nextflow is launched. If you need to run the pipeline with another executor such as sge, lsf or slurm, see nextflow's manual page to properly configure one in a new custom profile set in your personal copy of the MpGAP config file. Nextflow allows multiple profiles to be used at once, e.g. -profile conda,sge.
By default, if no profile is chosen, the pipeline will try to load tools from the local machine $PATH. The available pre-set profiles for this pipeline are docker, conda and singularity. You can choose between them as follows:
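A custom executor profile of the kind described above could look like the sketch below. It is written under a few assumptions. The profile is placed in a separate file passed with Nextflow's `-c` option instead of a personal copy of the MpGAP config file. The `write_sge_profile` helper and the file name are hypothetical. The queue name `all.q` is a placeholder for whatever your cluster uses.

```shell
# Write a small extra config file that defines an "sge" profile.
# usage: write_sge_profile <config-file>   (hypothetical helper)
write_sge_profile() {
    cat > "$1" <<'EOF'
profiles {
    sge {
        process.executor = 'sge'
        process.queue    = 'all.q'   // placeholder queue name
    }
}
EOF
}

# Print (rather than run) a command that combines this profile with conda.
echo "nextflow run fmalmeida/mpgap -c custom_executor.config -profile conda,sge [options]"
```

Calling `write_sge_profile custom_executor.config` creates the file that the printed command refers to. Executor-specific settings beyond the queue are best taken from nextflow's manual page.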
- conda

      # must be executed from the base environment
      # This tells nextflow to load the available mpgap environment when required
      nextflow run fmalmeida/mpgap -profile conda [options]

- docker

      nextflow run fmalmeida/mpgap -profile docker [options]

- singularity

      nextflow run fmalmeida/mpgap -profile singularity [options]
Note on conda
:book: Please use conda as a last resort.
Instructions to create the required conda environment are found in the installation section.
The conda profile only works on linux-64 machines. Some of the tools only have binaries available for that platform, and others, which had to be placed inside the "bin" dir to avoid version compatibility issues, were also compiled for linux-64. A few examples are: wtdbg2, ALE (used as an auxiliary tool in the pilon polishing step) and spades v3.13 for unicycler.
Therefore, be aware that -profile conda will only work on linux-64 machines. Users on other systems must use docker or singularity instead.
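Whether a machine qualifies can be checked before choosing the profile. The sketch below assumes that conda's linux-64 platform corresponds to `uname` reporting `Linux` and `x86_64`; the `conda_supported` function is a hypothetical helper and its optional arguments exist only so the check can be exercised without changing machines.

```shell
# Succeed only when the host matches conda's linux-64 platform.
# usage: conda_supported [os] [arch]   (defaults come from uname)
conda_supported() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

# Report which profiles make sense on the current machine.
if conda_supported; then
    echo "linux-64 detected: -profile conda can work here (docker or singularity still preferred)"
else
    echo "not linux-64: use -profile docker or -profile singularity"
fi
```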
Finally, the main conda packages in the environment.yml file have been
