MUFFIN
hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis
<img src=".figure/Logo_MUFFIN_cropped.png" width="240" height="160" />
If you use MUFFIN in your research, please cite our paper.
The documentation is available at https://rvandamme.github.io/MUFFIN_Documentation/#introduction
INDEX
- Introduction
- Figure
- Installation
- Test the pipeline
- Usage
- Troubleshooting
- Options
- Complete help and options
- Bibliography
- License
Introduction
MUFFIN aims to be a reproducible pipeline for hybrid metagenome assembly of combined Illumina and Nanopore reads.
MUFFIN uses the following software:
| Task | Software | Version | Docker | Image version |
| --- | --- | --- | --- | --- |
| QC illumina | fastp | 0.20.0 | LINK | 0.20.0--78a7c63 |
| QC ont | automated way to discard shortest reads | | | |
| | filtlong | 0.2.0 | LINK | v0.2.0--afa175e |
| metagenomic composition of ont | sourmash | 2.0.1 | LINK | 2.0.1--6970ddc |
| Hybrid assembly | Meta-spades | 3.13.1 | LINK | 3.13.1--2c2a4c0 |
| | unicycler | 0.4.7 | LINK | 0.4.7-0--c0404e6 |
| Long read assembly | MetaFlye | 2.7 | LINK | 2.7--957a1a1 |
| polishing | racon | 1.4.13 | LINK | 1.4.13--bb8a908 |
| | medaka | 1.0.3 | LINK | 1.0.3--7c62d67 |
| | pilon | 1.23 | LINK | 1.23--b21026d |
| mapping | minimap2 | 2.17 | LINK | 2.17--caba7af |
| | bwa | 0.7.17 | LINK | 1.23--b21026d |
| | samtools | 1.9 | LINK | 2.17--caba7af |
| retrieve reads mapped to contig | seqtk | 1.3 | LINK | 1.3--dc0d16b |
| Binning | Metabat2 | 2.13 | LINK | 2.13--0e2577e |
| | maxbin2 | 2.2.7 | LINK | 2.2.7--b643a6b |
| | concoct | 1.1.0 | LINK | 1.1.0--03a3888 |
| | metawrap | 1.2.2 | LINK | 1.2.2--de94241 |
| qc binning | checkm | 1.0.13 | LINK | 1.0.13--248242f |
| Taxonomic Classification | sourmash using the gt-DataBase | 2.0.1 | LINK | 2.0.1--6970ddc |
| | GTDB | version r89 | | |
| Annotations (bin and RNA) | eggNOG | 2.0.1 | LINK | 2.0.1--d5e0c8c |
| | eggNOG DB | v5.0 | | |
| De novo transcript and quantification | Trinity | 2.9.1 | LINK | 2.9.1--82fe26c |
| | Salmon | 0.15.0 | LINK | 2.9.1--82fe26c |
Figure
The Workflow

The parser output

Installation
Base installation
You need to install Nextflow version 20.07+ (https://www.nextflow.io/).
```sh
# verify the Java version (at least version 8+)
java -version
# set up Nextflow (this creates a nextflow executable in the current directory)
curl -s https://get.nextflow.io | bash
# if you want the pipeline installed locally, use the following
git clone https://github.com/RVanDamme/MUFFIN.git
# if you prefer not to install the pipeline, let Nextflow fetch it when running
nextflow run RVanDamme/MUFFIN --parameters.....
```
For conda usage
If you use conda, you don't need any extra installation. An error might occur during the installation of metawrap; if so, please consult Troubleshooting.
For gcloud usage
If you use the Google Life Sciences resources, you first need to set up a few parameters.
In nextflow.config you need to change the gcloud parameters to match your project (lines 67 to 78).
```
gcloud {
    //workDir = "/tmp/nextflow-docker_pipelines-$USER"
    process.executor = 'google-lifesciences'
    process.memory = params.memory
    bucketDir = 'gs://bucket/work-dir' // change this to the bucket where you want the work files stored
    google { project = 'project-name-111111'; zone = 'europe-north1-a' } // insert your project ID as well as the zone(s) you want to use
    // you can also use {region = 'europe-north1'} instead of zone
    google.lifeSciences.copyImage = 'google/cloud-sdk:latest'
    google.lifeSciences.preemptible = true
    google.lifeSciences.bootDiskSize = "10GB"
    google.lifeSciences.debug = true
    //includeConfig 'configs/preemptible.config'
}
```
You will also have to change the bucket where the different databases are stored in: modules/checkmgetdatabases.nf, modules/eggnog_get_databases.nf, and modules/sourmashgetdatabase.nf.
To do so, edit the relevant line to use your bucket. Keep the path structure for clarity (e.g. keep the "/databases-nextflow/sourmash" part).
Example:
```
if (workflow.profile.contains('gcloud')) {publishDir 'gs://gcloud_storage/databases-nextflow/sourmash', mode: 'copy', pattern: "genbank-k31.lca.json.gz" }
```
becomes
```
if (workflow.profile.contains('gcloud')) {publishDir 'gs://MY_STORAGE/databases-nextflow/sourmash', mode: 'copy', pattern: "genbank-k31.lca.json.gz" }
```
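Since the same bucket substitution applies to all three module files, it can be scripted instead of edited by hand. Below is a minimal sketch; `gcloud_storage` is the example bucket from above and `MY_STORAGE` is a placeholder for your own bucket name, shown here on a sample line rather than editing files in place:

```shell
# The same sed expression works for all three module files.
# MY_STORAGE is a placeholder for your actual bucket name.
SED_EXPR="s|gs://gcloud_storage|gs://MY_STORAGE|g"

# demonstrated on a sample publishDir line:
sample="publishDir 'gs://gcloud_storage/databases-nextflow/sourmash', mode: 'copy'"
rewritten=$(printf '%s' "$sample" | sed "$SED_EXPR")
echo "$rewritten"

# to apply in place from the MUFFIN repository root:
#   sed -i "$SED_EXPR" modules/checkmgetdatabases.nf \
#       modules/eggnog_get_databases.nf modules/sourmashgetdatabase.nf
```

This keeps the "/databases-nextflow/..." part of each path intact, as recommended above.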
If you want to run on gcloud without preemptible instances, edit line 74 of nextflow.config to false.
For containers usage
If you use containers, either Docker or Singularity, you don't need any extra installation.
For usage of software installed locally
You just need to have all the software used in the pipeline (see the table above) installed and available in your $PATH.
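A quick way to verify your $PATH before launching is to loop over the expected executables. This is a hedged sketch: `check_tools` is a hypothetical helper, the tool names below are taken from the table above, and the exact binary name for some packages may differ on your system, so adjust the list to the steps you actually run:

```shell
# Report which of the given tools are missing from $PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the tool is found on $PATH
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "missing:$missing"
}

# tool names taken from the software table; adjust for the steps you run
check_tools fastp filtlong sourmash flye racon medaka pilon minimap2 bwa samtools seqtk
```

An empty "missing:" line means every listed tool was found.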
Test the pipeline
To test the pipeline, we have a subset of 5 bins available at https://osf.io/9xmh4/. A detailed explanation of all the parameters is available in Usage; the most important for the test are the profile executor and engine. To run the test, just add "test" to the -profile parameter, e.g.:
```sh
# test locally with conda; you need to specify the cpus and ram available
nextflow run RVanDamme/MUFFIN --output results_dir --cpus 8 --memory 32g -profile local,conda,test
# test locally with docker; you can change the cpus and ram in configs/containers.config
# this test also runs the transcriptomics analysis with --rna
nextflow run RVanDamme/MUFFIN --output results_dir --rna -profile local,docker,test
# test using gcloud with docker; you can change the cpus and ram in configs/containers.config
# this test uses flye instead of spades via --assembler metaflye
nextflow run RVanDamme/MUFFIN --output results_dir --assembler metaflye -profile gcloud,docker,test
```
The subset also contains RNA data; to test the transcriptomics analysis, activate it with "--rna". The results of the different test runs are available at https://osf.io/m5czv/
Usage
Automated usage
To avoid writing all the parameters on the CLI, you can use the additional "-params-file" option and provide a .yml file that contains all the parameters available for MUFFIN, described below. You can find the MUFFIN_params.yml file at the base of the MUFFIN directory.
Example: MUFFIN_params.yml
```yaml
assembler : "metaspades"
output : "path/to/resultdir"
illumina : "fastq_ill/"
ont : "fastq_ont/"
cpus : 16
memory : "64g"
modular : "full"
```
MUFFIN command:
```sh
path/to/nextflow run $MUFFIN_pipeline -params-file MUFFIN_params.yml -profile local,conda,test
```
$MUFFIN_pipeline is either "path/to/MUFFIN/main.nf" or "RVanDamme/MUFFIN".
Basic usage
```sh
path/to/nextflow run $MUFFIN_pipeline --output results_dir --assembler $assembler --illumina fastq_ill/ --ont fastq_ont/ --cpus 16 --memory 6
```
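The placeholders in the basic command can be filled from shell variables. A minimal sketch, assuming a remote run of the pipeline and metaspades as assembler (the values below are illustrative examples, not MUFFIN defaults; the command is echoed rather than executed):

```shell
# Placeholder values for the basic invocation (examples, not defaults).
MUFFIN_pipeline="RVanDamme/MUFFIN"   # or path/to/MUFFIN/main.nf for a local clone
assembler="metaspades"               # or "metaflye" for the long-read assembly path

# print the assembled command instead of running it
echo nextflow run "$MUFFIN_pipeline" --output results_dir --assembler "$assembler" \
  --illumina fastq_ill/ --ont fastq_ont/ --cpus 16 --memory 64g
```

Swapping `$assembler` between "metaspades" and "metaflye" is enough to switch between the hybrid and long-read assembly routes described above.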
