MUFFIN
hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis
<img src=".figure/Logo_MUFFIN_cropped.png" width="240" height="160" />
If you use MUFFIN in your research, please cite our paper.
The documentation is available at https://rvandamme.github.io/MUFFIN_Documentation/#introduction
INDEX
- Introduction
- Figure
- Installation
- Test the pipeline
- Usage
- Troubleshooting
- Options
- Complete help and options
- Bibliography
- License
Introduction
MUFFIN aims to be a reproducible pipeline for hybrid metagenome assembly of combined Illumina and Nanopore reads.
MUFFIN uses the following software:
| Task | Software | Version | Docker | Image version |
| --- | --- | --- | --- | --- |
| QC illumina | fastp | 0.20.0 | LINK | 0.20.0--78a7c63 |
| QC ont | automated way to discard shortest reads | | | |
| | filtlong | 0.2.0 | LINK | v0.2.0--afa175e |
| metagenomic composition of ont | sourmash | 2.0.1 | LINK | 2.0.1--6970ddc |
| Hybrid assembly | Meta-spades | 3.13.1 | LINK | 3.13.1--2c2a4c0 |
| | unicycler | 0.4.7 | LINK | 0.4.7-0--c0404e6 |
| Long read assembly | MetaFlye | 2.7 | LINK | 2.7--957a1a1 |
| polishing | racon | 1.4.13 | LINK | 1.4.13--bb8a908 |
| | medaka | 1.0.3 | LINK | 1.0.3--7c62d67 |
| | pilon | 1.23 | LINK | 1.23--b21026d |
| mapping | minimap2 | 2.17 | LINK | 2.17--caba7af |
| | bwa | 0.7.17 | LINK | 1.23--b21026d |
| | samtools | 1.9 | LINK | 2.17--caba7af |
| retrieve reads mapped to contig | seqtk | 1.3 | LINK | 1.3--dc0d16b |
| Binning | Metabat2 | 2.13 | LINK | 2.13--0e2577e |
| | maxbin2 | 2.2.7 | LINK | 2.2.7--b643a6b |
| | concoct | 1.1.0 | LINK | 1.1.0--03a3888 |
| | metawrap | 1.2.2 | LINK | 1.2.2--de94241 |
| qc binning | checkm | 1.0.13 | LINK | 1.0.13--248242f |
| Taxonomic Classification | sourmash using the gt-DataBase | 2.0.1 | LINK | 2.0.1--6970ddc |
| | GTDB | version r89 | | |
| Annotations (bin and RNA) | eggNOG | 2.0.1 | LINK | 2.0.1--d5e0c8c |
| | eggNOG DB | v5.0 | | |
| De novo transcript and quantification | Trinity | 2.9.1 | LINK | 2.9.1--82fe26c |
| | Salmon | 0.15.0 | LINK | 2.9.1--82fe26c |
Figure
The Workflow

The parser output

Installation
Base installation
You need to install Nextflow version 20.07+ (https://www.nextflow.io/).
```sh
# verify the Java version (at least version 8+)
java -version
# set up Nextflow (this creates a nextflow executable in the current directory)
curl -s https://get.nextflow.io | bash
# if you want the pipeline installed locally, use the following
git clone https://github.com/RVanDamme/MUFFIN.git
# if you prefer not to install the pipeline, let Nextflow fetch it when running
nextflow run RVanDamme/MUFFIN --parameters.....
```
For conda usage
If you use conda, you don't need any extra installation. An error might occur during the installation of metawrap; if so, please consult Troubleshooting.
For gcloud usage
If you use the Google Life Sciences resources, you first need to set up a few parameters.
In nextflow.config you need to change the gcloud parameters to match your project (lines 67 to 78).
```
gcloud {
    //workDir = "/tmp/nextflow-docker_pipelines-$USER"
    process.executor = 'google-lifesciences'
    process.memory = params.memory
    bucketDir = 'gs://bucket/work-dir' // change this to the bucket where you want the work files stored
    google { project = 'project-name-111111'; zone = 'europe-north1-a' } // insert your project ID as well as the zone(s) you want to use
    // you can also use {region = 'europe-north1'} instead of zone
    google.lifeSciences.copyImage = 'google/cloud-sdk:latest'
    google.lifeSciences.preemptible = true
    google.lifeSciences.bootDiskSize = "10GB"
    google.lifeSciences.debug = true
    //includeConfig 'configs/preemptible.config'
}
```
You will also have to change the bucket where the different databases are stored in: modules/checkmgetdatabases.nf, modules/eggnog_get_databases.nf, and modules/sourmashgetdatabase.nf.
To do so, edit the relevant line to use your bucket. Keep the path structure for clarity (e.g. keep the "/databases-nextflow/sourmash" part).
Example:
```
if (workflow.profile.contains('gcloud')) {publishDir 'gs://gcloud_storage/databases-nextflow/sourmash', mode: 'copy', pattern: "genbank-k31.lca.json.gz" }
```
becomes
```
if (workflow.profile.contains('gcloud')) {publishDir 'gs://MY_STORAGE/databases-nextflow/sourmash', mode: 'copy', pattern: "genbank-k31.lca.json.gz" }
```
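Since the same bucket substitution applies to all three module files, it can be scripted instead of edited by hand. Below is a minimal sketch; `gcloud_storage` is the example bucket from above and `MY_STORAGE` is a placeholder for your own bucket name, shown here on a sample line rather than editing files in place:

```shell
# The same sed expression works for all three module files.
# MY_STORAGE is a placeholder for your actual bucket name.
SED_EXPR="s|gs://gcloud_storage|gs://MY_STORAGE|g"

# demonstrated on a sample publishDir line:
sample="publishDir 'gs://gcloud_storage/databases-nextflow/sourmash', mode: 'copy'"
rewritten=$(printf '%s' "$sample" | sed "$SED_EXPR")
echo "$rewritten"

# to apply in place from the MUFFIN repository root:
#   sed -i "$SED_EXPR" modules/checkmgetdatabases.nf \
#       modules/eggnog_get_databases.nf modules/sourmashgetdatabase.nf
```

This keeps the "/databases-nextflow/..." part of each path intact, as recommended above.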
If you want to run on gcloud without preemptible instances, edit line 74 of nextflow.config to false.
For containers usage
If you use containers, either Docker or Singularity, you don't need any extra installation.
For usage of software installed locally
You just need to have all the software used in the pipeline (see the table above) installed and available in your $PATH.
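A quick way to verify your $PATH before launching is to loop over the expected executables. This is a hedged sketch: `check_tools` is a hypothetical helper, the tool names below are taken from the table above, and the exact binary name for some packages may differ on your system, so adjust the list to the steps you actually run:

```shell
# Report which of the given tools are missing from $PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the tool is found on $PATH
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "missing:$missing"
}

# tool names taken from the software table; adjust for the steps you run
check_tools fastp filtlong sourmash flye racon medaka pilon minimap2 bwa samtools seqtk
```

An empty "missing:" line means every listed tool was found.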
Test the pipeline
To test the pipeline, we have a subset of 5 bins available at https://osf.io/9xmh4/. A detailed explanation of all the parameters is available in Usage; the most important for the test are the profile executor and engine. To run the test, just add "test" to the -profile parameter, e.g.:
```sh
# test locally with conda; you need to specify the cpus and ram available
nextflow run RVanDamme/MUFFIN --output results_dir --cpus 8 --memory 32g -profile local,conda,test
# test locally with docker; you can change the cpus and ram in configs/containers.config
# this test also runs the transcriptomics analysis with --rna
nextflow run RVanDamme/MUFFIN --output results_dir --rna -profile local,docker,test
# test using gcloud with docker; you can change the cpus and ram in configs/containers.config
# this test uses flye instead of spades via --assembler metaflye
nextflow run RVanDamme/MUFFIN --output results_dir --assembler metaflye -profile gcloud,docker,test
```
The subset also contains RNA data; to test the transcriptomics analysis, activate it with "--rna". The results of the different test runs are available at https://osf.io/m5czv/
Usage
Automated usage
To avoid writing all the parameters on the CLI, you can use the additional "-params-file" option and provide a .yml file that contains all the parameters available for MUFFIN, described below. You can find the MUFFIN_params.yml file at the base of the MUFFIN directory.
Example: MUFFIN_params.yml
```yaml
assembler : "metaspades"
output : "path/to/resultdir"
illumina : "fastq_ill/"
ont : "fastq_ont/"
cpus : 16
memory : "64g"
modular : "full"
```
MUFFIN command:
```sh
path/to/nextflow run $MUFFIN_pipeline -params-file MUFFIN_params.yml -profile local,conda,test
```
$MUFFIN_pipeline is either "path/to/MUFFIN/main.nf" or "RVanDamme/MUFFIN".
Basic usage
```sh
path/to/nextflow run $MUFFIN_pipeline --output results_dir --assembler $assembler --illumina fastq_ill/ --ont fastq_ont/ --cpus 16 --memory 6
```
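The placeholders in the basic command can be filled from shell variables. A minimal sketch, assuming a remote run of the pipeline and metaspades as assembler (the values below are illustrative examples, not MUFFIN defaults; the command is echoed rather than executed):

```shell
# Placeholder values for the basic invocation (examples, not defaults).
MUFFIN_pipeline="RVanDamme/MUFFIN"   # or path/to/MUFFIN/main.nf for a local clone
assembler="metaspades"               # or "metaflye" for the long-read assembly path

# print the assembled command instead of running it
echo nextflow run "$MUFFIN_pipeline" --output results_dir --assembler "$assembler" \
  --illumina fastq_ill/ --ont fastq_ont/ --cpus 16 --memory 64g
```

Swapping `$assembler` between "metaspades" and "metaflye" is enough to switch between the hybrid and long-read assembly routes described above.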
