FunFlux
FunFlux: A dedicated workflow for fungal genome assembly from short reads, decontamination, completeness validation, and comprehensive gene annotation.
FunFlux
Integrated workflow for fungal genome assembly and annotation.
__________ _______________
___ ____/___ ___________ ____/__ /___ _____ __
__ /_ _ / / /_ __ \_ /_ __ /_ / / /_ |/_/
_ __/ / /_/ /_ / / / __/ _ / / /_/ /__> <
/_/ \__,_/ /_/ /_//_/ /_/ \__,_/ /_/|_|
FunFlux v1.0.7
February 2026
Authors and Contributors
AIT Austrian Institute of Technology, Center for Health & Bioresources
- Livio Antonielli
- Günter Brader
- Stéphane Compant
Synopsis
FunFlux is a Snakemake workflow for assembling and annotating fungal genomes from short reads sequenced with Illumina technology. It also supports the analysis of pre-assembled contigs. The workflow includes contig selection and decontamination, genome completeness assessment, ITS extraction with taxonomic assignment, and precise gene prediction and annotation.
Table of Contents
- Rationale
- Description
- Installation
- Configuration
- Running FunFlux
- Output
- Acknowledgements
- Citation
- References
Rationale
The analysis of fungal whole-genome sequencing (WGS) data involves a complex series of bioinformatic steps that can be challenging to execute manually. This process is often time-consuming, prone to errors, and difficult to reproduce. FunFlux addresses these challenges by offering a comprehensive and automated Snakemake workflow specifically designed for fungal genomic data analysis.
FunFlux is designed to streamline the annotation process with funannotate in the absence of RNA sequencing evidence. It relies on both ab initio annotation and protein FASTA sequences from organisms of the same species or genus to enhance the accuracy of gene prediction and annotation.
Description
Here's a breakdown of the FunFlux workflow:
- Preprocessing:
- Assembly:
  - Filtered reads are assembled into contigs with SPAdes.
- QC, Decontamination, Completeness Assessment, and ITS extraction:
  - Contigs are filtered based on a minimum length of 500 bp and a coverage of 2x.
  - Filtered reads are mapped back to contigs using bowtie2 and samtools. The resulting BAM file is analyzed with QualiMap.
  - Contigs are aligned locally against the NCBI core nt database using BLAST+.
  - Contaminant contigs are checked with BlobTools. Unless otherwise specified (see the Configuration section for details), the output of this step is parsed automatically to discard contaminants based on the relative taxonomic composition of the contigs.
  - Genome assembly quality is evaluated with Quast.
  - Genome completeness is assessed with BUSCO using taxon-specific markers.
  - ITS markers are detected and extracted with ITSx.
  - ITS taxonomic assignment is performed with the SINTAX algorithm as implemented in VSEARCH, using the UNITE database as reference.
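The length and coverage filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FunFlux's own code: it assumes SPAdes-style headers (`NODE_<n>_length_<len>_cov_<cov>`) and reads the coverage from the header. The function names `parse_fasta` and `filter_contigs` are hypothetical.

```python
import re

# Assumption: contig headers follow the SPAdes naming scheme,
# e.g. "NODE_1_length_5000_cov_12.34".
COVERAGE_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=500, min_coverage=2.0):
    """Keep contigs that meet both the length and the coverage threshold."""
    kept = []
    for header, seq in records:
        match = COVERAGE_PATTERN.search(header)
        if match is None:
            continue  # hypothetical choice: skip headers without coverage info
        coverage = float(match.group(1))
        if len(seq) >= min_length and coverage >= min_coverage:
            kept.append((header, seq))
    return kept


# Toy example with a lowered length threshold so the sequences stay short.
example = ">NODE_1_length_8_cov_10.5\nACGTACGT\n>NODE_2_length_4_cov_1.2\nACGT\n"
kept = filter_contigs(parse_fasta(example), min_length=5, min_coverage=2.0)
print([header for header, _ in kept])
```

With the defaults (`min_length=500`, `min_coverage=2.0`) the function mirrors the thresholds stated in the list; the toy call lowers the length cutoff only to keep the example readable.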
- Gene Prediction:
  FunFlux is optimized to leverage the funannotate pipeline when RNA sequencing data are not available. Instead, it uses external protein evidence along with robust ab initio prediction methods to produce accurate gene models for fungal genomes. Below is a step-by-step breakdown of the workflow:
  - Preprocessing the genome assembly
    - N50 calculation and contig duplication checking: As part of the cleaning process, the N50 value is calculated, and contigs shorter than this value are checked for duplication. Only unique, non-redundant contigs are retained, so that the assembly is as clean and representative as possible.
    - Sorting and renaming FASTA headers: The assembled contigs are sorted by length and their headers are renamed to ensure compatibility with downstream tools.
    - Repeat masking: Before gene prediction, the genome assembly is softmasked with tantan to mark repetitive elements, which helps prevent spurious gene predictions in these regions.
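The cleaning steps above (N50, duplicate check for short contigs, sorting and renaming) can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the workflow's actual implementation: duplicates are detected here only as exact matches on either strand, whereas a real pipeline may rely on alignments. The function names and the `contig_<n>` naming scheme are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_short_duplicates(contigs, threshold):
    """Drop contigs shorter than `threshold` that exactly duplicate an
    already kept contig on either strand (simplifying assumption)."""
    seen, kept = set(), []
    for name, seq in sorted(contigs, key=lambda rec: len(rec[1]), reverse=True):
        # Strand-independent key: the smaller of the sequence and its reverse complement.
        key = min(seq.upper(), reverse_complement(seq).upper())
        if len(seq) < threshold and key in seen:
            continue
        seen.add(key)
        kept.append((name, seq))
    return kept


def sort_and_rename(contigs, prefix="contig"):
    """Sort contigs longest-first and assign short sequential names."""
    ordered = sorted(contigs, key=lambda rec: len(rec[1]), reverse=True)
    return [(f"{prefix}_{i}", seq) for i, (_, seq) in enumerate(ordered, start=1)]


# Toy assembly: "c" is the reverse complement of "b".
contigs = [("a", "AAAACCCCGGGG"), ("b", "ACGTAC"), ("c", "GTACGT")]
threshold = n50([len(seq) for _, seq in contigs])
cleaned = sort_and_rename(drop_short_duplicates(contigs, threshold))
print(threshold, cleaned)
```

In the toy assembly the N50 is 12, so the two 6 bp contigs are checked and the second copy is discarded before renaming.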
  - Incorporating protein evidence
    - Protein alignment: DIAMOND is used to quickly search for homologies between the genome and provided protein sequences of closely related taxa, as well as the UniProt database. These matches are then refined with Exonerate, which aligns the protein sequences to the genome with high precision, providing evidence for gene structures.
  - Ab initio gene prediction
    - GeneMark-ES: This tool performs self-training on the genome sequence to predict genes without the need for external training data, making it especially useful for identifying genes in regions lacking homology-based evidence.
  - Ortholog detection and model training
    - BUSCO: Using conserved orthologous genes, BUSCO provides high-quality evidence for training gene prediction tools. These conserved genes are passed to Augustus to improve its predictive accuracy.
    - Augustus training: Augustus starts from the closest available taxon model and draws on the evidence from BUSCO and DIAMOND/Exonerate, as well as the outputs from other ab initio predictors such as SNAP and GlimmerHMM. This comprehensive training enables Augustus to generate highly accurate gene predictions.
  - Combining predictions with EVidenceModeler
    - EVidenceModeler (EVM): The predictions from various ab initio tools, such as Augustus, SNAP, and GlimmerHMM, are combined to generate consensus gene models.
  - Refining steps
    - Gene model filtering: The gene models generated by EVM are subjected to further filtering to remove short, low-confidence predictions, models spanning gaps, and potential transposable elements.
    - tRNA prediction: tRNA genes are predicted using tRNAscan-SE, ensuring comprehensive annotation of both protein-coding and non-coding genes.
    - NCBI submission preparation: An NCBI-compatible annotation table (.tbl format) is generated and converted to GenBank format using tbl2asn. The workflow also includes a validation step that parses NCBI error reports and alerts users to any gene models that need manual correction.
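To make the gene model filtering idea concrete, here is a rough Python sketch of two of the criteria named above: dropping short predictions and dropping models that span assembly gaps. Everything in it is an assumption for illustration: the `GeneModel` class, the function names, and the 50 amino acid cutoff are hypothetical and are not taken from FunFlux or funannotate.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    start: int           # 0-based, inclusive position on the contig
    end: int             # 0-based, exclusive position on the contig
    protein_length: int  # length of the predicted protein in amino acids


def spans_gap(contig_seq, start, end):
    """Return True if the interval contains an assembly gap (any N)."""
    return "N" in contig_seq[start:end].upper()


def filter_models(models, contig_seq, min_protein_length=50):
    """Keep models that are long enough and do not overlap a gap.

    The default cutoff of 50 residues is an illustrative value only.
    """
    return [
        model
        for model in models
        if model.protein_length >= min_protein_length
        and not spans_gap(contig_seq, model.start, model.end)
    ]


# Toy contig with a 10 bp gap at positions 40-49.
contig = "ACGT" * 10 + "N" * 10 + "ACGT" * 10
models = [
    GeneModel("g1", 0, 30, 120),   # passes both checks
    GeneModel("g2", 35, 60, 200),  # overlaps the gap
    GeneModel("g3", 60, 80, 20),   # predicted protein too short
]
print([model.gene_id for model in filter_models(models, contig)])
```

Filtering of potential transposable elements is left out of the sketch because it depends on external reference data.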
- Gene Annotation:
  A comprehensive gene annotation process assigns functional information to the identified genes. This process integrates multiple annotation tools and culminates in a final annotation round performed by funannotate. Below is an overview of the workflow:
  - InterProScan (v5.65-97.0): This tool assigns protein domains and predicts functional sites within the gene models. It integrates data from multiple databases such as Pfam, SMART, PANTHER, and PROSITE, providing a rich set of functional annotations.
  - EggNOG-mapper (v2.1.12): This software predicts orthology and functional annotations based on the EggNOG database (v5.0). It helps assign Gene Ontology (GO) terms, enzyme codes, and pathway annotations to the gene models, offering insights into the biological roles of the proteins.
  - antiSMASH (v8.0.4): Detects secondary metabolite gene clusters. In fungal genomes, clusters related to antibiotics or toxins are of particular interest.
  - HMMER searches against the Pfam database (v38.0).
  - CAZyme annotation with dbCAN (v14.0).
- Report:
  - Results are parsed and aggregated to generate a report usi
