
FunFlux

FunFlux: A dedicated workflow for fungal genome assembly from short reads, decontamination, completeness validation, and comprehensive gene annotation.



Integrated workflow for fungal genome assembly and annotation.



__________             _______________              
___  ____/___  ___________  ____/__  /___  _____  __
__  /_   _  / / /_  __ \_  /_   __  /_  / / /_  |/_/
_  __/   / /_/ /_  / / /  __/   _  / / /_/ /__>  <  
/_/      \__,_/ /_/ /_//_/      /_/  \__,_/ /_/|_|
              
FunFlux v1.0.7


February 2026

Authors and Contributors

AIT Austrian Institute of Technology, Center for Health & Bioresources

  • Livio Antonielli
  • Günter Brader
  • Stéphane Compant

Synopsis

FunFlux is a Snakemake workflow for the assembly and annotation of fungal genomes sequenced with Illumina short-read technology. It also supports the analysis of pre-assembled contigs. The workflow includes contig selection and decontamination, genome completeness assessment, ITS extraction with taxonomic assignment, and gene prediction with functional annotation.


Rationale

The analysis of fungal whole-genome sequencing (WGS) data involves a complex series of bioinformatic steps that can be challenging to execute manually. This process is often time-consuming, prone to errors, and difficult to reproduce. FunFlux addresses these challenges by offering a comprehensive and automated Snakemake workflow specifically designed for fungal genomic data analysis.

FunFlux is designed to streamline annotation with funannotate when no RNA sequencing evidence is available. It combines ab initio gene prediction with protein sequences (in FASTA format) from organisms of the same species or genus to improve the accuracy of gene prediction and annotation.

Description

Here's a breakdown of the FunFlux workflow:

  1. Preprocessing:

    • Raw reads are checked for Illumina phiX contamination using bowtie2.

    • Adapters are removed and reads are filtered using fastp.
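To make the phiX check concrete, the sketch below reads the summary that bowtie2 prints after aligning reads against a phiX reference and extracts the overall alignment rate. The function names and the max_rate cut-off are hypothetical, and the sketch only reports the rate; it does not reproduce how FunFlux handles phiX-positive reads.

```python
import re

# bowtie2 prints its alignment summary to stderr; the last line reads
# "<rate>% overall alignment rate".
OVERALL_RATE = re.compile(r"([\d.]+)% overall alignment rate")


def phix_alignment_rate(bowtie2_log: str) -> float:
    """Return the overall alignment rate (%) against the phiX reference."""
    match = OVERALL_RATE.search(bowtie2_log)
    if match is None:
        raise ValueError("no 'overall alignment rate' line in bowtie2 log")
    return float(match.group(1))


def phix_flagged(bowtie2_log: str, max_rate: float = 1.0) -> bool:
    """Flag a sample whose phiX rate exceeds max_rate (hypothetical threshold)."""
    return phix_alignment_rate(bowtie2_log) > max_rate


# Synthetic two-line log used only for illustration.
log = "10000 reads; of these:\n0.52% overall alignment rate\n"
print(phix_alignment_rate(log), phix_flagged(log))  # 0.52 False
```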

  2. Assembly:

    • Filtered reads are assembled into contigs with SPAdes.

  3. QC, Decontamination, Completeness Assessment, and ITS Extraction:

    • Contigs are filtered by a minimum length of 500 bp and a minimum coverage of 2x.
    • Filtered reads are mapped back to contigs using bowtie2 and samtools. The resulting BAM file is analyzed with QualiMap.
    • Local alignments of contigs are performed against the NCBI core nt database using BLAST+.
    • Contigs are screened for contamination with BlobTools. Unless otherwise specified (see the configuration section for details), the BlobTools output is parsed automatically to discard contaminant contigs based on the relative taxonomic composition of the contigs.
    • Genome assembly quality is evaluated with Quast.
    • Genome completeness is assessed with BUSCO using taxon-specific markers.
    • ITS markers are detected and extracted with ITSx.
    • ITS sequences are taxonomically assigned with the SINTAX algorithm, as implemented in VSEARCH, using the UNITE database as reference.
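The length and coverage filter can be sketched from SPAdes contig names, which encode both values (NODE_<n>_length_<bp>_cov_<coverage>). This minimal sketch assumes inclusive thresholds of 500 bp and 2x applied to these header values; note that SPAdes reports k-mer coverage in the header, and whether FunFlux filters on that value or on read-mapping coverage is an assumption here. The function name keep_contig is hypothetical.

```python
import re

# SPAdes contig names look like "NODE_<n>_length_<bp>_cov_<k-mer coverage>".
SPADES_HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def keep_contig(header: str, min_len: int = 500, min_cov: float = 2.0) -> bool:
    """Keep a contig if it meets both the length and the coverage threshold."""
    match = SPADES_HEADER.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes contig header: {header!r}")
    length = int(match.group(2))
    coverage = float(match.group(3))
    return length >= min_len and coverage >= min_cov


headers = [
    "NODE_1_length_152340_cov_35.2",  # long and well covered: kept
    "NODE_812_length_480_cov_12.0",   # too short: dropped
    "NODE_905_length_1200_cov_1.4",   # coverage too low: dropped
]
print([h for h in headers if keep_contig(h)])  # ['NODE_1_length_152340_cov_35.2']
```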
  4. Gene Prediction:

    FunFlux runs the funannotate pipeline without RNA sequencing data. In place of transcript evidence, it uses external protein evidence together with ab initio prediction methods to build gene models for fungal genomes. The main steps are:

    • Preprocessing the genome assembly

      • N50 calculation and duplicate contig removal: The N50 of the assembly is calculated, and contigs shorter than N50 are checked for duplication against the rest of the assembly. Only unique, non-redundant contigs are retained.

      • Sorting and renaming FASTA headers: The assembled contigs are sorted by length and headers are renamed to ensure compatibility with follow-up tools.

      • Repeat masking: Before gene prediction, the assembly is softmasked with tantan, which converts repetitive regions to lowercase so that gene predictors can avoid placing spurious gene models in them.
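The cleaning and renaming steps above can be illustrated with a short sketch: compute N50, drop contigs shorter than N50 that are contained in a longer contig on either strand, then sort by length and rename. Exact substring containment is a simplification assumed for illustration (a production tool would rely on an aligner with identity and coverage cut-offs), and the "scaffold" prefix and all function names are hypothetical.

```python
# Complement table for A/C/G/T/N in both cases (softmasked bases stay lowercase).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length (longest first) reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def drop_redundant(contigs: dict[str, str]) -> dict[str, str]:
    """Drop contigs shorter than N50 that occur, on either strand, in a longer contig."""
    cutoff = n50([len(seq) for seq in contigs.values()])
    kept = {}
    for name, seq in contigs.items():
        if len(seq) < cutoff:
            longer = [other for other in contigs.values() if len(other) > len(seq)]
            if any(seq in other or revcomp(seq) in other for other in longer):
                continue  # redundant: fully contained in a longer contig
        kept[name] = seq
    return kept


def sort_and_rename(contigs: dict[str, str], prefix: str = "scaffold") -> dict[str, str]:
    """Sort contigs longest first and give them simple sequential names."""
    ordered = sorted(contigs.values(), key=len, reverse=True)
    return {f"{prefix}_{i}": seq for i, seq in enumerate(ordered, start=1)}


contigs = {
    "NODE_1": "ATGCGTACGTTAGC",
    "NODE_2": "GGGCCCAAATTT",
    "NODE_3": "CGTACG",   # contained in NODE_1
    "NODE_4": "TTTGGG",   # reverse complement contained in NODE_2
    "NODE_5": "TTTTTTT",  # unique
}
clean = sort_and_rename(drop_redundant(contigs))
print(clean)
```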

    • Incorporating protein evidence

      • Protein alignment: DIAMOND is used to quickly search for homologies between the genome and provided protein sequences of closely related taxa, as well as the UniProt database. These matches are then refined with Exonerate, which aligns the protein sequences to the genome with high precision, providing evidence for gene structures.
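One way to picture the hand-off from DIAMOND to Exonerate is to collapse the DIAMOND hits of each protein on each contig into a padded genomic window, which Exonerate then aligns precisely. The sketch assumes DIAMOND blastx with contigs as queries and its default tabular output (--outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the flank size, e-value cut-off, and function name are hypothetical and not taken from FunFlux.

```python
def exonerate_windows(tsv_text: str, flank: int = 3000, max_evalue: float = 1e-5):
    """Collapse DIAMOND blastx hits into one padded window per (contig, protein)."""
    spans: dict[tuple[str, str], tuple[int, int]] = {}
    for line in tsv_text.strip().splitlines():
        fields = line.split("\t")
        contig, protein = fields[0], fields[1]
        qstart, qend, evalue = int(fields[6]), int(fields[7]), float(fields[10])
        if evalue > max_evalue:
            continue  # weak hit: not worth an Exonerate run
        lo, hi = min(qstart, qend), max(qstart, qend)  # minus-strand hits have qstart > qend
        key = (contig, protein)
        if key in spans:
            lo, hi = min(lo, spans[key][0]), max(hi, spans[key][1])
        spans[key] = (lo, hi)
    # Pad each span so Exonerate can find exon boundaries outside the DIAMOND hit.
    return {key: (max(1, lo - flank), hi + flank) for key, (lo, hi) in spans.items()}


# Synthetic hits: two exon-level hits of protA on contig_1, one weak hit on contig_2.
hits = "\n".join([
    "contig_1\tprotA\t85.0\t120\t10\t1\t10500\t10140\t5\t124\t1e-50\t250.0",
    "contig_1\tprotA\t80.0\t60\t8\t0\t9800\t9621\t130\t189\t1e-20\t120.0",
    "contig_2\tprotB\t40.0\t50\t30\t2\t200\t349\t1\t50\t0.5\t30.0",
])
print(exonerate_windows(hits))  # {('contig_1', 'protA'): (6621, 13500)}
```

Merging all hits of a protein on one contig into a single window is a simplification: distant paralogous copies on the same contig would be merged, and window ends are not clipped to the contig length.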
    • Ab initio gene prediction

      • GeneMark-ES: This tool self-trains on the genome sequence and predicts genes without external training data, which makes it especially useful in regions that lack homology-based evidence.

    • Ortholog detection and model training

      • BUSCO: Conserved orthologous genes identified by BUSCO provide high-quality evidence for training gene predictors and are passed to Augustus to improve its accuracy.

      • Augustus training: Augustus starts from the closest available taxon model and incorporates evidence from BUSCO, DIAMOND/Exonerate, and the outputs of other ab initio predictors such as SNAP and GlimmerHMM to improve its gene predictions.
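The BUSCO-to-Augustus hand-off can be sketched as selecting the single-copy "Complete" hits from BUSCO's full_table.tsv as candidate training loci. This assumes the BUSCO v5 column order (Busco id, Status, Sequence, Gene Start, Gene End, ...) and skips duplicated and fragmented hits on the assumption that they make poorer training models; the function name is hypothetical, and the selection actually used by FunFlux may differ.

```python
def complete_buscos(full_table: str) -> list[tuple[str, str, int, int]]:
    """Return (busco_id, contig, start, end) for single-copy 'Complete' BUSCOs."""
    loci = []
    for line in full_table.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and BUSCO's commented header
        fields = line.split("\t")
        if fields[1] != "Complete":
            continue  # skip Duplicated, Fragmented and Missing entries
        busco_id, contig, start, end = fields[0], fields[2], int(fields[3]), int(fields[4])
        loci.append((busco_id, contig, start, end))
    return loci


# Synthetic excerpt in the assumed BUSCO v5 layout (tab-separated).
table = "\n".join([
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength",
    "1000at4751\tComplete\tscaffold_1\t1200\t3400\t+\t950.1\t610",
    "1001at4751\tDuplicated\tscaffold_2\t500\t2100\t-\t880.4\t520",
    "1001at4751\tDuplicated\tscaffold_7\t300\t1900\t+\t870.2\t518",
    "1002at4751\tFragmented\tscaffold_3\t40\t600\t+\t210.0\t150",
    "1003at4751\tMissing",
])
print(complete_buscos(table))  # [('1000at4751', 'scaffold_1', 1200, 3400)]
```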

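    • Combining predictions with EVidenceModeler

      • EVidenceModeler (EVM): The ab initio predictions and protein alignments are combined into weighted consensus gene models.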

    • Refining steps

      • Gene model filtering: The gene models generated by EVM are subjected to further filtering to remove short, low-confidence predictions, models spanning gaps, and potential transposable elements.

      • tRNA prediction: tRNA genes are predicted with tRNAscan-SE, so that the annotation covers tRNA genes in addition to protein-coding genes.

      • NCBI submission preparation: Generation of an NCBI-compatible annotation table (.tbl format) and conversion to GenBank format using tbl2asn. The workflow also includes a validation step to parse NCBI error reports and alert users to any gene models that need manual correction.
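Two of the filters applied to EVM models, short predictions and models spanning assembly gaps, reduce to simple length and interval checks. In this minimal sketch the GeneModel type, the 50-residue protein cut-off, and the 10-N minimum gap length are hypothetical; a model is rejected if it overlaps any run of Ns, and transposable-element screening and confidence scoring are not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    protein: str


def gap_intervals(seq: str, min_gap: int = 10) -> list[tuple[int, int]]:
    """Return 1-based inclusive coordinates of runs of at least min_gap Ns."""
    return [(m.start() + 1, m.end()) for m in re.finditer(rf"[Nn]{{{min_gap},}}", seq)]


def filter_models(models, genome, min_protein_len=50, min_gap=10):
    """Drop models whose protein is too short or that overlap an assembly gap."""
    gaps = {name: gap_intervals(seq, min_gap) for name, seq in genome.items()}
    kept = []
    for model in models:
        if len(model.protein.rstrip("*")) < min_protein_len:
            continue  # too short to trust
        if any(model.start <= g_end and g_start <= model.end
               for g_start, g_end in gaps.get(model.contig, [])):
            continue  # model overlaps a gap
        kept.append(model)
    return kept


genome = {"scaffold_1": "A" * 100 + "N" * 20 + "C" * 100}
models = [
    GeneModel("g1", "scaffold_1", 1, 90, "M" + "A" * 59),
    GeneModel("g2", "scaffold_1", 90, 130, "M" + "A" * 59),
    GeneModel("g3", "scaffold_1", 150, 200, "MK*"),
]
print([m.gene_id for m in filter_models(models, genome)])  # ['g1']
```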

  5. Gene Annotation:

    A comprehensive gene annotation process assigns functional information to the identified genes. This process integrates multiple annotation tools and culminates in a final annotation round performed by funannotate. Below is an overview of the workflow:

    • InterProScan (v5.65-97.0): This tool is employed to assign protein domains and predict functional sites within the gene models. It integrates data from multiple databases such as Pfam, SMART, PANTHER and PROSITE, providing a rich set of functional annotations.

    • EggNOG-mapper (v2.1.12): This software is used to predict orthology and functional annotations based on the EggNOG database (v5.0). It helps in assigning Gene Ontology (GO) terms, enzyme codes, and pathway annotations to the gene models, offering insights into the biological roles of the proteins.

    • antiSMASH (v8.0.4): Detects secondary metabolite biosynthetic gene clusters, which are of particular interest in fungal genomes because they can produce compounds such as antibiotics and toxins.

    • HMMER search against the Pfam database (v38.0)

    • DIAMOND search against UniProt (v2025_04)

    • DIAMOND search against the MEROPS peptidase database (v12.0)

    • CAZyme annotation with dbCAN (v14.0)

  6. Report:

    • Results are parsed and aggregated to generate a report usi
