Introduction

nf-core/seqinspector is a bioinformatics pipeline that processes raw sequence data (FASTQ) to provide comprehensive quality control. It can perform subsampling, quality assessment, duplication level analysis, and complexity evaluation on a per-sample basis, while also detecting adapter content, technical artifacts, and common biological contaminants. The pipeline generates detailed MultiQC reports with flexible output options, ranging from individual sample reports to project-wide summaries, making it particularly useful for sequencing core facilities and research groups with access to sequencing instruments. If an Illumina run folder is provided, nf-core/seqinspector can also parse its statistics into the final MultiQC reports.

Compatibility between tools and data type

| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
| --- | --- | --- | --- | --- | --- |
| Subsampling | Seqtk | Global subsampling of reads. Only performs subsampling if --sample_size parameter is given. | [RNA, DNA, synthetic] | [N/A] | no |
| Indexing, Mapping | Bwamem2 | Align reads to reference | [RNA, DNA] | [N/A] | yes |
| Indexing | SAMtools | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
| QC | FastQC | Read QC | [RNA, DNA] | [N/A] | yes |
| QC | FastqScreen | Basic contamination detection | [RNA, DNA] | [N/A] | yes |
| QC | SeqFu Stats | Sequence statistics | [RNA, DNA] | [N/A] | yes |
| QC | Picard collect multiple metrics | Collect multiple QC metrics | [RNA, DNA] | [Bwamem2, SAMtools, --genome] | yes |
| QC | Picard_collecthsmetrics | Collect alignment QC metrics of hybrid-selection data. | [RNA, DNA] | [Bwamem2, SAMtools, --fasta, --run_picard_collecths_metrics, --bait_intervals, --target_intervals (--ref_dict)] | no |
| Reporting | MultiQC | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |

Workflow diagram

Summary of tools and version used in the pipeline

| Tool | Version |
| --- | --- |
| bwamem2 | 2.3 |
| fastqc | 0.12.1 |
| fastqscreen | 0.16.0 |
| multiqc | 1.33 |
| picard | 3.4.0 |
| samtools | 1.22.1 |
| seqfu | 1.22.3 |
| seqtk | 1.4 |

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2,rundir,tags
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,200624_A00834_0183_BHMTFYDRXX,lane1:project5:group2

Each row represents a FASTQ file (single-end, with only fastq_1) or a pair of FASTQ files (paired-end, with fastq_1 and fastq_2). rundir is the path to the Illumina run folder. tags is a colon-separated list of tags that will be added to the MultiQC report for this sample.
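To make the column semantics concrete, here is a minimal Python sketch that reads a samplesheet in the layout described above. It is not part of the pipeline: `SampleRow` and `read_samplesheet` are hypothetical names, and the rule that `sample` and `fastq_1` must be non-empty is an assumption made for illustration, not the pipeline's actual validation schema.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleRow:
    """One samplesheet row (hypothetical helper type, not part of the pipeline)."""
    sample: str
    fastq_1: str
    fastq_2: Optional[str] = None   # None means a single-end row
    rundir: Optional[str] = None    # path to the run folder, if given
    tags: List[str] = field(default_factory=list)

    @property
    def paired_end(self) -> bool:
        return self.fastq_2 is not None


def read_samplesheet(handle) -> List[SampleRow]:
    """Parse the CSV and split the colon-separated tags column."""
    rows = []
    for record in csv.DictReader(handle):
        # Assumption: sample and fastq_1 are the only mandatory columns.
        if not record.get("sample") or not record.get("fastq_1"):
            raise ValueError(f"row needs sample and fastq_1: {record}")
        tags = [t for t in (record.get("tags") or "").split(":") if t]
        rows.append(SampleRow(
            sample=record["sample"],
            fastq_1=record["fastq_1"],
            fastq_2=record.get("fastq_2") or None,
            rundir=record.get("rundir") or None,
            tags=tags,
        ))
    return rows


# The example row from this README.
EXAMPLE = (
    "sample,fastq_1,fastq_2,rundir,tags\n"
    "CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,"
    "AEG588A1_S1_L002_R2_001.fastq.gz,"
    "200624_A00834_0183_BHMTFYDRXX,lane1:project5:group2\n"
)

rows = read_samplesheet(io.StringIO(EXAMPLE))
for row in rows:
    layout = "paired-end" if row.paired_end else "single-end"
    print(row.sample, layout, row.tags)
```

A row with an empty fastq_2 field is treated as single-end in this sketch, which mirrors the description above.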

Now, you can run the pipeline using:

nextflow run nf-core/seqinspector \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
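The same two parameters could also be supplied through Nextflow's -params-file option, which accepts JSON or YAML. The Python sketch below writes such a file and assembles the matching command line without running it. The helper names, the `results` output directory, and the `docker` profile are placeholders chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_params_file(path, input_csv, outdir):
    """Write pipeline parameters as JSON for Nextflow's -params-file option."""
    # Keys mirror the --input and --outdir options of the command above.
    params = {"input": input_csv, "outdir": outdir}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


def build_command(profile, params_file):
    """Assemble (but do not run) the nextflow invocation as an argument list."""
    return [
        "nextflow", "run", "nf-core/seqinspector",
        "-profile", profile,
        "-params-file", str(params_file),
    ]


# Placeholder values: adjust the profile and paths to your own setup.
workdir = Path(tempfile.mkdtemp())
params_path = workdir / "params.json"
params = write_params_file(params_path, "samplesheet.csv", "results")
command = build_command("docker", params_path)
print(" ".join(command))
```

Keeping the parameters in a file makes a run easier to repeat; the printed command can be pasted into a shell or passed to `subprocess.run`.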

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to
