Seqinspector
Dedicated QC-only pipeline for sequencing data. The pipeline runs a (potentially large) set of QC tools and can output global and group-specific MultiQC reports. It targets core facilities and research groups with high sequencing throughput.
Introduction
nf-core/seqinspector is a bioinformatics pipeline that processes raw sequence data (FASTQ) to provide comprehensive quality control. It can perform subsampling, quality assessment, duplication level analysis, and complexity evaluation on a per-sample basis, while also detecting adapter content, technical artifacts, and common biological contaminants. The pipeline generates detailed MultiQC reports with flexible output options, ranging from individual sample reports to project-wide summaries. This makes it particularly useful for sequencing core facilities and research groups with access to sequencing instruments. If a run folder is provided, nf-core/seqinspector can also parse statistics from the Illumina run folder into the final MultiQC reports.
Compatibility between tools and data types
<!-- TODO: add a search tool that accepts a tree for `Compatibility with Data`. -->

| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
| --- | --- | --- | --- | --- | --- |
| Subsampling | Seqtk | Global subsampling of reads. Only performs subsampling if `--sample_size` is given. | [RNA, DNA, synthetic] | [N/A] | no |
| Indexing, Mapping | Bwamem2 | Align reads to a reference | [RNA, DNA] | [N/A] | yes |
| Indexing | SAMtools | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
| QC | FastQC | Read QC | [RNA, DNA] | [N/A] | yes |
| QC | FastqScreen | Basic contamination detection | [RNA, DNA] | [N/A] | yes |
| QC | SeqFu Stats | Sequence statistics | [RNA, DNA] | [N/A] | yes |
| QC | Picard CollectMultipleMetrics | Collect multiple QC metrics | [RNA, DNA] | [Bwamem2, SAMtools, --genome] | yes |
| QC | Picard CollectHsMetrics | Collect alignment QC metrics of hybrid-selection data | [RNA, DNA] | [Bwamem2, SAMtools, --fasta, --run_picard_collecths_metrics, --bait_intervals, --target_intervals (--ref_dict)] | no |
| Reporting | MultiQC | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |
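The tools marked "no" in the last column are switched on through parameters. The Bash sketch below only prints a command line; it does not execute one. It shows how those parameters could be combined. The parameter names come from the table above. The helper name `build_seqinspector_cmd`, the `docker` profile, the file paths and the value `100000` are all hypothetical placeholders. The sketch also assumes that `--run_picard_collecths_metrics` is a plain on/off flag. This table does not say whether `--sample_size` is a read count or a fraction, so check the pipeline's parameter documentation before choosing a value.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print an nf-core/seqinspector command
# that enables the optional tools from the table. Parameter names come from
# the table; profile, paths and values are placeholders.
set -euo pipefail

build_seqinspector_cmd() {
  local sample_size="${1:-}"   # empty = Seqtk subsampling stays off
  local bait="${2:-}"          # bait intervals for Picard CollectHsMetrics
  local target="${3:-}"        # target intervals for Picard CollectHsMetrics
  local fasta="${4:-}"         # reference FASTA listed as a CollectHsMetrics dependency

  local -a cmd=(nextflow run nf-core/seqinspector
    -profile docker
    --input samplesheet.csv
    --outdir results)

  # Seqtk only subsamples when --sample_size is given.
  if [[ -n "$sample_size" ]]; then
    cmd+=(--sample_size "$sample_size")
  fi

  # CollectHsMetrics is off by default; add it only when all inputs are set.
  # Assumption: --run_picard_collecths_metrics is a boolean flag.
  if [[ -n "$bait" && -n "$target" && -n "$fasta" ]]; then
    cmd+=(--run_picard_collecths_metrics
      --fasta "$fasta"
      --bait_intervals "$bait"
      --target_intervals "$target")
  fi

  # Print the assembled command (space-joined) instead of running it.
  printf '%s\n' "${cmd[*]}"
}

# Dry run with placeholder values.
build_seqinspector_cmd 100000 baits.interval_list targets.interval_list genome.fa
```

Once the printed line looks right, you can run it directly or move the same values into a `-params-file`. The table also lists `--ref_dict` for CollectHsMetrics; the sketch leaves it out.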
Workflow diagram
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="docs/images/seqinspector_tubemap_dark.png">
  <source media="(prefers-color-scheme: light)" srcset="docs/images/seqinspector_tubemap_light.png">
  <img alt="Fallback image description" src="docs/images/seqinspector_tubemap_light.png">
</picture>

Summary of tools and versions used in the pipeline
| Tool | Version |
| ----------- | ------- |
| bwamem2 | 2.3 |
| fastqc | 0.12.1 |
| fastqscreen | 0.16.0 |
| multiqc | 1.33 |
| picard | 3.4.0 |
| samtools | 1.22.1 |
| seqfu | 1.22.3 |
| seqtk | 1.4 |
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:

```csv
sample,fastq_1,fastq_2,rundir,tags
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,200624_A00834_0183_BHMTFYDRXX,lane1:project5:group2
```
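Each row represents either a single FASTQ file (single-end, `fastq_1` only) or a pair of FASTQ files (paired-end, `fastq_1` and `fastq_2`).
`rundir` is the path to the Illumina run folder. It is optional; run folder statistics are parsed only when it is provided.
`tags` is a colon-separated list of tags (for example `lane1:project5:group2`) that will be added to the MultiQC report for this sample.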
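You can check the samplesheet rules above locally before launching the pipeline. The Bash sketch below does this and is not the pipeline's own samplesheet validation. It writes the example row plus a second, single-end row, then reports each sample's layout and tag count. The `SAMPLE_SE` row, its file name, and the helper names `write_samplesheet` and `describe_samplesheet` are hypothetical. The sketch also assumes that `rundir` may be left empty, since run folder parsing only happens when a run folder is provided.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): write a samplesheet with one
# paired-end and one single-end row, then describe each row.
set -euo pipefail

write_samplesheet() {
  # Second row is hypothetical: single-end, no rundir (assumed optional).
  cat > "$1" <<'EOF'
sample,fastq_1,fastq_2,rundir,tags
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,200624_A00834_0183_BHMTFYDRXX,lane1:project5:group2
SAMPLE_SE,SAMPLE_SE_S2_L001_R1_001.fastq.gz,,,project5
EOF
}

describe_samplesheet() {
  # Skip the header; an empty fastq_2 column means single-end.
  # The tags column is split on ':' to count the tags per sample.
  awk -F',' 'NR > 1 {
    layout = ($3 == "") ? "single-end" : "paired-end"
    n = split($5, tags, ":")
    printf "%s\t%s\t%d tag(s)\n", $1, layout, n
  }' "$1"
}

write_samplesheet samplesheet.csv
describe_samplesheet samplesheet.csv
```

For the two rows above, this prints one tab-separated line per sample: `CONTROL_REP1` as paired-end with 3 tags, and `SAMPLE_SE` as single-end with 1 tag.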
Now, you can run the pipeline using:
```bash
nextflow run nf-core/seqinspector \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to
