Cecret

Reference-based consensus creation


<div align="center"> <img src="images/logo.png" alt="Cecret Logo" width="220"> <h1>Cecret</h1> <p> <strong>Reference-based consensus creation.</strong> </p>


</div> <br> **A Nextflow workflow for reference-based amplicon consensus generation.**

Named after the beautiful Cecret lake

Location: 40.570°N 111.622°W, Elevation: 9,875 feet (3,010 m), Hiking level: easy


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Cecret_Lake_Panorama_Albion_Basin_Alta_Utah_July_2009.jpg/3840px-Cecret_Lake_Panorama_Albion_Basin_Alta_Utah_July_2009.jpg" width="500"/>

Image credit: Jeffrey McGrath posted on the Wikipedia Article



Introduction

Cecret was originally developed by @erinyoung at the Utah Public Health Laboratory for SARS-CoV-2 sequencing with the artic/Illumina hybrid library prep workflow for MiSeq data with protocols here and here. This Nextflow workflow, however, is flexible enough for many additional organisms and primer schemes as long as the reference genome is "small" and "good enough." In 2022, @tives82 contributed support for Monkeypox virus, including converting IDT's primer scheme to NC_063383.1 coordinates. We are grateful to everyone who has contributed to this repo.

The Nextflow workflow was built to work on Linux-based operating systems. Additional config options are needed for cloud batch usage.

The library preparation method greatly impacts which bioinformatic tools are recommended for creating a consensus sequence. For example, amplicon-based library preparation methods will require primer trimming and an elevated minimum depth for base-calling. Some bait-derived library preparation methods have a PCR amplification step, and PCR duplicates will need to be removed. This has added complexity and several (admittedly confusing) options to this workflow. Please submit an issue if/when you run into problems.
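
As a rough illustration of how the library preparation type maps onto options, the sketch below picks extra flags for a run. The prep labels (`amplicon`, `bait`) and the `prep_flags` helper are hypothetical names used only for this example; `--primer_set` and `--markdup` are taken from the `--help` output in the Usage section. Other options may also be needed for a given protocol, so treat this as a starting point rather than a recommended configuration.

```shell
#!/usr/bin/env bash
# Sketch: choose extra Cecret flags from the library preparation type.
# "amplicon" and "bait" are hypothetical labels, not Cecret options.
prep_flags() {
  case "$1" in
    amplicon) echo "--primer_set ${2:-ncov_V5.3.2}" ;;  # primer scheme used for trimming
    bait)     echo "--markdup true" ;;                   # remove PCR duplicates
    *)        echo "unknown prep: $1" >&2; return 1 ;;
  esac
}

# Print the command for review; drop the leading "echo" to launch it.
# $(prep_flags ...) is intentionally unquoted so the flags split into words.
echo nextflow run UPHL-BioNGS/Cecret -profile docker \
  --sample_sheet samplesheet.csv $(prep_flags bait)
```

Calling `prep_flags amplicon mpx_idt` instead would select one of the Monkeypox primer sets bundled with the repo.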

It is possible to use this workflow to simply annotate fastas generated from any workflow or downloaded from GISAID or NCBI. There are also options for multiple sequence alignment (MSA) and phylogenetic tree creation from the fasta files.
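
A fasta-only run might look like the sketch below, which writes a small sample sheet and prints the matching command. The file names and `fasta_sheet.csv` are placeholders for this example; the row shapes follow the sample sheet rules in the Usage section, and `--relatedness` is the option listed there for the multiple sequence alignment subworkflow.

```shell
#!/usr/bin/env bash
# Sketch: annotate existing fasta files and request the MSA subworkflow.
# Paths are placeholders; the third column marks each row's input type.
cat > fasta_sheet.csv <<'EOF'
sample,fastq_1,fastq_2
OQ255990.1,fastas/OQ255990.1.fasta,fasta
multifasta,fastas/multifasta.fasta,multifasta
EOF

# Print the command for review; drop the leading "echo" to launch it.
echo nextflow run UPHL-BioNGS/Cecret -profile docker \
  --sample_sheet fasta_sheet.csv --relatedness true
```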

Cecret is also part of the staphb-toolkit.

nf-core style docs can be found in docs, and the wiki can be found at https://github.com/UPHL-BioNGS/Cecret/wiki

Dependencies

Usage

$ nextflow run UPHL-BioNGS/Cecret --help

 N E X T F L O W   ~  version 25.10.0

Launching `./main.nf` [special_lovelace] DSL2 - revision: 944ec1d4ed
            
Typical pipeline command:

  nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet samplesheet.csv --outdir cecret


Input/output options
  --sample_sheet                    [string] sample sheet with sample, fastq_1, and fastq_2 columns 
  --species                         [string] specifies species-specific sub-workflows [default: sarscov2] 
  --kraken2_db                      [string] directory to kraken2 database 
  --outdir                          [string] The output directory where the results will be saved. Absolute paths are required on cloud infrastructure. [default: cecret] 

Reference files
  --reference_genome                [string] THE Reference genome 
  --amplicon_bed                    [string] Bedfile for amplicons 
  --gff                             [string] File used in ivar variants. Must correspond with reference genome. 
  --primer_bed                      [string] File with bedfile of primers used in the analysis 
  --primer_set                      [string] Specifies a primer set included in repo  (accepted: midnight_idt_V1, midnight_ont_V1, midnight_ont_V2, midnight_ont_V3, ncov_V3, ncov_V4, ncov_V4.1, ncov_V5.3.2, mpx_primalseq, mpx_idt, mpx_yale) [default: ncov_V5.3.2]

Workflow Components
  --download_nextclade_dataset      [boolean] Uses included nextclade dataset for SARS-CoV-2 during runtime when false. [default: true] 
  --predownloaded_nextclade_dataset [string]  Path to predownloaded nextclade dataset 
  --filter                          [boolean] Specifies if reference-mapped fastq files should be extracted 
  --markdup                         [boolean] Specifies if duplicate reads should be removed (not recommended for nanopore) 
  --relatedness                     [boolean] Turns on multiple sequence alignment subworkflow when true 

------------------------------------------------------

Cecret can also use a sample sheet for input, with the sample name and reads separated by commas. The header must be sample,fastq_1,fastq_2. Each row gives an identifier for the file(s), the file location(s), and, when the input is not paired-end fastq files, the input type.

Rows match files with their processing needs.

  • paired-end reads: sample,read1.fastq.gz,read2.fastq.gz
  • single-end reads: sample,sample.fastq.gz,single
  • nanopore reads: sample,sample.fastq.gz,ont
  • fasta files: sample,sample.fasta,fasta
  • multifasta files: multifasta,multifasta.fasta,multifasta

Example sample sheet:

sample,fastq_1,fastq_2
SRR13957125,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_2.fastq.gz
SRR13957170,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_2.fastq.gz
SRR13957177S,/home/eriny/sandbox/test_files/cecret/single_reads/SRR13957177_1.fastq.gz,single
OQ255990.1,/home/eriny/sandbox/test_files/cecret/fastas/OQ255990.1.fasta,fasta
SRR22452244,/home/eriny/sandbox/test_files/cecret/nanopore/SRR22452244.fastq.gz,ont
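
For a directory of Illumina reads, a sheet like the one above can be generated rather than typed by hand. This is a minimal sketch, assuming files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` as in the SRR examples; the `make_sample_sheet` helper and the `READS_DIR` variable are hypothetical names, and treating a read file without a mate as single-end is a choice made for this example.

```shell
#!/usr/bin/env bash
# Sketch: build a Cecret sample sheet from a directory of reads.
# Assumes <sample>_1.fastq.gz / <sample>_2.fastq.gz naming (not a Cecret rule).
make_sample_sheet() {
  local reads_dir="$1" r1 r2 sample
  echo "sample,fastq_1,fastq_2"
  shopt -s nullglob   # an empty directory yields a header-only sheet
  for r1 in "$reads_dir"/*_1.fastq.gz; do
    r2="${r1%_1.fastq.gz}_2.fastq.gz"
    sample="$(basename "$r1" _1.fastq.gz)"
    if [[ -f "$r2" ]]; then
      echo "$sample,$r1,$r2"      # paired-end row
    else
      echo "$sample,$r1,single"   # no mate found: single-end row
    fi
  done
  shopt -u nullglob
}

# READS_DIR is a hypothetical variable; an absolute path keeps rows portable.
make_sample_sheet "${READS_DIR:-reads}" > samplesheet.csv
```

Nanopore and fasta rows still need their `ont` or `fasta` type added in the third column.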

# using docker on samples specified in SampleSheet.csv
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv

# using a config file containing all inputs
nextflow run UPHL-BioNGS/Cecret -c file.config
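
A config file for the second command could be written as in the sketch below. The parameter names come from the `--help` output above and the `params { }` block is standard Nextflow config syntax; the values shown are placeholders or the documented defaults, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch: collect the inputs in a Nextflow config file and pass it with -c.
# Values are placeholders; parameter names are from the --help output.
cat > file.config <<'EOF'
params {
  sample_sheet = 'SampleSheet.csv'
  outdir       = 'cecret'
  species      = 'sarscov2'
  primer_set   = 'ncov_V5.3.2'
}
EOF

# Print the command for review; drop the leading "echo" to launch it.
echo nextflow run UPHL-BioNGS/Cecret -profile docker -c file.config
```

Keeping the inputs in a file makes a run easier to repeat than a long list of command-line flags.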

Results are roughly organized into 'params.outdir'/<analysis>/sample.result

A file summarizing all results is found in 'params.outdir'/cecret_results.csv and 'params.outdir'/cecret_results.txt.

Consensus sequences can be found in 'params.outdir'/consensus and end with *.consensus.fa.
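
After a run finishes, the locations above can be checked with a few lines of shell. This is a minimal sketch: `list_consensus` is a hypothetical helper, and `cecret` is used as the output directory only because it is the documented default for `--outdir`.

```shell
#!/usr/bin/env bash
# Sketch: locate the main outputs of a finished run.
# Pass the output directory as $1; "cecret" is the documented default.
list_consensus() {
  local outdir="${1:-cecret}"
  find "$outdir/consensus" -type f -name '*.consensus.fa' 2>/dev/null | sort
}

outdir="${1:-cecret}"
echo "Consensus sequences: $(list_consensus "$outdir" | wc -l | tr -d ' ')"

# Show the first rows of the run summary if it exists.
if [[ -f "$outdir/cecret_results.csv" ]]; then
  head -n 5 "$outdir/cecret_results.csv"
fi
```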

Full workflow

[Image: full workflow diagram]

Updating Cecret

nextflow pull UPHL-BioNGS/Cecret
