Cecret
Reference-based consensus creation
Named after the beautiful Cecret lake
Location: 40.570°N 111.622°W , Elevation: 9,875 feet (3,010 m), Hiking level: easy
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Cecret_Lake_Panorama_Albion_Basin_Alta_Utah_July_2009.jpg/3840px-Cecret_Lake_Panorama_Albion_Basin_Alta_Utah_July_2009.jpg" width="500"/>
Image credit: Jeffrey McGrath posted on the Wikipedia Article
Table of Contents:
- Introduction
- Dependencies
- Usage
- Input and output directories
- Quality Assessment
- Setting primer and amplicon bedfiles
- Using a predownloaded nextclade dataset
- Setting depth for base calls
- SARS-CoV-2 Wastewater
- Monkeypox
- Updating Cecret
- Optional toggles
- Determining relatedness or creating trees
- Classified reads with Kraken2
- Main components
- Turning off unneeded processes
- Final file structure
- Config files
- Frequently Asked Questions (aka FAQ)
Introduction
Cecret was originally developed by @erinyoung at the Utah Public Health Laboratory for SARS-CoV-2 sequencing with the artic/Illumina hybrid library prep workflow for MiSeq data with protocols here and here. This nextflow workflow, however, is flexible for many additional organisms and primer schemes as long as the reference genome is "small" and "good enough." In 2022, @tives82 added in contributions for Monkeypox virus, including converting IDT's primer scheme to NC_063383.1 coordinates. We are grateful to everyone who has contributed to this repo.
The nextflow workflow was built to work on linux-based operating systems. Additional config options are needed for cloud batch usage.
The library preparation method greatly impacts which bioinformatic tools are recommended for creating a consensus sequence. For example, amplicon-based library preparation methods will require primer trimming and an elevated minimum depth for base-calling. Some bait-derived library preparation methods have a PCR amplification step, and PCR duplicates will need to be removed. This has added complexity and several (admittedly confusing) options to this workflow. Please submit an issue if/when you run into problems.
It is possible to use this workflow to simply annotate fastas generated from any workflow or downloaded from GISAID or NCBI. There are also options for multiple sequence alignment (MSA) and phylogenetic tree creation from the fasta files.
Cecret is also part of the staphb-toolkit.
NF-CORE style docs can be found in docs. The wiki can be found at https://github.com/UPHL-BioNGS/Cecret/wiki
Dependencies
- Nextflow
- Singularity or Docker - set the profile as singularity or docker during runtime
Usage
$ nextflow run UPHL-BioNGS/Cecret --help
N E X T F L O W ~ version 25.10.0
Launching `./main.nf` [special_lovelace] DSL2 - revision: 944ec1d4ed
Typical pipeline command:
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet samplesheet.csv --outdir cecret
Input/output options
--sample_sheet [string] sample sheet with sample, fastq_1, and fastq_2 columns
--species [string] specifies species-specific sub-workflows [default: sarscov2]
--kraken2_db [string] directory to kraken2 database
--outdir [string] The output directory where the results will be saved. Absolute paths are required on cloud infrastructure. [default: cecret]
Reference files
--reference_genome [string] THE Reference genome
--amplicon_bed [string] Bedfile for amplicons
--gff [string] File used in ivar variants. Must correspond with reference genome.
--primer_bed [string] File with bedfile of primers used in the analysis
--primer_set [string] Specifies a primer set included in repo (accepted: midnight_idt_V1, midnight_ont_V1, midnight_ont_V2, midnight_ont_V3, ncov_V3, ncov_V4, ncov_V4.1,
ncov_V5.3.2, mpx_primalseq, mpx_idt, mpx_yale) [default: ncov_V5.3.2]
Workflow Components
--download_nextclade_dataset [boolean] Uses included nextclade dataset for SARS-CoV-2 during runtime when false. [default: true]
--predownloaded_nextclade_dataset [string] Path to predownloaded nextclade dataset
--filter [boolean] Specifies if reference-mapped fastq files should be extracted
--markdup [boolean] Specifies if duplicate reads should be removed (not recommended for nanopore)
--relatedness [boolean] Turns on multiple sequence alignment subworkflow when true
------------------------------------------------------
Cecret can also use a sample sheet for input with the sample name and reads separated by commas. The header must be sample,fastq_1,fastq_2. As a general rule, each row gives the identifier for the file(s), the file location(s), and, in the third column, the input type if the files are not paired-end fastq files.
Rows match files with their processing needs.
- paired-end reads: sample,read1.fastq.gz,read2.fastq.gz
- single-end reads: sample,sample.fastq.gz,single
- nanopore reads: sample,sample.fastq.gz,ont
- fasta files: sample,sample.fasta,fasta
- multifasta files: multifasta,multifasta.fasta,multifasta
Example sample sheet:
sample,fastq_1,fastq_2
SRR13957125,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_2.fastq.gz
SRR13957170,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_2.fastq.gz
SRR13957177S,/home/eriny/sandbox/test_files/cecret/single_reads/SRR13957177_1.fastq.gz,single
OQ255990.1,/home/eriny/sandbox/test_files/cecret/fastas/OQ255990.1.fasta,fasta
SRR22452244,/home/eriny/sandbox/test_files/cecret/nanopore/SRR22452244.fastq.gz,ont
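The row rules above can be checked before launching a run. The following is a minimal sketch, not part of Cecret: the function name `classify_rows` and the label `paired` are hypothetical, the paths are shortened placeholders, and Cecret's own parsing may differ. It assumes only what is stated above: the header must be sample,fastq_1,fastq_2, and a third column of single, ont, fasta, or multifasta names the input type.

```python
import csv
import io

# Keywords listed above for the third column when the input is not paired-end.
TYPE_KEYWORDS = {"single", "ont", "fasta", "multifasta"}
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]


def classify_rows(handle):
    """Yield (sample, input_type) for each row of a Cecret-style sample sheet.

    Hypothetical helper for illustration; not Cecret's own parser.
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"header must be {','.join(EXPECTED_HEADER)}, got {header}")
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, got {len(row)}")
        sample, _first, third = (field.strip() for field in row)
        # A keyword in the third column names the input type; anything else
        # is treated as the path to the second read file ("paired" is our label).
        input_type = third if third in TYPE_KEYWORDS else "paired"
        yield sample, input_type


# Placeholder sheet modeled on the example above, with shortened paths.
example = io.StringIO(
    "sample,fastq_1,fastq_2\n"
    "SRR13957125,reads/SRR13957125_1.fastq.gz,reads/SRR13957125_2.fastq.gz\n"
    "SRR13957177S,single_reads/SRR13957177_1.fastq.gz,single\n"
    "OQ255990.1,fastas/OQ255990.1.fasta,fasta\n"
    "SRR22452244,nanopore/SRR22452244.fastq.gz,ont\n"
)
rows = list(classify_rows(example))
for sample, input_type in rows:
    print(f"{sample}\t{input_type}")
```

To check a real sheet, pass an open file handle instead of the in-memory example, for instance `classify_rows(open("SampleSheet.csv", newline=""))`.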
# using docker on samples specified in SampleSheet.csv
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv
# using a config file containing all inputs
nextflow run UPHL-BioNGS/Cecret -c file.config
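Nextflow exposes each `--name value` command-line option as `params.name`, so a config file passed with `-c` can carry the same inputs as `params.name = value` assignments. The sketch below renders such a file from a Python dictionary. The helper `render_params` is hypothetical, the values are placeholders, and only strings and booleans are handled; the parameter names are taken from the help text above.

```python
def render_params(params):
    """Render a dict as Nextflow config assignments (strings and booleans only).

    Hypothetical helper for illustration; not part of Cecret.
    """
    lines = []
    for name, value in params.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            # Quote as a single-quoted string, escaping backslashes and quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            rendered = f"'{escaped}'"
        lines.append(f"params.{name} = {rendered}")
    return "\n".join(lines) + "\n"


# Placeholder values; the option names come from the --help output above.
config_text = render_params({
    "sample_sheet": "SampleSheet.csv",
    "outdir": "cecret",
    "species": "sarscov2",
    "relatedness": True,
})
print(config_text)
```

Writing `config_text` to `file.config` gives a file usable with the `-c` command above. The container profile is still chosen on the command line with `-profile`.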
Results are roughly organized into 'params.outdir'/<analysis>/sample.result
A file summarizing all results is found in 'params.outdir'/cecret_results.csv and 'params.outdir'/cecret_results.txt.
Consensus sequences can be found in 'params.outdir'/consensus and end with *.consensus.fa.
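The layout described above can be walked with a few lines of Python after a run finishes. This is a minimal sketch under the stated layout only: the helper `collect_results` is hypothetical, and no column names are assumed for cecret_results.csv because they are not listed here.

```python
import csv
from pathlib import Path


def collect_results(outdir):
    """Return (summary_rows, consensus_paths) for a Cecret output directory.

    Hypothetical helper; relies only on the file locations described above.
    """
    outdir = Path(outdir)
    summary = outdir / "cecret_results.csv"
    rows = []
    if summary.is_file():
        with summary.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
    consensus_dir = outdir / "consensus"
    consensus = []
    if consensus_dir.is_dir():
        # Consensus sequences end with .consensus.fa
        consensus = sorted(consensus_dir.glob("*.consensus.fa"))
    return rows, consensus


if __name__ == "__main__":
    # "cecret" is the default --outdir shown in the help text.
    summary_rows, consensus_files = collect_results("cecret")
    print(f"{len(summary_rows)} summary rows, {len(consensus_files)} consensus files")
    for path in consensus_files:
        print(path)
```

If the output directory does not exist yet, the function returns two empty lists rather than raising an error, which keeps it safe to call before a run completes.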
Full workflow

Updating Cecret
nextflow pull UPHL-BioNGS/Cecret
