Cecret

Reference-based consensus creation

<div align="center"> <img src="images/logo.png" alt="Cecret Logo" width="220"> <h1>Cecret</h1> <p> <strong>Reference-based consensus creation.</strong> </p>

</div> <br> **A Nextflow workflow for reference-based amplicon consensus generation.**

Named after the beautiful Cecret lake

Location: 40.570°N 111.622°W, Elevation: 9,875 feet (3,010 m), Hiking level: easy

Image credit: Jeffrey McGrath posted on the Wikipedia Article

Introduction

Cecret was originally developed by @erinyoung at the Utah Public Health Laboratory for SARS-CoV-2 sequencing with the ARTIC/Illumina hybrid library prep workflow for MiSeq data with protocols here and here. This Nextflow workflow, however, is flexible enough for many additional organisms and primer schemes as long as the reference genome is "small" and "good enough." In 2022, @tives82 contributed support for Monkeypox virus, including converting IDT's primer scheme to NC_063383.1 coordinates. We are grateful to everyone who has contributed to this repo.

The Nextflow workflow was built to work on Linux-based operating systems. Additional config options are needed for cloud batch usage.

The library preparation method greatly impacts which bioinformatic tools are recommended for creating a consensus sequence. For example, amplicon-based library preparation methods require primer trimming and an elevated minimum depth for base-calling. Some bait-derived library preparation methods have a PCR amplification step, so PCR duplicates need to be removed. This has added complexity and several (admittedly confusing) options to this workflow. Please submit an issue if/when you run into problems.
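To make the "elevated minimum depth" idea concrete, here is a minimal Python sketch of depth-gated base calling. It is not how Cecret or its underlying tools call bases: the function names, the simple majority vote, and the threshold of 10 are all assumptions chosen for illustration.

```python
from collections import Counter


def call_base(bases, min_depth=10, ambiguous="N"):
    """Return the most common base in one pileup column.

    Columns with fewer than min_depth reads are masked with the
    ambiguous character instead of being called.
    """
    if len(bases) < min_depth:
        return ambiguous
    base, _count = Counter(bases).most_common(1)[0]
    return base


def call_consensus(pileup, min_depth=10):
    """Join per-position calls into one consensus string."""
    return "".join(call_base(column, min_depth) for column in pileup)


# Hypothetical pileup: three positions with 30, 3, and 27 reads.
pileup = ["A" * 30, "CCC", "G" * 25 + "TT"]
print(call_consensus(pileup, min_depth=10))
```

Raising `min_depth` trades completeness for confidence: more positions are masked, but the called bases rest on more reads.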

It is possible to use this workflow to simply annotate fastas generated from any workflow or downloaded from GISAID or NCBI. There are also options for multiple sequence alignment (MSA) and phylogenetic tree creation from the fasta files.

Cecret is also part of the staphb-toolkit.

nf-core style docs can be found in docs. The wiki can be found at https://github.com/UPHL-BioNGS/Cecret/wiki

Dependencies

Nextflow
Singularity or Docker - set the profile as singularity or docker during runtime

Usage

$ nextflow run UPHL-BioNGS/Cecret --help

 N E X T F L O W   ~  version 25.10.0

Launching `./main.nf` [special_lovelace] DSL2 - revision: 944ec1d4ed
            
Typical pipeline command:

  nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet samplesheet.csv --outdir cecret


Input/output options
  --sample_sheet                    [string] sample sheet with sample, fastq_1, and fastq_2 columns 
  --species                         [string] specifies species-specific sub-workflows [default: sarscov2] 
  --kraken2_db                      [string] directory to kraken2 database 
  --outdir                          [string] The output directory where the results will be saved. Absolute paths are required on cloud infrastructure. [default: cecret] 

Reference files
  --reference_genome                [string] THE Reference genome 
  --amplicon_bed                    [string] Bedfile for amplicons 
  --gff                             [string] File used in ivar variants. Must correspond with reference genome. 
  --primer_bed                      [string] File with bedfile of primers used in the analysis 
  --primer_set                      [string] Specifies a primer set included in repo  (accepted: midnight_idt_V1, midnight_ont_V1, midnight_ont_V2, midnight_ont_V3, ncov_V3, ncov_V4, ncov_V4.1, 
ncov_V5.3.2, mpx_primalseq, mpx_idt, mpx_yale) [default: ncov_V5.3.2]  

Workflow Components
  --download_nextclade_dataset      [boolean] Uses included nextclade dataset for SARS-CoV-2 during runtime when false. [default: true] 
  --predownloaded_nextclade_dataset [string]  Path to predownloaded nextclade dataset 
  --filter                          [boolean] Specifies if reference-mapped fastq files should be extracted 
  --markdup                         [boolean] Specifies if duplicate reads should be removed (not recommended for nanopore) 
  --relatedness                     [boolean] Turns on multiple sequence alignment subworkflow when true 

------------------------------------------------------
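When runs are launched from scripts, it can help to assemble the command from the options listed in the help text and to reject an unknown primer set before Nextflow starts. The following is a rough sketch under that assumption; `build_command` is a hypothetical helper, not part of Cecret, and it covers only a few of the documented options.

```python
import shlex

# Primer sets copied from the --primer_set entry in the help text.
PRIMER_SETS = {
    "midnight_idt_V1", "midnight_ont_V1", "midnight_ont_V2",
    "midnight_ont_V3", "ncov_V3", "ncov_V4", "ncov_V4.1",
    "ncov_V5.3.2", "mpx_primalseq", "mpx_idt", "mpx_yale",
}

# Container profiles named in the Dependencies section.
PROFILES = {"docker", "singularity"}


def build_command(sample_sheet, outdir="cecret", profile="docker",
                  primer_set=None):
    """Return the nextflow command as a list of arguments."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    cmd = [
        "nextflow", "run", "UPHL-BioNGS/Cecret",
        "-profile", profile,
        "--sample_sheet", sample_sheet,
        "--outdir", outdir,
    ]
    if primer_set is not None:
        if primer_set not in PRIMER_SETS:
            raise ValueError(f"unknown primer set: {primer_set}")
        cmd += ["--primer_set", primer_set]
    return cmd


# Print the command instead of running it; pass the list to
# subprocess.run(cmd, check=True) to actually launch the workflow.
print(shlex.join(build_command("samplesheet.csv", primer_set="ncov_V4.1")))
```

Keeping the command as a list avoids shell-quoting problems with paths that contain spaces.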

Cecret reads its input from a sample sheet, with the sample name and reads separated by commas. The header must be sample,fastq_1,fastq_2. As a general rule, each row gives the identifier for the file(s), the file location(s), and, if the input is not paired-end fastq files, the type.

Rows match files with their processing needs.

paired-end reads: sample,read1.fastq.gz,read2.fastq.gz
single-end reads: sample,sample.fastq.gz,single
nanopore reads: sample,sample.fastq.gz,ont
fasta files: sample,sample.fasta,fasta
multifasta files: multifasta,multifasta.fasta,multifasta

Example sample sheet:

sample,fastq_1,fastq_2
SRR13957125,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_2.fastq.gz
SRR13957170,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_2.fastq.gz
SRR13957177S,/home/eriny/sandbox/test_files/cecret/single_reads/SRR13957177_1.fastq.gz,single
OQ255990.1,/home/eriny/sandbox/test_files/cecret/fastas/OQ255990.1.fasta,fasta
SRR22452244,/home/eriny/sandbox/test_files/cecret/nanopore/SRR22452244.fastq.gz,ont
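The row rules above can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not something Cecret ships: it verifies the header and reports which input type each row implies, assuming the third column holds either a second fastq path or one of the type keywords listed above. The file paths in the example are shortened placeholders.

```python
import csv
import io

HEADER = ["sample", "fastq_1", "fastq_2"]
# Keywords that may replace the second fastq path in the third column.
TYPE_KEYWORDS = {"single", "ont", "fasta", "multifasta"}


def classify_row(row):
    """Return the input type implied by one sample sheet row."""
    third = (row["fastq_2"] or "").strip()
    if third in TYPE_KEYWORDS:
        return third
    if third:
        return "paired"  # anything else is treated as a second fastq path
    raise ValueError(f"row for {row['sample']} has an empty third column")


def read_sample_sheet(handle):
    """Return (sample, type) pairs, checking the required header first."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != HEADER:
        raise ValueError(f"header must be {','.join(HEADER)}")
    return [(row["sample"], classify_row(row)) for row in reader]


# Placeholder paths; substitute real file locations.
example = io.StringIO(
    "sample,fastq_1,fastq_2\n"
    "SRR13957125,reads/SRR13957125_1.fastq.gz,reads/SRR13957125_2.fastq.gz\n"
    "SRR13957177S,single_reads/SRR13957177_1.fastq.gz,single\n"
    "OQ255990.1,fastas/OQ255990.1.fasta,fasta\n"
    "SRR22452244,nanopore/SRR22452244.fastq.gz,ont\n"
)
for sample, kind in read_sample_sheet(example):
    print(sample, kind)
```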

# using docker on samples specified in SampleSheet.csv
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv

# using a config file containing all inputs
nextflow run UPHL-BioNGS/Cecret -c file.config

Results are roughly organized into 'params.outdir'/< analysis >/sample.result

A file summarizing all results is found in 'params.outdir'/cecret_results.csv and 'params.outdir'/cecret_results.txt.

Consensus sequences can be found in 'params.outdir'/consensus and end with *.consensus.fa.
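For downstream scripting, the output locations described above can be collected with a few lines of Python. This is only a sketch based on the paths stated here: the helper names are hypothetical, and no column names are assumed for the summary file, so each row is returned as a plain dictionary keyed by whatever header the file contains.

```python
import csv
from pathlib import Path


def load_summary(outdir="cecret"):
    """Read <outdir>/cecret_results.csv into a list of row dictionaries."""
    path = Path(outdir) / "cecret_results.csv"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


def list_consensus(outdir="cecret"):
    """Return the sorted consensus fasta paths under <outdir>/consensus."""
    return sorted((Path(outdir) / "consensus").glob("*.consensus.fa"))
```

After a run finishes, `list_consensus("cecret")` should list one path per consensus file, and `load_summary("cecret")` returns one dictionary per row of the summary.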

Full workflow

(Figure: diagram of the full workflow)

Updating Cecret

nextflow pull UPHL-BioNGS/Cecret
