<p align="center"> <img src="data/logo/mobile_logo.png" width="800" title="Workflow"> </p>

poreCov | SARS-CoV-2 Workflow for nanopore sequencing data


Citation:

poreCov - an easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing
Christian Brandt, Sebastian Krautwurst, Riccardo Spott, Mara Lohde, Mateusz Jundzill, Mike Marquet, Martin Hölzer
https://www.frontiersin.org/articles/10.3389/fgene.2021.711437/full

What is this Repo?

  • poreCov is a SARS-CoV-2 analysis workflow for nanopore sequencing data (via the ARTIC protocol) or for SARS-CoV-2 genomes (fasta)
  • the workflow is pre-configured to simplify data analysis:
<p align="left"> <a href="https://htmlpreview.github.io/?https://github.com/replikation/poreCov/blob/master/data/figures/index.html"> <img src="data/figures/report_summary.png" width="500" title="Report file"> </a> </p>


1. Quick Setup (Ubuntu)

1.1 Nextflow (the workflow manager)

  • poreCov requires Nextflow and a Java runtime (default-jre)
    • install the Java runtime via: sudo apt install -y default-jre
    • install Nextflow via: curl -s https://get.nextflow.io | bash && sudo mv nextflow /bin && sudo chmod 770 /bin/nextflow
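To confirm that the tools from this section (and the container engine chosen in the next one) are on your PATH, a small shell helper like the following can be used. It is a sketch, not part of poreCov: the function name check_tools is hypothetical, and it only checks that each command can be found, not that it runs correctly.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of poreCov): report which commands are
# missing from PATH. Returns 1 if at least one command is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# example: check_tools java nextflow docker
```

Replace docker with singularity in the example if you use Singularity instead.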

1.2 Container engine (choose one; the containers provide all the tools)

Docker

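  • see the official Docker installation guide (recommended), or install the Ubuntu package via: sudo apt install -y docker.io
  • add your user to the docker group: sudo usermod -a -G docker $USER (log out and back in for this to take effect)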

Singularity

  • use Singularity if you can't use Docker (see the Singularity installation guide)

Note that with Singularity, the following environment variables are automatically passed to the container to ensure execution on HPC systems: HTTPS_PROXY, HTTP_PROXY, http_proxy, https_proxy, FTP_PROXY and ftp_proxy.

Conda (not recommended)

  • see the Conda installation guide
  • install Nextflow and Singularity via conda (not cluster compatible) and use the singularity profile

1.3 Basecalling (optional)

  • only needed if you want the workflow to perform GPU basecalling; this requires one of:
    • a local Guppy installation (see the Oxford Nanopore installation guide)
    • the NVIDIA Docker toolkit
    • Singularity (with --nv support)

2. Run poreCov

2.1 Test run

  • validate your installation via test data:
# for a Docker installation
nextflow run replikation/poreCov -profile test_fastq,local,docker -r 1.1.0 --update

# or for Singularity or conda installation
nextflow run replikation/poreCov -profile test_fastq,local,singularity -r 1.1.0 --update

2.2 Quick run examples

  • poreCov with basecalling and Docker
    • --update tries to use the most recent pangolin lineage and Nextclade releases (optional)
    • -r 1.1.0 specifies the workflow release (see the poreCov releases)
    • --primerV specifies the primer set that was used; run with --help to see which versions are supported
      • alternatively, provide your own primer bed file
nextflow run replikation/poreCov --fast5 fast5/ -r 1.1.0 \
    --cores 6 -profile local,docker --update --primerV V4
  • poreCov with a basecalled fastq directory and custom primer bed file
nextflow run replikation/poreCov --fastq_pass 'fastq_pass/' -r 1.1.0 \
    --cores 32 -profile local,docker --update --primerV primers.bed
  • poreCov with basecalling and renaming of barcodes based on sample_names.csv
# rename barcodes automatically by providing an input file, also using another primer scheme
nextflow run replikation/poreCov --fast5 fast5_dir/ --samples sample_names.csv \
    --primerV V1200 --output results -profile local,docker --update

2.3 Extended Usage

  • for all options, see: nextflow run replikation/poreCov --help -r 1.1.0

Version control

  • poreCov supports versioned runs via -r (e.g. -r 1.1.0), so every analysis can be reproduced
    • only releases are extensively tested and validated
  • poreCov releases are listed on the GitHub releases page
  • add -r <version> to a poreCov run to use a specific release
  • run nextflow pull replikation/poreCov to install updates
    • if you have issues during an update, try rm -rf ~/.nextflow and then nextflow pull replikation/poreCov
    • this removes old cached files and downloads everything again

Important input flags (choose one)

  • these flags provide the input data to the workflow
    • --fast5 fast5_dir/ directory with fast5 files
    • --fastq_pass fastq_dir/ directory with basecalled data (contains "barcode01" etc. directories)
    • --fastq "sample*.fastq.gz" alternative fastq input (one sample per file)
    • --fasta "*genomes.fasta" SARS-CoV-2 genomes as fasta (.gz allowed)

Custom primer bed files

  • instead of selecting a predefined primer version (e.g. --primerV V4), you can pass your own primer bed file to --primerV
  • primer bed files must have the correct columns and naming to be recognized by ARTIC
  • the following rules apply to the bed file (see also the example below)
    • columns are separated by a single tab (\t)
    • column 1 is the fasta reference and should be MN908947.3 (poreCov replaces this value automatically)
    • column 2 is the primer start
    • column 3 is the primer end
    • column 4 is the primer name and has to end with _LEFT or _RIGHT
    • column 5 is the pool and should be named nCoV-2019_1 or nCoV-2019_2
    • column 6 defines the strand orientation with either + or -
MN908947.3	30	54	nCoV-2019_1_LEFT	nCoV-2019_1	+
MN908947.3	1183	1205	nCoV-2019_1_RIGHT	nCoV-2019_1	-
MN908947.3	1100	1128	nCoV-2019_2_LEFT	nCoV-2019_2	+
MN908947.3	2244	2266	nCoV-2019_2_RIGHT	nCoV-2019_2	-
MN908947.3	2153	2179	nCoV-2019_3_LEFT	nCoV-2019_1	+
MN908947.3	3235	3257	nCoV-2019_3_RIGHT	nCoV-2019_1	-
MN908947.3	3144	3166	nCoV-2019_4_LEFT	nCoV-2019_2	+
MN908947.3	4240	4262	nCoV-2019_4_RIGHT	nCoV-2019_2	-
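Before a run, the column rules above can be checked with a short awk script. This is a minimal sketch, not poreCov's own validation: the function name check_primer_bed is hypothetical, column 1 is not checked because poreCov replaces it, and the requirement that the primer start is smaller than the primer end is an added assumption taken from the general BED convention rather than from the rules above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of poreCov) for a custom primer
# bed file. Prints one message per problem and returns 1 if any were found.
check_primer_bed() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NF == 0 { next }  # ignore empty lines
    {
      # rule: exactly six tab-separated columns
      if (NF != 6) {
        printf "line %d: expected 6 tab-separated columns, found %d\n", NR, NF
        bad = 1
        next
      }
      # columns 2 and 3: integer positions; start < end is an assumption (BED convention)
      if ($2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/) {
        printf "line %d: primer start and end must be integers\n", NR; bad = 1
      } else if ($2 + 0 >= $3 + 0) {
        printf "line %d: primer start must be smaller than primer end\n", NR; bad = 1
      }
      # column 4: primer name ending in _LEFT or _RIGHT
      if ($4 !~ /_(LEFT|RIGHT)$/) {
        printf "line %d: primer name must end with _LEFT or _RIGHT\n", NR; bad = 1
      }
      # column 5: pool name
      if ($5 !~ /^nCoV-2019_[12]$/) {
        printf "line %d: pool must be nCoV-2019_1 or nCoV-2019_2\n", NR; bad = 1
      }
      # column 6: strand orientation
      if ($6 !~ /^[+-]$/) {
        printf "line %d: strand must be + or -\n", NR; bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}

# example: check_primer_bed primers.bed && echo "primers.bed passed the checks"
```

A file saved with Windows line endings fails the strand check because of the trailing carriage return; the script reports this instead of silently accepting it.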

Sample input

Note: if --fastq is used without either --samples or --list, samples whose concatenated and size-selected FastQ files are smaller than 1500 kB are excluded from further analysis.
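To spot likely candidates for this exclusion before a long run, you can list input files that are already below a size threshold on disk. This is only a rough early warning sketched under assumptions: poreCov applies its cut-off to the concatenated and size-selected reads, which this check does not reproduce, and it is not stated here whether 1500 kB means 1,500,000 or 1,536,000 bytes or whether compressed or uncompressed size is measured. The helper name list_small_fastq is hypothetical and takes the threshold in bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of poreCov): print every given file whose
# on-disk size is below a threshold in bytes, as "<file><TAB><size> bytes".
# This does not reproduce poreCov's check on concatenated, size-selected reads.
list_small_fastq() {
  local limit=$1 f size
  shift
  for f in "$@"; do
    [ -f "$f" ] || continue                # skip paths that are not regular files
    size=$(wc -c < "$f" | tr -d ' ')       # byte count; tr strips padding some wc versions add
    if [ "$size" -lt "$limit" ]; then
      printf '%s\t%s bytes\n' "$f" "$size"
    fi
  done
}

# example, treating 1500 kB as 1500000 bytes (an assumption):
# list_small_fastq 1500000 sample*.fastq.gz
```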

Sample sheet

  • barcodes can be automatically renamed via --samples sample_names.csv
  • required columns:
    • _id = sample name
    • Status = barcode number which should be renamed
  • optional column:
    • Description = description column to be included in the output report and tables

Example comma-separated file (do not change the header):

_id,Status,Description
Sample_2021,barcode01,good
2ndSample,BC02,bad
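A quick check that a sample sheet keeps the expected header and has a sample name and barcode in every row can save a failed run. This is a hypothetical helper (check_sample_sheet is not part of poreCov), and it assumes the header must read exactly _id,Status or _id,Status,Description, as in the example and the column list above; a sheet saved with Windows line endings will fail the header check.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of poreCov) for a --samples sheet.
# Prints one message per problem and returns 1 if any were found.
check_sample_sheet() {
  awk -F',' '
    BEGIN { bad = 0 }
    NR == 1 {
      # header must be kept as in the example (Description is optional)
      if ($0 != "_id,Status" && $0 != "_id,Status,Description") {
        print "unexpected header: " $0
        bad = 1
      }
      next
    }
    NF == 0 { next }  # ignore empty lines
    $1 == "" || $2 == "" {
      printf "line %d: missing sample name (_id) or barcode (Status)\n", NR
      bad = 1
    }
    END {
      if (NR == 0) { print "empty sample sheet"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# example:
# check_sample_sheet sample_names.csv && nextflow run replikation/poreCov \
#   --fast5 fast5_dir/ --samples sample_names.csv --primerV V1200 \
#   --output results -profile local,docker --update
```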

List input

  • you can provide a csv file as input to --fastq to select specific fastq files
    • e.g.: --fastq input.csv
    • the csv needs two columns:
      • column 1 = sample name
      • column 2 = path to the fastq file
    • do not include a header
  • files are automatically renamed to the sample names provided in column 1
  • after read length filtering, the file size selection for poor samples is disabled, so all samples appear in the report

Example:

sample1,path/to/first/sample.fastq.gz
2ndSample,path/to/second/sample.fastq.gz
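If every sample is a single *.fastq.gz file in one directory, the list file can be generated instead of written by hand. This sketch assumes that layout and uses the file name without .fastq.gz as the sample name; make_fastq_list is a hypothetical helper, not part of poreCov. It writes absolute paths so the list does not depend on the directory Nextflow is launched from. File names containing commas would break the csv and are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of poreCov): print "sample_name,absolute_path"
# lines (no header) for every *.fastq.gz file in a directory.
make_fastq_list() {
  local dir f
  dir=$(cd "$1" && pwd) || return 1   # resolve the directory to an absolute path
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue           # no matching files: the glob stays unexpanded
    printf '%s,%s\n' "$(basename "$f" .fastq.gz)" "$f"
  done
}

# example: make_fastq_list fastq_dir > input.csv
```

The resulting input.csv is then passed to --fastq as described above.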

Pangolin Lineage definitions

  • lineage definitions change quickly in response to the pandemic
  • to avoid using out-of-date lineage schemes, the --update flag can be added to each run
