PoreCov
SARS-CoV-2 workflow for nanopore sequence data
poreCov | SARS-CoV-2 Workflow for nanopore sequencing data
Citation:
poreCov - an easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing
Christian Brandt, Sebastian Krautwurst, Riccardo Spott, Mara Lohde, Mateusz Jundzill, Mike Marquet, Martin Hölzer
https://www.frontiersin.org/articles/10.3389/fgene.2021.711437/full
What is this Repo?
- poreCov is a SARS-CoV-2 analysis workflow for nanopore data (via the ARTIC protocol) or SARS-CoV-2 genomes (fasta)
- the workflow is pre-configured to simplify data analysis:
Table of Contents
- poreCov | SARS-CoV-2 Workflow for nanopore sequencing data
- Table of Contents
- 1. Quick Setup (Ubuntu)
- 2. Run poreCov
- 3. Quality Metrics (default)
- 4. Workflow
- 5. Literature / References to cite
- 6. Troubleshooting
- 7. Time to results
- 8. Credits
1. Quick Setup (Ubuntu)
1.1 Nextflow (the workflow manager)
- poreCov needs Nextflow and a Java runtime (default-jre)
  - install the Java runtime via:
sudo apt install -y default-jre
  - install Nextflow via:
curl -s https://get.nextflow.io | bash && sudo mv nextflow /bin && sudo chmod 770 /bin/nextflow
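To confirm that the tools from this section ended up on the PATH, a small check along the following lines may help. This is a minimal sketch and not part of poreCov; the function name check_tools is hypothetical.

```shell
# check_tools NAME...: report every command that is not found on PATH.
# Prints one "missing: <name>" line per absent tool and returns 1 if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools java nextflow prints nothing and returns 0 when both are installed. The same call can be extended with docker or singularity once a container engine from 1.2 is set up.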
1.2 Container (choose one - they manage all the tools)
Docker
- installation here (recommended), alternatively via:
sudo apt install -y docker
- add Docker to the user:
sudo usermod -a -G docker $USER
Singularity
- Singularity installation here
- if you can't use Docker
Note that with Singularity, the following environment variables are automatically passed to the container to ensure execution on HPCs: HTTPS_PROXY, HTTP_PROXY, http_proxy, https_proxy, FTP_PROXY and ftp_proxy.
Conda (not recommended)
- Conda installation here
- install Nextflow and Singularity via conda (not cluster compatible) and use the singularity profile
1.3 Basecalling (optional)
- only important if you want to do basecalling via GPU with the workflow:
- local guppy installation (see oxford nanopore installation guide)
- or: install nvidia Docker tool kit
- or: Singularity (with --nv support)
2. Run poreCov
2.1 Test run
- validate your installation via test data:
# for a Docker installation
nextflow run replikation/poreCov -profile test_fastq,local,docker -r 1.1.0 --update
# or for Singularity or conda installation
nextflow run replikation/poreCov -profile test_fastq,local,singularity -r 1.1.0 --update
2.2 Quick run examples
- poreCov with basecalling and Docker
  - --update tries to force the most recent pangolin lineage and nextclade release version (optional)
  - -r 1.1.0 specifies the workflow release from here
  - --primerV specifies the primer set that was used; see --help for the supported sets
    - alternatively, provide your own primer bed file
nextflow run replikation/poreCov --fast5 fast5/ -r 1.1.0 \
--cores 6 -profile local,docker --update --primerV V4
- poreCov with a basecalled fastq directory and custom primer bed file
nextflow run replikation/poreCov --fastq_pass 'fastq_pass/' -r 1.1.0 \
--cores 32 -profile local,docker --update --primerV primers.bed
- poreCov with basecalling and renaming of barcodes based on sample_names.csv
# rename barcodes automatically by providing an input file, also using another primer scheme
nextflow run replikation/poreCov --fast5 fast5_dir/ --samples sample_names.csv \
--primerV V1200 --output results -profile local,docker --update
2.3 Extended Usage
- see also
nextflow run replikation/poreCov --help -r 1.1.0
Version control
- poreCov supports version control via -r; this way, you can run everything reproducibly (e.g. -r 1.1.0)
  - moreover, only releases are extensively tested and validated
- poreCov releases are listed here
- add -r <version> to a poreCov run to activate this
- run nextflow pull replikation/poreCov to install updates
  - if you have issues during the update, try rm -rf ~/.nextflow and then nextflow pull replikation/poreCov
  - this removes old files and downloads everything anew
Important input flags (choose one)
- these are the flags to get "data" into the workflow
  - --fast5 fast5_dir/ for fast5 directory input
  - --fastq_pass fastq_dir/ directory with basecalled data (contains "barcode01" etc. directories)
  - --fastq "sample*.fastq.gz" alternative fastq input (one sample per file)
  - --fasta "*genomes.fasta" SARS-CoV-2 genomes as fasta (.gz allowed)
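Since exactly one of these input flags is chosen per run, a small helper can make that choice explicit. The sketch below only prints a command line and does not start Nextflow. The function name porecov_cmd is hypothetical, and combining each input flag with -profile local,docker is an assumption carried over from the quick run examples.

```shell
# porecov_cmd TYPE PATH RELEASE: print a poreCov command line for one input type.
# TYPE must be one of the four input flags listed above (without the leading dashes).
porecov_cmd() {
    case "$1" in
        fast5|fastq_pass|fastq|fasta) ;;
        *) echo "unknown input type: $1" >&2; return 1 ;;
    esac
    printf "nextflow run replikation/poreCov --%s '%s' -r %s -profile local,docker\n" "$1" "$2" "$3"
}
```

For example, porecov_cmd fasta genomes.fasta 1.1.0 prints a command that can be reviewed and then pasted into the terminal.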
Custom primer bed files
- poreCov supports the input of primer.bed files via --primerV instead of selecting a preexisting primer version like --primerV V4
  - for an example see 2.2 Quick run examples
- feature available for poreCov version 1.1.0 or greater
- the main issue with primer bed files is that they need to have the correct columns and text to be recognized via artic
- the following rules apply to the bed file (see also example)
  - each column is separated via one tab or \t
  - column 1 is the fasta reference, and it should be MN908947.3 (poreCov replaces that automatically)
  - column 2 is the primer start
  - column 3 is the primer end
  - column 4 is the primer name, and it has to end with _RIGHT or _LEFT
  - column 5 is the pool and it should be named nCoV-2019_1 or nCoV-2019_2
  - column 6 defines the strand orientation with either - or +
MN908947.3 30 54 nCoV-2019_1_LEFT nCoV-2019_1 +
MN908947.3 1183 1205 nCoV-2019_1_RIGHT nCoV-2019_1 -
MN908947.3 1100 1128 nCoV-2019_2_LEFT nCoV-2019_2 +
MN908947.3 2244 2266 nCoV-2019_2_RIGHT nCoV-2019_2 -
MN908947.3 2153 2179 nCoV-2019_3_LEFT nCoV-2019_1 +
MN908947.3 3235 3257 nCoV-2019_3_RIGHT nCoV-2019_1 -
MN908947.3 3144 3166 nCoV-2019_4_LEFT nCoV-2019_2 +
MN908947.3 4240 4262 nCoV-2019_4_RIGHT nCoV-2019_2 -
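The column rules above can be checked before a run with a short awk script. This is a minimal sketch and not part of poreCov or artic; the function name check_primer_bed is hypothetical. It does not enforce the reference name in column 1, because poreCov replaces that automatically.

```shell
# check_primer_bed FILE: print one message per rule violation.
# Returns 0 if every line passes, 1 otherwise.
check_primer_bed() {
    awk -F '\t' '
        # exactly six tab-separated columns are expected
        NF != 6 {
            printf "line %d: expected 6 tab-separated columns, found %d\n", NR, NF
            bad = 1
            next
        }
        # columns 2 and 3: primer start and end
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: primer start and end must be integers\n", NR
            bad = 1
        }
        # column 4: primer name
        $4 !~ /_(LEFT|RIGHT)$/ {
            printf "line %d: primer name must end with _LEFT or _RIGHT\n", NR
            bad = 1
        }
        # column 5: pool
        $5 !~ /^nCoV-2019_[12]$/ {
            printf "line %d: pool must be nCoV-2019_1 or nCoV-2019_2\n", NR
            bad = 1
        }
        # column 6: strand
        $6 != "+" && $6 != "-" {
            printf "line %d: strand must be + or -\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A file that was saved with spaces instead of tabs fails the first rule on every line, which is a common cause of a bed file not being recognized.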
Sample input
Note: If using --fastq without either --samples or --list, samples whose concatenated and size-selected FastQ files are smaller than 1500 kB will be excluded from further analysis.
Sample sheet
- barcodes can be automatically renamed via --samples sample_names.csv
- required columns:
  - _id = sample name
  - Status = barcode number which should be renamed
- optional column:
  - Description = description column to be included in the output report and tables
Example comma separated file (don't replace the header):
_id,Status,Description
Sample_2021,barcode01,good
2ndSample,BC02,bad
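A sample sheet can be sanity-checked against the column rules above before it is passed to --samples. This is a minimal sketch and not part of poreCov; the function name check_sample_sheet is hypothetical. It only verifies the header and that the two required fields are filled, and it does not judge the barcode naming.

```shell
# check_sample_sheet FILE: verify the header and the required columns of a sample sheet.
# Returns 0 if the sheet passes, 1 otherwise.
check_sample_sheet() {
    awk -F ',' '
        NR == 1 {
            # the header must be kept as-is; Description is optional
            if ($1 != "_id" || $2 != "Status" || (NF >= 3 && $3 != "Description")) {
                print "line 1: header must be _id,Status or _id,Status,Description"
                bad = 1
            }
            next
        }
        # every data row needs a sample name and a barcode
        $1 == "" || $2 == "" {
            printf "line %d: _id and Status must not be empty\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```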
List input
- you can provide a csv as input to --fastq to select specific fastq files
  - e.g.: --fastq input.csv
- the csv needs to contain two columns:
  - column 1 = sample name
  - column 2 = path to fastq-location
  - no header should be used
- files get automatically renamed to the sample names provided in column 1
- after read length filtering, the file size selection for poor samples is disabled -> all samples will appear in the report
Example:
sample1,path/to/first/sample.fastq.gz
2ndSample,path/to/second/sample.fastq.gz
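Such a list can be generated from a directory instead of being typed by hand. The sketch below is not part of poreCov; the function name make_fastq_list is hypothetical, and deriving the sample name from the file name is an assumption made for illustration.

```shell
# make_fastq_list DIR: print "sample_name,path" for every *.fastq.gz in DIR.
# No header is written; the sample name is the file name without .fastq.gz.
make_fastq_list() {
    for f in "$1"/*.fastq.gz; do
        # skip the unexpanded pattern when nothing matches
        [ -e "$f" ] || continue
        name="$(basename "$f" .fastq.gz)"
        printf '%s,%s\n' "$name" "$f"
    done
}
```

For example, make_fastq_list fastq_dir > input.csv writes a file that can then be edited to give the samples their final names and passed on as --fastq input.csv.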
Pangolin Lineage definitions
- lineage determinations are quickly changing in response to the pandemic
- to avoid using out of date lineage schemes, a --update flag can be added to each