Q2ONT
Bash pipeline for analysis of ONT full-length 16S sequences in QIIME2
Install / Use
/learn @DeniRibicic/Q2ONTREADME
q2ONT
Bash pipeline for QIIME2 analysis of ONT generated full-length 16S rRNA sequences.
Getting started
Prerequsites
- Miniconda 3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
conda update conda
- QIIME2
wget https://data.qiime2.org/distro/core/qiime2-2019.10-py36-linux-conda.yml
conda env create -n qiime2-2019.7 --file qiime2-2019.10-py36-linux-conda.yml
# OPTIONAL CLEANUP
rm qiime2-2019.7-py36-linux-conda.yml
- Trimmomatic
conda install -c bioconda trimmomatic
- Porechop
git clone https://github.com/rrwick/Porechop.git
cd Porechop
python3 setup.py install
porechop -h
Downloads
-
clone Git repository or just download q2ONT.sh script
-
Latest Silva q2 compatible db, i.e v.132
wget https://www.arb-silva.de/download/archive/qiime/SILVA_132_release.zip
How to use
-
After cloning
q2ONT.shbash script, place it somewhere in your path so it will be accessible, e.g./home/user/bin. Thereafter make it executable by runningchmod u+x q2ONT.sh.This script assumes that you have basecalled ONT generated
.fast5files and have all corresponding.fastqfiles placed in a direcotry, e.g0_basecalled-fastq. -
If not already done, you need to convert Silva representative sequnces (e.g. 99_otus.fasta) and corresponding taxonomy into q2 artefact.
You can do this as follows:
qiime tools import \ --input-path Silva_132_release/rep_set/rep_set_all/99/99_otus.fasta \ --output-path Silva_132_release/rep_set/rep_set_all/99/99_otus.qza --type 'FeatureData[Sequence]'and
qiime tools import \ --input-path Silva_132_release/taxonomy/taxonomy_all/99/consensus_taxonomy_7_levels.txt \ --output-path Silva_132_release/taxonomy/taxonomy_all/99/consensus_taxonomy_7_levels.qza \ --type 'FeatureData[Taxonomy]' \ --input-format HeaderlessTSVTaxonomyFormatAlso, you will need to train classifier which will be used during taxonomical annotation.
You can do this as follows:
qiime feature-classifier fit-classifier-naive-bayes \ --i-reference-reads Silva_132_release/rep_set/rep_set_all/99/99_otus.qza \ --i-reference-taxonomy Silva_132_release/taxonomy/taxonomy_all/99/consensus_taxonomy_7_levels.qza \ --o-classifier Silva_132_release/rep_set/rep_set_all/99/classifier.qzaNow having everything in place go one direcory up from where your
.fastqfiles are located. i.e, if you typelsit should list directory containing your.fastqfiles. e.g.ls 0_basecalled-fastq -
Run the q2ONT.sh script:
q2ONT.sh [-i fastq_directory] [-j reference_seqs] [-c classifier] [-t threads]where:
[-i fastq_directory]is the directory containing all of your basecalled.fastqfiles; e.g0_basecalled-fastq[-j reference_seqs]is reference sequence q2 artefact; e.g99_otus.qza. Use complete path to it.[-c classifier]is q2 pretrained classifier; e.g.classifier.qza. Use complete path to it.[-t threads]is the number of CPU threads you want to use -
After a successful run, direcotry
exportedshould be generated. This directory will contain files in native format (not q2 artefact) all ready to be loaded in Phyloseq package for further data exploration;- biom file with added taxonomy (
table-with-taxonomy.biom) - newick tree file (
tree.nwk) and - representative sequences file (
dna-sequences.fasta)
- biom file with added taxonomy (
Happy QIIMEing!!
What does script do?
The file output from the workflow can be traced based on file indexing (prefix numbers added as the files are generated).
-
First step includes concatenating all
.fastqfiles into one file. -
porechopis employed to demultiplex your reads. -
trimmomaticwill discard all reads shorter than 1400 bp. All reads longer than 1400 bp will be cropped to that length. These settings can be changed if desired, you will just have to modify it manually beofre running the script. You want to look atMINLEN:1400 CROP:1400parameter. -
QIIME2 imports now reads creating
4_single-end-demux.qzaartefact together with visualization file4_single-end-demux.qzvwhich can be viewed in QIIME2View online tool. -
q2 will dereplicate sequences and create also visualization files for each.
-
Chimeric sequences are screened for and are filtered out from the workflow. Subsequently OTUs are clustered via open reference option using
vsearchat 85% identity. This can be also changed manually digging into script. However, due to high error rate of ONT platform, it is advised to cluster OTUs at 85% similarity or even less*. -
Reads are aligned using
mafft -
Alignments are masked.
-
Un-rooted tree is created.
-
Rooted tree is created.
-
Taxonomy is assigned using pre-trained SILVA classifier
*.Curren, Emily, et al. "Rapid profiling of tropical marine cyanobacterial communities." Regional Studies in Marine Science 25 (2019): 100485.
