Nanogo
Overview
NanoGO Pipeline: A comprehensive bioinformatics pipeline for Oxford Nanopore Technologies. The tool offers Dorado-based basecalling with duplex options, amplicon read assembly and polishing, primer trimming, and quality control.
Designed for efficient parallel processing and enhanced by interactive input and output configuration, NanoGO maximizes computational resources while providing flexible control to users.
Quick Installation Guide
Use the following command to install NanoGO:
conda create -n nanogo-0.3.7 "python=3.10" -y && conda activate nanogo-0.3.7 && conda install -c gosahan nanogo -y
<p align="center">
<a href="https://github.com/phac-nml/nanogo/main/extra/cmd_j3T8dUwunM.gif"><img src="https://raw.githubusercontent.com/phac-nml/nanogo/main/extra/cmd_j3T8dUwunM.gif" alt="NanoGO Install Instructions" height="auto" width="1000"/></a>
</p>
If the quick installation does not work, follow the instructions below.
- <env_name>: the name of the conda environment in which to install NanoGO, for example nanogo-0.3.7.
conda create -n <env_name> "python=3.10"
conda activate <env_name>
conda install -c gosahan nanogo
Usage Requirements
NanoGO Analysis
Input Directory Structure: Unique Reference and Primer/Adaptor Sequences for Each Sample
Input Directory
├─ Sample_A
│ ├─ Sample_A_01.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_A_02.fastq (fastq.gz, fastq)
│ ├─ Reference_Sequence.fasta (fasta)
│ └─ Primer_File.txt (txt)
├─ Sample_B
│ ├─ Sample_B_01.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_B_02.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_B_03.fastq.gz (fastq.gz, fastq)
│ ├─ Reference_Sequence.fasta (fasta)
│ └─ Primer_File.txt (txt)
└─ Sample_N
   ├─ Sample_N_01.fastq.gz (fastq.gz, fastq)
   ├─ {reference_sequence_files_name}.fasta (fasta)
   └─ {unique_primer/adaptor_files_name}.txt (txt)
Input Directory Structure: Shared Reference and Primer/Adaptor Sequences for All Samples
Input Directory
├─ Sample_A
│ ├─ Sample_A_01.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_A_02.fastq (fastq.gz, fastq)
├─ Sample_B
│ ├─ Sample_B_01.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_B_02.fastq.gz (fastq.gz, fastq)
│ ├─ Sample_B_03.fastq.gz (fastq.gz, fastq)
├─ Sample_N
│ ├─ Sample_N_01.fastq.gz (fastq.gz, fastq)
│
├─ {reference_sequence_files_name}.fasta (fasta)
│
└─ {unique_primer/adaptor_files_name}.txt (txt)
Guidelines for NanoGO Analysis Tool
- Each folder is treated as a unique sample containing read files associated with the same barcode.
- If you sequenced different genes and need to analyze the combined dataset, create a single folder and move all fastq files intended for joint assembly into it.
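Before starting a run, it can help to confirm that the input directory matches one of the layouts above. The following is a minimal sketch, not part of NanoGO: `list_samples` is a hypothetical helper that treats every top-level folder as a sample and lists the fastq files inside it.

```python
from pathlib import Path

# Read formats shown in the directory trees above.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def list_samples(input_dir):
    """Map each sample folder name to the sorted fastq file names inside it."""
    samples = {}
    for folder in sorted(Path(input_dir).iterdir()):
        if not folder.is_dir():
            continue  # shared reference/primer files sit at the top level
        reads = sorted(
            f.name
            for f in folder.iterdir()
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
        samples[folder.name] = reads
    return samples
```

A sample folder that maps to an empty list contains no reads and is worth checking before launching the analysis.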
Reference Sequence File Structure
>Reference_sequence
UUCCUACAAGGGAARNGCD
or
>Reference_sequence_1
UUCCUACAAGGGAARNGCD
>Reference_sequence_2
UUCCDUACAAGGGAARNGCD
>Reference_sequence_3
UUCCUACAARHGGGAARNGCD
- If multiple sequences are present in a reference file, the analysis tool concatenates them into one reference.
- This generates a single consensus file; if a separate reference is needed for each sequence, consider running the analysis tool multiple times with only the desired gene.
- The reference sequence can contain the following letters: WSRYVHBMNKUATGC.
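A reference file can be checked against this letter set before a run. The sketch below is illustrative only: `read_fasta` and `unexpected_letters` are hypothetical helpers, the letter set is copied exactly as listed above, and the comparison is case-sensitive by choice of this sketch rather than by any documented NanoGO behavior.

```python
# Letter set exactly as listed in the README.
ALLOWED_LETTERS = set("WSRYVHBMNKUATGC")


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def unexpected_letters(sequence):
    """Return the sorted letters in `sequence` that are outside ALLOWED_LETTERS."""
    return sorted(set(sequence) - ALLOWED_LETTERS)
```

Running `unexpected_letters` on each parsed record gives an empty list for a sequence that uses only the listed letters.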
Primer/Adaptor File structure:
>{your_forward_primer_name}-F
AACARMNKAGATAAAATAGTHAGT
>{your_reverse_primer_name}-R
ATAATTATGRTGYTMNKTARACT
Primer/Adaptor Naming Requirements:
- Primer names start with > and should not contain spaces (use _ instead).
- Primer names must end with -F for forward primers or -R for reverse primers, which is critical during the primer trimming stage.
Primer/Adaptor Sequence Requirements:
- Follow the primer name with its sequence.
>forward_primer-F
ATAATTATGRTGYTMNKTARACT
>reverse_primer-R
AACARMNKAGATAAAATAGTHAGT
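The naming and sequence rules above can be checked mechanically. This is a rough sketch, not NanoGO's own validation: `check_primer_file` is a hypothetical helper, and it assumes each primer occupies exactly one name line followed by one sequence line, as in the examples.

```python
def check_primer_file(text):
    """Return a list of problems found in primer file text (empty if none)."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith(">"):
            problems.append(f"expected a '>' name line, got: {header!r}")
            i += 1
            continue
        name = header[1:]
        if " " in name:
            problems.append(f"{name!r}: name contains spaces")
        if not name.endswith(("-F", "-R")):
            problems.append(f"{name!r}: name must end with -F or -R")
        # Assumption of this sketch: the sequence is the single line after the name.
        if i + 1 >= len(lines) or lines[i + 1].startswith(">"):
            problems.append(f"{name!r}: no sequence after the name")
            i += 1
        else:
            i += 2
    return problems
```

An empty result means the file follows the rules as this sketch interprets them. It does not guarantee that primer trimming will succeed.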
System Requirements
- Operating System: Linux or Windows Subsystem for Linux (WSL).
- Memory: Minimum of 16 GB RAM.
- Processor: At least 4 cores.
Names of files and folders should not contain spaces; replace any spaces with _.
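To find offending names before a run, something like the following could be used. It is a dry-run sketch only: `names_with_spaces` is a hypothetical helper that reports suggested names and renames nothing.

```python
from pathlib import Path


def names_with_spaces(root):
    """Return (path, suggested_name) pairs for entries under root whose name contains a space."""
    suggestions = []
    for path in sorted(Path(root).rglob("*")):
        if " " in path.name:
            suggestions.append((path, path.name.replace(" ", "_")))
    return suggestions
```

If you act on the suggestions, rename files before the folders that contain them so the reported paths stay valid.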
NanoGO Basecaller
Input Directory Structure
Input Directory
├─ Raw_Sample_A
│ ├─ Raw_Sample_A_01.pod5 (POD5 or FAST5)
│ ├─ Raw_Sample_A_02.pod5 (POD5 or FAST5)
├─ Raw_Sample_B
│ ├─ Raw_Sample_B_01.fast5 (POD5 or FAST5)
│ ├─ Raw_Sample_B_02.fast5 (POD5 or FAST5)
│ ├─ Raw_Sample_B_03.fast5 (POD5 or FAST5)
└─ Raw_Sample_N
   └─ Raw_Sample_N_01.pod5 (POD5 or FAST5)
- The NanoGO Basecaller considers Raw_Sample_A and Raw_Sample_B as separate sequencing runs.
- Files within each folder can be in either FAST5 or POD5 format.
- FAST5 files are converted to POD5 format during the sequencing process.
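To see in advance which run folders hold FAST5 files and which hold POD5 files, a short inventory can be taken. This is a minimal sketch under the layout shown above: `summarize_runs` is a hypothetical helper and does not call Dorado or NanoGO.

```python
from collections import Counter
from pathlib import Path

# Raw signal formats named in the README.
RAW_SUFFIXES = {".pod5", ".fast5"}


def summarize_runs(input_dir):
    """Count POD5 and FAST5 files in each run folder of the input directory."""
    summary = {}
    for run in sorted(Path(input_dir).iterdir()):
        if run.is_dir():
            counts = Counter(
                f.suffix.lower()
                for f in run.iterdir()
                if f.is_file() and f.suffix.lower() in RAW_SUFFIXES
            )
            summary[run.name] = dict(counts)
    return summary
```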
System Requirements
- GPU Compatibility: Must be compatible with Dorado.
- Operating System: Linux or Windows Subsystem for Linux (WSL).
- Memory: Minimum of 16 GB RAM.
- Processor: At least 4 cores.
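The core and memory minimums above can be checked from Python on Linux or WSL. The sketch below is an illustration, not part of NanoGO: both function names are hypothetical, and GPU compatibility with Dorado is not checked.

```python
import os


def meets_minimum(cores, ram_gb, min_cores=4, min_ram_gb=16):
    """Compare the given resources with the minimums listed above."""
    return cores >= min_cores and ram_gb >= min_ram_gb


def detect_resources():
    """Return (cpu_cores, ram_gib) using os.sysconf, which works on Linux/WSL."""
    cores = os.cpu_count() or 1
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, ram_bytes / 1024**3
```

A machine sold as 16 GB usually reports slightly less usable memory, so you may want to pass a somewhat lower `min_ram_gb` when calling `meets_minimum(*detect_resources())`.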
Usage Instructions for nanogo
Test to see if the install is working
Input:
nanogo --help
Output:
usage: nanogo [options] <subcommand>
options:
-h, --help show this help message and exit
-v, --version Display the version number of NanoGO.
Available Tools:
Valid subcommands
{basecaller,analysis}
Description
basecaller Run nanogo basecaller using dorado basecaller
analysis Run nanogo bioinformatics pipeline for generating consensus sequences and qc report
Usage Instructions for nanogo analysis
NanoGO has several modes of operation.
Run nanogo analysis on the command line with no arguments to start the pipeline in interactive mode.
nanogo analysis
Run in command-line mode by supplying the required arguments.
nanogo analysis -i <fastq_folder_path> -o <output_directory_name> --primer <primer_file.txt> --ref <reference_seq.fasta> [options]
Arguments
Input/Output Options
- -i <fastq_folder_path>: Path to the folder containing fastq data. Activates interactive mode if not provided.
- -o <output_directory>: Output directory name (Default: "nanogo_output").
Main Options
- --primer <primer_folder_path>: Path to folder containing primer sequences.
- --ref <reference_folder_path>: Path to folder containing reference sequences.
- --model: Prompt to select supported models used for basecalling (Default: r1041_e82_400bps_sup_v4.2.0).
- --min <minimum_read_size>: Minimum size of read expected (Default: 100).
- --max <maximum_read_size>: Maximum size of read expected (Default: 1500).
- --overlap <overlap_size>: Minimum overlap of reads. This is used to determine the read trimming sensitivity (Default: 1000).
- --quality <minimum_quality>: Minimum quality of nucleotide at end of reads. This value is used to trim low-quality bases (Default: 20).
- --barcode: Barcodes used for ONT sequencing. Choices: pcr_barcodes, rapid_barcodes, native_barcodes (Default: native_barcodes).
- --require_two_barcodes: Reads will only be put in barcode bins if they have a strong match for the barcode on both their start and end (Default: false).
- --database <kraken2_database>: If you have a custom Kraken2 database installed locally, provide the path to the folder containing the k2d files (Default: MinusB - Archaea, Viral, Plasmid, Human, UniVec_Core).
- -v, --version: Display the version number of NanoGO.
Example
nanogo analysis -i <basecalled_data_folder> -o <nanogo_output>
This command will process the fastq data in <basecalled_data_folder>, write results to <nanogo_output>, and prompt for the basecalling model.
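When launching runs from a script, the command can be assembled as an argument list instead of a hand-typed string. This is a minimal sketch that uses only flags from the Arguments list above. `build_analysis_command` and the file names in the usage lines are hypothetical.

```python
import shlex


def build_analysis_command(fastq_dir, output_dir, primer=None, ref=None):
    """Assemble a `nanogo analysis` argument list from the documented flags."""
    cmd = ["nanogo", "analysis", "-i", str(fastq_dir), "-o", str(output_dir)]
    if primer is not None:
        cmd += ["--primer", str(primer)]
    if ref is not None:
        cmd += ["--ref", str(ref)]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_analysis_command(
    "fastq_data", "nanogo_output",
    primer="Primer_File.txt", ref="Reference_Sequence.fasta",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside the activated conda environment. Passing a list rather than a single string avoids shell quoting problems with unusual paths.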
Output
NanoGO Analysis
