<table align="center" style="margin: 0px auto;"> <tr> <td> <img src="https://raw.githubusercontent.com/phac-nml/nanogo/main/extra/nanogo_logo.svg" alt="NanoGo Logo" width="400" height="auto"/> </td> <td> <h1>Nanopore Genome Optimization Bioinformatics Pipeline</h1> <a href="https://anaconda.org/gosahan/nanogo"> <img src="https://anaconda.org/gosahan/nanogo/badges/version.svg" alt="NanoGo on Anaconda"/> </a> <a href="https://anaconda.org/gosahan/nanogo"> <img src="https://anaconda.org/gosahan/nanogo/badges/platforms.svg"/> </a> <a href="https://anaconda.org/gosahan/nanogo"> <img src="https://anaconda.org/gosahan/nanogo/badges/latest_release_date.svg"/> </a> </td> </tr> </table> <table align="center" style="margin: 0px auto;">  <tr> <th>NanoGO QC Dashboard</th> <th>Local Environment Operations</th> </tr>  <tr> <td> <a href="https://github.com/phac-nml/nanogo/blob/main/extra/brave_jJ2pJG0u1c-ezgif.com-resize.gif"> <img src="https://raw.githubusercontent.com/phac-nml/nanogo/main/extra/brave_jJ2pJG0u1c-ezgif.com-resize.gif" alt="NanoGO QC Summary Report" height="auto" width="350"/> </a> </td> <td> <a href="https://github.com/phac-nml/nanogo/blob/main/extra/cmd_xxQZGCLvlP.gif"> <img src="https://raw.githubusercontent.com/phac-nml/nanogo/main/extra/cmd_xxQZGCLvlP.gif" alt="NanoGO tool in Action" height="auto" width="350"/> </a> </td> </tr>  <tr> <th colspan="2">Real-Time Performance Monitoring</th> </tr> <tr> <td colspan="2"> <a href="https://anaconda.org/gosahan/nanogo/Code_eWv9WVcoX8.gif"> <img src="https://raw.githubusercontent.com/phac-nml/nanogo/main/extra/Code_eWv9WVcoX8.gif" alt="NanoGO Progress Bar" height="auto" width="auto"/> </a> </td> </tr> </table>

Overview

NanoGO is a comprehensive bioinformatics pipeline for Oxford Nanopore Technologies (ONT) sequencing data. It offers Dorado-based basecalling with duplex options, amplicon read assembly and polishing, primer trimming, and quality control.

Designed for efficient parallel processing and enhanced by interactive input and output configuration, NanoGO maximizes computational resources while providing flexible control to users.

Quick Installation Guide

Use the following command to install NanoGO:

conda create -n nanogo-0.3.7 "python=3.10" -y && conda activate nanogo-0.3.7 && conda install -c gosahan nanogo -y

If quick installation does not work, follow the instructions below

Replace <env_name> with the name of the conda environment in which to install NanoGO. For example, with nanogo-0.3.7 as <env_name>, the first command becomes conda create -n nanogo-0.3.7 "python=3.10".

conda create -n <env_name> "python=3.10"

conda activate <env_name>

conda install -c gosahan nanogo

Usage Requirements

NanoGO Analysis

Input Directory Structure: Unique Reference and Primer/Adaptor Sequences for Each Sample

Input Directory
├─ Sample_A
│  ├─ Sample_A_01.fastq.gz (fastq.gz, fastq)
│  ├─ Sample_A_02.fastq (fastq.gz, fastq)
│  ├─ Reference_Sequence.fasta (fasta)
│  └─ Primer_File.txt (txt)
├─ Sample_B
│  ├─ Sample_B_01.fastq.gz (fastq.gz, fastq)
│  ├─ Sample_B_02.fastq.gz (fastq.gz, fastq)
│  ├─ Sample_B_03.fastq.gz (fastq.gz, fastq)
│  ├─ Reference_Sequence.fasta (fasta)
│  └─ Primer_File.txt (txt)
└─ Sample_N
   ├─ Sample_N_01.fastq.gz (fastq.gz, fastq)
   ├─ {reference_sequence_files_name}.fasta (fasta)
   └─ {unique_primer/adaptor_files_name}.txt (txt)

Input Directory Structure: Shared Reference and Primer/Adaptor Sequences for All Samples

Input Directory
├─ Sample_A
│  ├─ Sample_A_01.fastq.gz (fastq.gz, fastq)
│  └─ Sample_A_02.fastq (fastq.gz, fastq)
├─ Sample_B
│  ├─ Sample_B_01.fastq.gz (fastq.gz, fastq)
│  ├─ Sample_B_02.fastq.gz (fastq.gz, fastq)
│  └─ Sample_B_03.fastq.gz (fastq.gz, fastq)
├─ Sample_N
│  └─ Sample_N_01.fastq.gz (fastq.gz, fastq)
│
├─ {reference_sequence_files_name}.fasta (fasta)
│
└─ {unique_primer/adaptor_files_name}.txt (txt)
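
Before starting a run, it can help to confirm that a folder matches one of the two layouts above. The following is a minimal sketch, not part of NanoGO: the function names scan_input_dir and layout_problems are hypothetical, and it assumes reference files end in .fasta, primer files end in .txt, and reads end in .fastq or .fastq.gz, as in the trees above.

```python
from pathlib import Path

# Read-file extensions shown in the directory trees above.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def scan_input_dir(input_dir):
    """Summarize an input directory: shared files at the top level, plus per-sample files."""
    root = Path(input_dir)
    summary = {
        "shared_fasta": sorted(p.name for p in root.glob("*.fasta")),
        "shared_txt": sorted(p.name for p in root.glob("*.txt")),
        "samples": {},
    }
    # Every sub-folder is treated as one sample.
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = [p for p in folder.iterdir() if p.is_file()]
        summary["samples"][folder.name] = {
            "fastq": sorted(p.name for p in files if p.name.endswith(FASTQ_SUFFIXES)),
            "fasta": sorted(p.name for p in files if p.suffix == ".fasta"),
            "txt": sorted(p.name for p in files if p.suffix == ".txt"),
        }
    return summary


def layout_problems(summary):
    """Return readable problems; an empty list means the layout looks consistent."""
    problems = []
    # Shared layout: one reference and one primer file next to the sample folders.
    shared = bool(summary["shared_fasta"]) and bool(summary["shared_txt"])
    for name, found in summary["samples"].items():
        if not found["fastq"]:
            problems.append(f"{name}: no fastq or fastq.gz files")
        if not shared and not (found["fasta"] and found["txt"]):
            problems.append(f"{name}: missing reference .fasta or primer .txt")
    return problems
```

Calling layout_problems(scan_input_dir("path/to/input")) returns an empty list when every sample folder has reads and either its own or a shared reference and primer file. The check is purely structural and does not open any of the files.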

Guidelines for NanoGO Analysis Tool

Each folder is treated as a unique sample containing read files associated with the same barcode.
- If different genes were sequenced and their datasets need to be analyzed together, create a folder and move all fastq files intended for joint assembly into it.

Reference Sequence File Structure

>Reference_sequence
UUCCUACAAGGGAARNGCD

>Reference_sequence_1
UUCCUACAAGGGAARNGCD 
>Reference_sequence_2
UUCCDUACAAGGGAARNGCD 
>Reference_sequence_3
UUCCUACAARHGGGAARNGCD

If multiple sequences are present in a reference file, the analysis tool concatenates them into one reference.
- This generates a single consensus file; if a separate reference is needed for each sequence, consider running the analysis tool multiple times with only the desired gene.
- The reference sequence can contain the following letters: WSRYVHBMNKUATGC.
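
To preview what a multi-sequence reference looks like once joined, and to catch unexpected characters early, a small helper along these lines may be useful. This is a sketch under assumptions, not NanoGO's own code: the function names are hypothetical, the allowed letters are copied verbatim from the list above, and the records are joined in file order with no spacer, which may differ from what the tool does internally.

```python
# Letters copied verbatim from the list above.
ALLOWED_LETTERS = set("WSRYVHBMNKUATGC")


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def unexpected_letters(sequence):
    """Return the sorted letters in sequence that are not in ALLOWED_LETTERS."""
    return sorted(set(sequence) - ALLOWED_LETTERS)


def concatenate(records):
    """Join all sequences in file order (assumption: no spacer between records)."""
    return "".join(seq for _, seq in records)
```

For a reference file read into a string, unexpected_letters(concatenate(read_fasta(text))) returns an empty list when every character is in the listed set.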

Primer/Adaptor File Structure:

>{your_forward_primer_name}-F
AACARMNKAGATAAAATAGTHAGT
>{your_reverse_primer_name}-R
ATAATTATGRTGYTMNKTARACT

Primer/Adaptor Naming Requirements:

Primer names start with > and should not contain spaces (use _ instead).
Primer names must end with -F for forward primers or -R for reverse primers, which is critical during the primer trimming stage.

Primer/Adaptor Sequence Requirements:

Place each primer's sequence on the line directly after its name.

>forward_primer-F
ATAATTATGRTGYTMNKTARACT
>reverse_primer-R
AACARMNKAGATAAAATAGTHAGT
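
The naming rules above are easy to check mechanically before a run. The sketch below is illustrative only: check_primer_names is a hypothetical helper, not a NanoGO function, and it inspects header lines only (it does not validate the sequences).

```python
def check_primer_names(text):
    """Return (line_number, message) pairs for primer headers that break the naming rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line.startswith(">"):
            continue  # sequence line or blank line
        name = line[1:]
        if " " in name or "\t" in name:
            problems.append((number, "name contains whitespace; use _ instead"))
        if not name.endswith(("-F", "-R")):
            problems.append((number, "name must end with -F or -R"))
    return problems
```

An empty list means every header starts with >, contains no spaces, and ends with -F or -R.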

System Requirements

Operating System: Linux or Windows Subsystem for Linux (WSL).
Memory: Minimum of 16 GB RAM.
Processor: At least 4 cores.

Names of files and folders should not contain spaces; replace any spaces with _.
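
A quick way to find offending names is to walk the input directory and list every file or folder whose name contains a space. The helper below is a sketch with a hypothetical name (names_with_spaces); it only reports suggested replacements and does not rename anything.

```python
from pathlib import Path


def names_with_spaces(root):
    """Yield (path, suggested_name) for each file or folder under root whose name has a space."""
    for path in sorted(Path(root).rglob("*")):
        if " " in path.name:
            yield path, path.name.replace(" ", "_")
```

If the suggestions are applied, rename the deepest paths first so that parent folder renames do not invalidate the paths of their children.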

NanoGO Basecaller

Input Directory Structure

Input Directory
├─ Raw_Sample_A
│  ├─ Raw_Sample_A_01.pod5 (POD5 or FAST5)
│  └─ Raw_Sample_A_02.pod5 (POD5 or FAST5)
├─ Raw_Sample_B
│  ├─ Raw_Sample_B_01.fast5 (POD5 or FAST5)
│  ├─ Raw_Sample_B_02.fast5 (POD5 or FAST5)
│  └─ Raw_Sample_B_03.fast5 (POD5 or FAST5)
└─ Raw_Sample_N
   └─ Raw_Sample_N_01.pod5 (POD5 or FAST5)

The NanoGO Basecaller considers Raw_Sample_A and Raw_Sample_B as separate sequencing runs.
Files within each folder can be in either FAST5 or POD5 format.
- FAST5 files are converted to POD5 format during the basecalling process.
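
To see at a glance which run folders hold POD5 files and which hold FAST5 files, the raw files can be counted per folder. This is a minimal sketch; raw_files_per_run is a hypothetical helper, and it assumes raw files carry the .pod5 or .fast5 extension as in the tree above.

```python
from collections import Counter
from pathlib import Path

RAW_SUFFIXES = {".pod5", ".fast5"}


def raw_files_per_run(input_dir):
    """Map each run folder name to a count of its POD5 and FAST5 files."""
    runs = {}
    for folder in sorted(p for p in Path(input_dir).iterdir() if p.is_dir()):
        counts = Counter(
            p.suffix.lower().lstrip(".")
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in RAW_SUFFIXES
        )
        runs[folder.name] = dict(counts)
    return runs
```

A folder that maps to an empty dictionary contains no recognizable raw files and is worth checking before basecalling.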

System Requirements

GPU Compatibility: Must be compatible with Dorado.
Operating System: Linux or Windows Subsystem for Linux (WSL).
Memory: Minimum of 16 GB RAM.
Processor: At least 4 cores.

Usage Instructions for `nanogo`

Test whether the installation is working

Input:

nanogo --help

Output:

usage: nanogo [options] <subcommand>

options:
  -h, --help            show this help message and exit
  -v, --version         Display the version number of NanoGO.

Available Tools:
  Valid subcommands

  {basecaller,analysis}
                        Description
    basecaller          Run nanogo basecaller using dorado basecaller
    analysis            Run nanogo bioinformatics pipeline for generating consensus sequences and qc report
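
The same check can be scripted, for example inside a larger workflow. The sketch below assumes nanogo was installed into the currently active conda environment; the function name nanogo_help is hypothetical, and only the documented --help option is used.

```python
import shutil
import subprocess


def nanogo_help(executable="nanogo"):
    """Return the text printed by `<executable> --help`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed, or the conda environment is not activated
    result = subprocess.run([path, "--help"], capture_output=True, text=True, check=False)
    return result.stdout
```

A return value of None usually means the environment has not been activated with conda activate <env_name>.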

Usage Instructions for `nanogo analysis`

NanoGO has several modes of operation.

Run nanogo analysis on the command line without arguments to run the pipeline in interactive mode.

nanogo analysis

Run in command-line mode by supplying the required arguments.

nanogo analysis -i <fastq_folder_path> -o <output_directory_name> --primer <primer_file.txt> --ref <reference_seq.fasta> [options]

Arguments

Input/Output Options

-i <fastq_folder_path>: Path to the folder containing fastq data. Activates interactive mode if not provided.
-o <output_directory>: Output directory name (Default: "nanogo_output").

Main Options

--primer <primer_folder_path>: Path to folder containing primer sequences.
--ref <reference_folder_path>: Path to folder containing reference sequences.
--model : Prompt to select supported models used for basecalling (Default: r1041_e82_400bps_sup_v4.2.0)
--min <minimum_read_size>: Minimum size of read expected (Default: 100)
--max <maximum_read_size>: Maximum size of read expected (Default: 1500)
--overlap <overlap_size>: Minimum overlap of reads. This is used to determine the read trimming sensitivity (Default: 1000)
--quality <minimum_quality>: Minimum quality of nucleotide at end of reads. This value is used to trim low-quality bases (Default: 20)
--barcode: Barcodes used for ONT sequencing. Choices: pcr_barcodes, rapid_barcodes, native_barcodes. (Default: native_barcodes)
--require_two_barcodes: Reads will only be put in barcode bins if they have a strong match for the barcode on both their start and end (Default: false)
--database <kraken2_database>: If you have a custom Kraken2 database installed locally, provide the path to the folder containing the k2d files. (Default: MinusB - Archaea, Viral, Plasmid, Human, UniVec_Core)
-v, --version: Use this to display the version number of NanoGO.

Example

nanogo analysis -i <basecalled_data_folder> -o <nanogo_output>

This command processes the fastq data in <basecalled_data_folder>, writes the results to <nanogo_output>, and prompts for the basecalling model.
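
When many datasets are processed in a batch, the command line can be assembled programmatically. The following is a sketch that uses only flags listed under Arguments above; the function name build_analysis_command and its parameter names are hypothetical, and the folder names in the example call are placeholders.

```python
import shlex


def build_analysis_command(input_dir, output_dir="nanogo_output", primer=None, ref=None,
                           min_len=None, max_len=None, quality=None):
    """Build the argument list for a `nanogo analysis` run from documented flags."""
    cmd = ["nanogo", "analysis", "-i", str(input_dir), "-o", str(output_dir)]
    optional = [
        ("--primer", primer),
        ("--ref", ref),
        ("--min", min_len),
        ("--max", max_len),
        ("--quality", quality),
    ]
    # Only pass options that were set, so NanoGO's own defaults apply otherwise.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


command = build_analysis_command("fastq_data", "nanogo_output", min_len=100, max_len=1500)
print(shlex.join(command))  # shell-quoted form, handy for logging
```

The returned list can be passed to subprocess.run(command, check=True) to start the run; keeping the arguments as a list avoids shell-quoting problems with unusual paths.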
