
Novabrowse

Novabrowse is a BLAST results interpretation tool and sequence alignment viewer for gene synteny analysis, visualization of gene positions on chromosomes, and multi-species ortholog identification.


<img src="images/novabrowse_logo.svg" alt="Novabrowse Logo" width="400">

Novabrowse is an interactive BLAST results interpretation tool and sequence alignment viewer for multi-species synteny analysis, ortholog identification, and chromosome mapping.

Use it to find a gene's position on a chromosome, generate synteny plots across multiple species, interpret BLAST alignments with coverage and identity visualizations, and discover unannotated genes through genomic signal analysis.


How Novabrowse Works

1. You define a genomic region of interest. Novabrowse then retrieves the gene sequences from that region and runs BLAST searches against the subject species you choose:

<img src="images/pipeline.svg" alt="Novabrowse synteny analysis pipeline showing BLAST search, ortholog identification, and chromosome mapping workflow">

2. The results are combined into an interactive HTML table with synteny ribbons and chromosome maps:

<img src="images/table_layout.svg" alt="Novabrowse BLAST results table with synteny plot, chromosome map, and sequence alignment coverage visualization">

For a more detailed breakdown, see Pipeline Overview and Basic Output Overview.

Why Choose Novabrowse

Novabrowse is free and open source (MIT License) and ships with several novel core capabilities along with features not found together in any other comparative genomics tool:

<img src="images/feature_comparison.svg" alt="Feature comparison of Novabrowse with other BLAST alignment tools and comparative genomics tools" width="100%">

  • Coordinate-based genomic region search - Define a chromosomal region by coordinates and automatically retrieve gene sequences from that region for use as search queries
  • Distance-based HSP clustering of putative gene units - Identify unannotated genes in genomic regions through distance-based High-scoring Segment Pair (HSP) clustering, revealing gene units missed by standard annotation pipelines
  • Multi-species gene synteny visualization - Compare gene order conservation across multiple species simultaneously with interactive ribbon plots connecting orthologous genes across chromosomes
  • Chromosomal visualization & mapping - Display hit locations on chromosome ideograms, providing genomic context for alignment results
  • Coverage visualization - View alignment coverage as identity-color-coded bars positioned along query sequences, showing both extent and quality of matches
  • Integrated BLAST search - Runs BLAST searches natively within the pipeline, so there is no need to run BLAST separately and import the results
  • Custom BLAST database support - Use your own BLAST databases built from any FASTA sequences, not limited to pre-built databases
  • Isoform-aware hit consolidation - Consolidates multiple transcript isoform hits into single gene entries, preventing duplicate results from the same locus
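To make the distance-based HSP clustering idea from the list above more concrete, here is a minimal Python sketch. It is not Novabrowse's actual implementation: the `Hsp` type, the `cluster_hsps` function, and the `max_gap` default of 10,000 bp are assumptions chosen purely for illustration. HSPs on the same chromosome are sorted by start position and added to the current cluster whenever the gap to that cluster's end is within the threshold.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Hsp:
    """One High-scoring Segment Pair on a subject chromosome (hypothetical type)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def cluster_hsps(hsps: List[Hsp], max_gap: int = 10_000) -> List[List[Hsp]]:
    """Group HSPs whose distance to the running cluster end is <= max_gap.

    max_gap is an assumed, illustrative threshold, not a Novabrowse default.
    """
    clusters: List[List[Hsp]] = []
    ordered = sorted(hsps, key=lambda h: (h.chrom, h.start))
    for _, chrom_hsps in groupby(ordered, key=lambda h: h.chrom):
        current: List[Hsp] = []
        current_end = 0
        for hsp in chrom_hsps:
            if current and hsp.start - current_end <= max_gap:
                # Close enough to the current cluster: extend it.
                current.append(hsp)
                current_end = max(current_end, hsp.end)
            else:
                # Too far away (or first HSP on this chromosome): start a new cluster.
                if current:
                    clusters.append(current)
                current = [hsp]
                current_end = hsp.end
        if current:
            clusters.append(current)
    return clusters


hits = [
    Hsp("chrI", 1_000, 1_400),
    Hsp("chrI", 2_000, 2_300),
    Hsp("chrI", 50_000, 50_500),
    Hsp("chrII", 1_200, 1_600),
]
clusters = cluster_hsps(hits)
print(len(clusters))  # 3 putative gene units
```

In this toy input the first two hits on `chrI` are 600 bp apart and end up in one cluster, while the distant `chrI` hit and the `chrII` hit each form their own. A real pipeline would likely also consider strand and query identity when grouping.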

Getting Started

Novabrowse can be run in three ways: in a Jupyter Notebook, or from the command line in a Docker or Apptainer container. The containerized methods include all dependencies and support running on HPC (High-Performance Computing) clusters.

The initial setup consists of preparing subject species files, creating BLAST databases, and generating chromosome data. Once that is done, running an analysis is quick and straightforward.

| Method | Setup guide |
|--------|-------------|
| Jupyter Notebook | Installation Prerequisites (below) |
| Docker | Docker Setup (in docker/ folder) |
| Apptainer | Apptainer Setup (in docker/ folder) |

Installation Prerequisites (Jupyter Notebook)

1. Python 3.8+

Download from python.org
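If you are unsure which interpreter your notebook environment uses, a quick check along these lines can confirm that it meets the 3.8 minimum. This is only a convenience sketch; the `python_is_supported` helper is a hypothetical name and not part of Novabrowse.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor} "
    f"supported: {python_is_supported()}"
)
```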

2. Jupyter Notebook Environment

The Novabrowse pipeline runs in a Jupyter Notebook, so you need a program that can open and run notebooks.

3. <a href="https://www.ncbi.nlm.nih.gov/" target="_blank">NCBI</a> BLAST+ Command Line Tools

BLAST+ must be installed and available in your system PATH.

Option A: Conda (macOS / Linux / Windows)

conda install -c bioconda blast

Option B: Homebrew (macOS)

brew install blast

Option C: Manual Installation

  1. Download from NCBI FTP
  2. Install and add to system PATH
  3. On Linux, if you get a missing library error, install the OpenMP runtime: sudo apt install libgomp1 (Debian/Ubuntu) or sudo yum install libgomp (RHEL/CentOS)
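Whichever option you use, you can verify that the BLAST+ executables are discoverable on your PATH before running the notebook. The sketch below uses only the Python standard library. The choice of `blastn` and `makeblastdb` as the tools to look for is an assumption, since the exact set of executables Novabrowse calls is not listed here, and `locate_blast_tools` is a hypothetical helper name.

```python
import shutil
from typing import Dict, Optional, Sequence

# Executables assumed to be needed; adjust to whatever your analysis actually uses.
BLAST_TOOLS = ("blastn", "makeblastdb")


def locate_blast_tools(tools: Sequence[str] = BLAST_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in locate_blast_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, re-check the installation step or restart your terminal or notebook server so that PATH changes take effect.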

4. NCBI Account

Novabrowse uses the NCBI Entrez API to retrieve sequences, which requires an NCBI account:

  • You can create the account at ncbi.nlm.nih.gov/account
  • The email associated with your NCBI account will also be used to identify your Entrez API requests
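To illustrate how an email address identifies a request, the sketch below builds (but does not send) a URL for the public NCBI E-utilities `efetch` endpoint, which accepts `email` and `tool` parameters. This is not how Novabrowse is confirmed to work internally; it may use a client library instead. The `build_efetch_url` helper, the `novabrowse_example` tool name, and the accession are illustrative assumptions.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, email: str, tool: str = "novabrowse_example") -> str:
    """Build an efetch URL requesting one nucleotide record as FASTA text.

    The tool name is a hypothetical placeholder; no request is sent here.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "email": email,  # identifies the requester to NCBI
        "tool": tool,
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Illustrative accession only; substitute the record you actually need.
url = build_efetch_url("NC_001133.9", "you@email.com")
print(url)
```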

Installation (Jupyter Notebook)

  1. Download the repository

    Option A: Clone with Git

    git clone https://github.com/RegenImm-Lab/Novabrowse.git
    

    Option B: Download ZIP and extract it

  2. Install Python dependencies

    Open a terminal in the project folder and run:

    macOS / Linux:

    python3 -m pip install -r requirements.txt
    

    Windows:

    py -m pip install -r requirements.txt
    

Try It Quickly (Jupyter Notebook)

The repository comes pre-configured with three example species (S. cerevisiae, S. pombe, C. albicans) with BLAST databases and chromosome data included. To run the example analysis:

  1. Open novabrowse_1.0.ipynb and find entrez_email = None under the "General Setup" section (cell 2)
  2. Replace None with your NCBI account email:
    entrez_email = "you@email.com"
    
  3. Run the notebook. Results will be in the output/ folder as HTML files
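After the run finishes, a short snippet like the following can list the generated reports. It assumes only what is stated above, namely that results are written as HTML files into `output/`; the `list_html_reports` helper is a hypothetical name.

```python
from pathlib import Path
from typing import List


def list_html_reports(output_dir: str = "output") -> List[Path]:
    """Return HTML files in the output folder, sorted by name.

    Returns an empty list if the folder does not exist yet.
    """
    folder = Path(output_dir)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.html"))


for report in list_html_reports():
    print(report)
```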

To learn how to add your own species and configure analyses, see Setup below.

Setup (Jupyter Notebook)

In Novabrowse:

  • Query species - the species whose genes you want to search for (your genes of interest)
  • Subject species - the species you search against to find homologous matches

1. Prepare subject species files

Novabrowse supports both transcriptome and genome analysis. For each subject species, you'll need:

  • GTF annotation file (Gene Transfer Format) - contains gene coordinates, names, and transcript information. The GTF must follow NCBI formatting conventions, but can come from any source (e.g., NCBI, Ensembl, or your own custom annotations).
  • FASTA sequence file - either transcriptome (rna.fna) or genome (genomic.fna) depending on your analysis needs. These can also be custom assemblies as long as they match the GTF.
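For readers unfamiliar with GTF, each record is a tab-separated line with nine columns, the last of which holds `key "value";` attribute pairs such as `gene_id` and `transcript_id`. The sketch below parses one such line. It is a simplified illustration and not Novabrowse's parser: `parse_gtf_line` is a hypothetical helper, the sample record is made up, and the naive split on `;` would mis-handle attribute values that themselves contain semicolons.

```python
from typing import Dict, Optional


def parse_gtf_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one GTF record; return None for comments, blank, or malformed lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    seqname, _source, feature, start, end, _score, strand, _frame, attr_text = fields

    attributes: Dict[str, str] = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Keep the first occurrence when a key is repeated.
        attributes.setdefault(key, value.strip().strip('"'))

    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "gene_id": attributes.get("gene_id"),
        "transcript_id": attributes.get("transcript_id"),
    }


# Made-up record for illustration only.
sample = 'chrI\texample\texon\t100\t250\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
record = parse_gtf_line(sample)
print(record)
```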

Place the downloaded files in:

1_subject_sequences/<custom_name>/<assembly>/
├── genomic.gtf       # Required: GTF annotation file
├── rna.fna           # For transcriptome analysis (exact filename required)
└── *_genomic.fna     # For genome analysis (must contain "_genomic" in filename)

Note: The transcriptome file must be named exactly rna.fna. Genome files must contain _genomic in the filename (e.g., GCF_000146045.2_R64_genomic.fna).
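Because the file names matter, it can help to check a subject folder before building databases. The following is a minimal sketch of such a check based only on the naming rules stated above; `check_subject_dir` and `is_usable` are hypothetical helpers and are not part of Novabrowse.

```python
from pathlib import Path
from typing import Dict, Union


def check_subject_dir(assembly_dir: Union[str, Path]) -> Dict[str, object]:
    """Report which expected files exist in one <custom_name>/<assembly>/ folder."""
    folder = Path(assembly_dir)
    return {
        "gtf": (folder / "genomic.gtf").is_file(),        # required
        "transcriptome": (folder / "rna.fna").is_file(),  # exact filename required
        # Genome FASTA files must contain "_genomic" in the filename.
        "genomes": sorted(p.name for p in folder.glob("*_genomic*.fna")),
    }


def is_usable(report: Dict[str, object]) -> bool:
    """A folder needs the GTF plus at least one FASTA (transcriptome or genome)."""
    return bool(report["gtf"]) and bool(report["transcriptome"] or report["genomes"])


report = check_subject_dir("1_subject_sequences/s_cerevisiae/GCF_000146045.2")
print(report, "usable:", is_usable(report))
```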

For instructions on how to download these files from NCBI, see How to download subject species sequences from NCBI in Tutorial 1.

2. Create subject species BLAST databases

Open make_blastdb.ipynb and edit the second cell to add your species, then run the notebook.

run_makeblastdb(
    "1_subject_sequences/<custom_name>/<assembly>/rna.fna",
    "nucl",
    "2_subject_blastdb/<custom_name>_<assembly>"
)

For example, if you placed S. cerevisiae files in step 1 like this:

1_subject_sequences/s_cerevisiae/GCF_000146045.2/
├── genomic.gtf
├── rna.fna
└── GCF_000146045.2_R64_genomic.fna

The corresponding make_blastdb calls would be:

# Transcriptome database
run_makeblastdb(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/rna.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2"
)

# Genome database
run_makeblastdb(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/GCF_000146045.2_R64_genomic.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2_genome"
)
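For orientation, the three arguments in the calls above map naturally onto the `-in`, `-dbtype`, and `-out` options of the BLAST+ `makeblastdb` program. The sketch below shows that mapping as a command list. It is an assumption about what a wrapper like `run_makeblastdb` might do, not a copy of the notebook's code, and `makeblastdb_command` is a hypothetical helper name.

```python
from typing import List


def makeblastdb_command(fasta_path: str, dbtype: str, out_prefix: str) -> List[str]:
    """Build the makeblastdb argument list for one FASTA file."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", out_prefix]


cmd = makeblastdb_command(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/rna.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2",
)
print(" ".join(cmd))

# To actually build the database (requires BLAST+ on PATH), one could run:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems when paths contain spaces.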

3. Set up NCBI email

The NCBI Entrez API requires an email address to identify requests. If you don't have an NCBI account yet, create one at ncbi.nlm.nih.gov/account.

**Option A: Set system en
