Novabrowse is an interactive BLAST results interpretation tool and sequence alignment viewer for multi-species synteny analysis, ortholog detection, and chromosome mapping.

Find gene positions on chromosomes, generate synteny plots across multiple species, interpret BLAST alignments with coverage and identity visualizations, and discover unannotated genes through genomic signal analysis.

How Novabrowse Works
Why Choose Novabrowse
Getting Started
Installation Prerequisites
Installation
Try It Quickly
Setup
Tutorial 1: Detecting Orthologs Across Species
Tutorial 2: Using Custom Sequences and Gene Signal Discovery
Documentation
Troubleshooting
FAQ
License
Citation
Contributing

How Novabrowse Works

1. You define a genomic region of interest. Novabrowse then retrieves the gene sequences in that region and runs BLAST searches against the chosen subject species.

2. The results are combined into an interactive HTML table with synteny ribbons and chromosome maps.

For a more detailed breakdown, see Pipeline Overview and Basic Output Overview.

Why Choose Novabrowse

Novabrowse is free and open source (MIT License) and ships with several novel core capabilities along with features not found together in any other comparative genomics tool: <img src="images/feature_comparison.svg" alt="Feature comparison of Novabrowse with other BLAST alignment tools and comparative genomics tools" width="100%">

Coordinate-based genomic region search - Define a chromosomal region by coordinates and automatically retrieve gene sequences from that region for use as search queries
Distance-based HSP clustering of putative gene units - Identify unannotated genes in genomic regions through distance-based High-scoring Segment Pair (HSP) clustering, revealing gene units missed by standard annotation pipelines
Multi-species gene synteny visualization - Compare gene order conservation across multiple species simultaneously with interactive ribbon plots connecting orthologous genes across chromosomes
Chromosomal visualization & mapping - Display hit locations on chromosome ideograms, providing genomic context for alignment results
Coverage visualization - View alignment coverage as identity-color-coded bars positioned along query sequences, showing both extent and quality of matches
Integrated BLAST search - Natively executes BLAST searches within the pipeline, no need to run BLAST separately and import results
Custom BLAST database support - Use your own BLAST databases built from any FASTA sequences, not limited to pre-built databases
Isoform-aware hit consolidation - Consolidates multiple transcript isoform hits into single gene entries, preventing duplicate results from the same locus
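
To make the idea behind distance-based HSP clustering concrete, here is a minimal sketch. It is not Novabrowse's actual implementation. It assumes each HSP is a `(start, end)` pair of subject coordinates on one sequence and strand, and that `max_gap` (a hypothetical parameter name) is the largest distance allowed between neighboring HSPs in the same putative gene unit.

```python
from typing import List, Tuple

HSP = Tuple[int, int]  # (subject_start, subject_end), with start <= end


def cluster_hsps(hsps: List[HSP], max_gap: int) -> List[List[HSP]]:
    """Group HSPs so that each one lies within max_gap of its cluster's current end."""
    clusters: List[List[HSP]] = []
    cluster_end = None  # furthest subject coordinate reached by the open cluster
    for start, end in sorted(hsps):
        if cluster_end is not None and start - cluster_end <= max_gap:
            # Close enough to the open cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            # Too far away (or first HSP): start a new putative gene unit.
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


# Illustrative coordinates only, not real BLAST output.
example = [(100, 250), (400, 520), (90_000, 90_300), (600, 750)]
for unit in cluster_hsps(example, max_gap=5_000):
    unit_start = unit[0][0]
    unit_end = max(end for _, end in unit)
    print(f"putative unit {unit_start}-{unit_end} ({len(unit)} HSPs)")
```

With these example values, the three nearby HSPs form one unit and the distant HSP forms a second one. A real pipeline would likely also separate HSPs by subject sequence and strand before clustering.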

Getting Started

Novabrowse can be run in three ways: in a Jupyter Notebook, or from the command line in a Docker or Apptainer container. The containerized methods include all dependencies and support running on HPC (High-Performance Computing) clusters.

After the initial setup (preparing subject species files, creating BLAST databases, and generating chromosome data), running the analyses themselves is quick and straightforward.

| Method | Setup guide |
|--------|-------------|
| Jupyter Notebook | Installation Prerequisites (below) |
| Docker | Docker Setup (in docker/ folder) |
| Apptainer | Apptainer Setup (in docker/ folder) |

Installation Prerequisites (Jupyter Notebook)

1. Python 3.8+

Download from python.org

2. Jupyter Notebook Environment

The Novabrowse pipeline runs in a Jupyter Notebook, so you need a compatible program, for example:

VS Code with the Jupyter extension

3. <a href="https://www.ncbi.nlm.nih.gov/" target="_blank">NCBI</a> BLAST+ Command Line Tools

BLAST+ must be installed and available in your system PATH.

Option A: Conda (macOS / Linux / Windows)

conda install -c bioconda blast

Option B: Homebrew (macOS)

brew install blast

Option C: Manual Installation

Download from NCBI FTP
Install and add to system PATH
On Linux, if you get a missing library error, install the OpenMP runtime: sudo apt install libgomp1 (Debian/Ubuntu) or sudo yum install libgomp (RHEL/CentOS)
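
Whichever option you use, you can confirm that BLAST+ is reachable from Python before running the notebook. The sketch below only looks the programs up on the PATH. Which BLAST+ programs Novabrowse actually calls is an assumption here; `blastn` and `makeblastdb` are used as representative examples.

```python
import shutil

# Programs shipped with BLAST+. Treating these two as the required ones is an assumption.
REQUIRED_TOOLS = ["blastn", "makeblastdb"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on the system PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("BLAST+ tools found on PATH")
```

If a tool is reported as missing right after installation, restarting the terminal or the Jupyter kernel is usually needed so that the updated PATH is picked up.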

4. NCBI Account

Novabrowse uses the NCBI Entrez API to retrieve sequences, which requires an NCBI account:

You can create the account at ncbi.nlm.nih.gov/account
The email associated with your NCBI account will also be used to identify your Entrez API requests
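
To show how an email address identifies a request, the sketch below builds an NCBI E-utilities `efetch` URL by hand. It performs no network access. The database (`nuccore`), the example accession, and the `tool` value are assumptions for illustration; Novabrowse's own requests may be constructed differently.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, email: str, tool: str = "novabrowse") -> str:
    """Build an efetch URL that requests one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # assumed database for this example
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "tool": tool,         # hypothetical tool name sent with the request
        "email": email,       # lets NCBI contact you about problematic usage
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


# Example accession and placeholder email, for illustration only.
print(build_efetch_url("NM_000546.6", "you@email.com"))
```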

Installation (Jupyter Notebook)

Download the repository

Option A: Clone with Git
```
git clone https://github.com/RegenImm-Lab/Novabrowse.git
```
Option B: Download ZIP and extract it

Install Python dependencies

Open a terminal in the project folder and run:

macOS / Linux:
```
python3 -m pip install -r requirements.txt
```
Windows:
```
py -m pip install -r requirements.txt
```

Try It Quickly (Jupyter Notebook)

The repository comes pre-configured with three example species (S. cerevisiae, S. pombe, C. albicans) with BLAST databases and chromosome data included. To run the example analysis:

Open novabrowse_1.0.ipynb and find entrez_email = None under the "General Setup" section (cell 2)
Replace None with your NCBI account email:
```
entrez_email = "you@email.com"
```
Run the notebook. Results will be in the output/ folder as HTML files

To learn how to add your own species and configure analyses, see Setup below.

Setup (Jupyter Notebook)

In Novabrowse:

Query species - the species whose genes you want to search for (your genes of interest)
Subject species - the species you search against to find homologous matches

1. Prepare subject species files

Novabrowse supports both transcriptome and genome analysis. For each subject species, you'll need:

GTF annotation file (Gene Transfer Format) - contains gene coordinates, names, and transcript information. The GTF must follow NCBI formatting conventions, but can come from any source (e.g., NCBI, Ensembl, or your own custom annotations).
FASTA sequence file - either transcriptome (rna.fna) or genome (genomic.fna) depending on your analysis needs. These can also be custom assemblies as long as they match the GTF.

Place the downloaded files in:

1_subject_sequences/<custom_name>/<assembly>/
├── genomic.gtf       # Required: GTF annotation file
├── rna.fna           # For transcriptome analysis (exact filename required)
└── *_genomic.fna     # For genome analysis (must contain "_genomic" in filename)

Note: The transcriptome file must be named exactly rna.fna. Genome files must contain _genomic in the filename (e.g., GCF_000146045.2_R64_genomic.fna).
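
Because the filenames are strict, it can help to check a folder before building databases. The following is a minimal sketch based only on the naming rules above; `check_subject_dir` is a hypothetical helper, not part of Novabrowse.

```python
from pathlib import Path


def check_subject_dir(assembly_dir):
    """Report which of the expected input files exist in one assembly folder."""
    folder = Path(assembly_dir)
    names = [p.name for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
    return {
        "genomic.gtf": "genomic.gtf" in names,
        "rna.fna": "rna.fna" in names,
        # Genome FASTA: any .fna file whose name contains "_genomic"
        "genome FASTA": any("_genomic" in n and n.endswith(".fna") for n in names),
    }


# Example path; adjust <custom_name>/<assembly> to match your own folders.
status = check_subject_dir("1_subject_sequences/s_cerevisiae/GCF_000146045.2")
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Only one of `rna.fna` or the genome FASTA is needed, depending on whether you run a transcriptome or a genome analysis.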

For instructions on how to download these files from NCBI, see How to download subject species sequences from NCBI in Tutorial 1.

2. Create subject species BLAST databases

Open make_blastdb.ipynb and edit the second cell to add your species, then run the notebook.

run_makeblastdb(
    "1_subject_sequences/<custom_name>/<assembly>/rna.fna",
    "nucl",
    "2_subject_blastdb/<custom_name>_<assembly>"
)

For example, if you placed S. cerevisiae files in step 1 like this:

1_subject_sequences/s_cerevisiae/GCF_000146045.2/
├── genomic.gtf
├── rna.fna
└── GCF_000146045.2_R64_genomic.fna

The corresponding make_blastdb calls would be:

# Transcriptome database
run_makeblastdb(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/rna.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2"
)

# Genome database
run_makeblastdb(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/GCF_000146045.2_R64_genomic.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2_genome"
)
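
For readers curious what such a call amounts to, the sketch below shows one way a wrapper around the BLAST+ `makeblastdb` program could look. How `run_makeblastdb` is actually implemented in make_blastdb.ipynb is not shown here, so `build_makeblastdb_command` and `make_db` are hypothetical names; only the `-in`, `-dbtype`, and `-out` options of `makeblastdb` are taken as given.

```python
import subprocess


def build_makeblastdb_command(fasta_path, dbtype, out_prefix):
    """Return the makeblastdb argument list for one database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", out_prefix]


def make_db(fasta_path, dbtype, out_prefix):
    """Run makeblastdb; raises CalledProcessError if the program exits with an error."""
    subprocess.run(build_makeblastdb_command(fasta_path, dbtype, out_prefix), check=True)


# Print the command for the transcriptome example without executing it.
command = build_makeblastdb_command(
    "1_subject_sequences/s_cerevisiae/GCF_000146045.2/rna.fna",
    "nucl",
    "2_subject_blastdb/s_cerevisiae_GCF_000146045.2",
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.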

3. Set up NCBI email

The NCBI Entrez API requires an email address to identify requests. If you don't have an NCBI account yet, create one at ncbi.nlm.nih.gov/account.

**Option A: Set system en
