Blastn2dotplots
blastn2dotplots: multiple dot-plot visualizer
Install / Use
/learn @mokuno3430/Blastn2dotplotsREADME
blastn2dotplots
This repository contains the blastn2dotplots script and sample data.
Description
blastn2dotplots: A script for visualizing multiple dot-plot alignments from BLASTN output.
Requirements
This project has been tested with the following environment:
python 3.8.20matplotlib 3.7.3pandas 2.0.3numpy 1.24.4
Installation
Install via Bioconda (Recommended)
You can install blastn2dotplots directly from Bioconda:
conda install -c bioconda blastn2dotplots
Note: If you haven't configured Bioconda before, follow the instructions here to set up the necessary channels.
Optional: Create a dedicated conda environment
conda create -n blastn2dotplots-env blastn2dotplots
conda activate blastn2dotplots-env
conda install -c bioconda blastn2dotplots
Install manually from source (alternative method)
If you prefer not to use Bioconda:
git clone https://github.com/mokuno3430/blastn2dotplots.git
cd blastn2dotplots
chmod u+x blastn2dotplots
Required dependencies:
conda install matplotlib=3.7.1 numpy=1.24.4 pandas=2.0.3
Usage
usage: blastn2dotplots [-h] -i1 str [-i2 str] --blastn str [--out str]
[--highlight str] [--highlight_crossed str]
[--min_identity int] [--min_alignlen int] [--line_width float]
[--share] [--manual_grid str] [--show_grid] [--grid_color str]
[--xtitle_rotate int] [--ytitle_rotate int]
[--font_size float] [--figure_size float float]
[--hspace float] [--wspace float] [--h_alpha float]
[--tick_label_size float] [--tick_width int]
[--left_edge float] [--right_edge float] [--top_edge float] [--bottom_edge float]
[--colormap {0,1,2,3,4,5}] [-v]
Options
-h, --help show this help message and exit
-i1, --input1 str sequence IDs of database at blastn (=row)
-i2, --input2 str sequence IDs of query at blastn (=column)
--blastn str blastn.tsv (-outfmt '6 std qlen slen' at blastn)
--out str Optional: prefix of pdf file (default out)
--highlight str Optional: positions.tsv (scaffold start end color)
--highlight_crossed str
Optional: positions.tsv (scaffold start end color)
--min_identity int Optional: minimum sequence identity (default 90)
--min_alignlen int Optional: minimum alignment length (default 100)
--line_width float Optional: line width of dotplots (default 1.0)
--share Optional: sharing axis scales among subplots.
--manual_grid str Optional: positions.tsv (scaffold position)
--show_grid Optional: show grid-line
--grid_color str Optional: grid color (default grey)
--xtitle_rotate int Optional: (default 0)
--ytitle_rotate int Optional: (default 90)
--font_size float Optional: (default 8)
--figure_size float float
Optional: Specify figure size as two numbers >= 3 (default: [8, 8]).
--hspace float Optional: (default -1 means auto)
--wspace float Optional: (default -1 means auto)
--h_alpha float Optional: transparency ratio of highlights (float in the range [0, 1]. default 0.3)
--tick_label_size float
Optional: (default 6)
--tick_width int Optional: scale width of axis (bp) (default -1 means auto)
--left_edge float Optional: the position of the left edge of the subplots, as a fraction of the figure width (float in the range [0, 1]. default
0.15)
--right_edge float Optional: the position of the right edge of the subplots (float in the range [0, 1]. default 0.95)
--top_edge float Optional: the position of the top edge of the subplots (float in the range [0, 1]. default 0.90)
--bottom_edge float Optional: the position of the bottom edge of the subplots (float in the range [0, 1]. default 0.10)
--colormap {0,1,2,3,4,5}
Optional: colormap for identity of alignments ( 0:binary, 1:bone_r, 2:inferno_r, 3:hot_r, 4:YlGnBu, 5:original) (default 5)
-v, --version show program's version number and exit
Sample Data
The sample_data directory contains two datasets that demonstrate the functionality of blastn2dotplots. Each dataset includes input files and a runme.sh script for generating dot plots.
Directory Structure:
sample_data
├── figure3A
│ ├── blastn_LT-plasmids_shift.txt # BLASTN alignment results
│ ├── db.txt # Database configuration
│ ├── highlights_core.txt # Highlights configuration
│ ├── query.txt # Query sequences configuration
│ └── runme.sh # Script to generate the dot plot
└── figure3B
├── blastn_mtDNA-vs-numt.txt # BLASTN alignment results
├── db.txt # Database configuration
├── grids.txt # Grid lines configuration
├── query.txt # Query configuration
└── runme.sh # Script to generate the dot plot
Dataset Details:
Figure 3A: Plasmid Comparison
- Visualizes sequence alignments between plasmids.
- Demonstrates the use of
--highlight_crossedto emphasize overlapping loci.
Figure 3B: numt and mtDNA Comparison
- Compares the nuclear mitochondrial insertion (numt) region on Arabidopsis thaliana chromosome 2 with mtDNA.
- Demonstrates the use of
--manual_gridto add user-defined grid lines.
Usage:
- Navigate to the dataset directory:
cd sample_data/figure3A
- Grant execute permission to the script
chmod u+x runme.sh
- Run the runme.sh script:
./runme.sh
- The resulting dot plot will be saved in the same directory as a PDF file.
tips
preparation files
Query and Database Files (Required)
The -i1 and -i2 options are used to specify the query and database sequences, respectively. If -i2 is omitted, the tool automatically treats -i1 as both the query and database, generating a self-alignment plot.
Subplots are displayed according to the order of sequences in the files:
- Database sequences (
-i1, vertical axis) are arranged from top to bottom. - Query sequences (
-i2, horizontal axis) are arranged from left to right.
Each file should be a tab-separated text file with the following format:
-
Minimum required format:
- Column 1: Sequence ID
-
Optional extended formats:
- Column 1: Sequence ID
- Column 2: Display name (optional)
or
- Column 1: Sequence ID
- Column 2: Display name (optional)
- Column 3: Start position (optional; for adjusting axis scale)
or
- Column 1: Sequence ID
- Column 2: Display name (optional)
- Column 3: Start position (optional; shifts the origin of the axis scale)
- Column 4: Orientation (optional;
+or-)
If only the first column (Sequence ID) is needed, these files can be easily generated from the blastn output:
cut -f 1 blastn_outfmt6.txt | sort | uniq > query.txt
cut -f 2 blastn_outfmt6.txt | sort | uniq > db.txt
Highlight regions (Optional)
To highlight specific regions, prepare a tab-delimited file and use either the --highlight or --highlight_crossed option. The format is the same for both:
- Column 1: Sequence ID
- Column 2: Start position (0-based)
- Column 3: End position
- Column 4: Color code (e.g.,
#FF0000for red)
No header or index is required.
Example (highlights.txt):
plasmid1 5000 15000 #FF9999
See Figure 2A and related explanations in the manuscript for more details.
Grid lines (Optional)
To add user-defined grid lines, prepare a tab-delimited text file and specify it with the --manual_grid option. The file must contain two columns:
- Column 1: Sequence ID
- Column 2: Position (0-based coordinate)
No header or index is required. This is useful for marking structural variant breakpoints or other specific loci.
Example (grids.txt):
chromosome1 1000000
Refer to Figure 2B and the main text for usage details.
colormap examples
To represent alignment identity, blastn2dotplots supports six built-in colormaps.
These can be specified using the --colormap option.
The default colormap is a rainbow gradient (--colormap 5), but users are encouraged to select alternative colormaps—such as those designed for perceptual uniformity or color-blind accessibility—depending on their visualization needs.
Example outputs for each colormap are shown below.
<img src="https://github.com/user-attachments/assets/94447dbe-593e-4e28-ae35-285dc96f25fc" width="500">Support for Additional Aligners
In addition to blastn, the tool also supports alignment results from other commonly used aligners, such as Minimap2 and MUMmer (nucmer). We provide utility scripts that convert their outputs into the BLASTN-like format (-outfmt '6 std qlen slen') accepted by blastn2dotplots.
Minimap2
To use Minimap2 output, run Minimap2 with the -c option to include CIGAR strings in the PAF output. This information is required for computing alignment identity.
-c: Output CIGAR in PAF format (required)
minimap2 -c ref.fa query.fa > minimap2-output.paf
perl paf2blastn-fmt6.pl minimap2-output.paf > out_blastn-fmt6-like.txt
MUMmer (nucmer)
To use MUMmer's nucmer output, process it with show-coords using the -l and -T options to generate a tab-delimited file with sequence lengths.
-l: Include the sequence length information in the output (required)-T: Tab-delimited output format (required)
nucmer ref.fa query.fa -p nucmer-out
show-coords -l -T nucmer-out.delta > nucmer-
