HyraxDotPlot
HyraxDotPlot generates interactive, annotated dot plots that let you visualise synteny between two genomes, down to 1 bp resolution. It supports loading qualitative feature bed files as highlighted regions on the plot (e.g. gene presence), quantitative bed files as tracks (e.g. read coverage), and annotation bed files as tracks.
<p align="center"> <img src="https://github.com/user-attachments/assets/8219b9aa-c78c-4c68-a082-96590c80876e" width="500"> </p>
About
<div align="justify">H<sub>(tml-based-output)</sub>yraxDotPlot is a tool written in Python (using the library bokeh), which allows you to quickly and easily generate interactive dot plots from a pair-wise genome alignment file (a minimap2 paf file, a FASTGA paf file, or a nucmer coords file). It provides you with the options below.
- Loading quantitative genomic feature bed files (e.g. for read depth along the genome) as tracks which run parallel to the axes their respective fastas are loaded on.
- Loading qualitative genomic feature bed files (presence/absence bed files, e.g. for genes of interest or telomere sequence motifs) as highlighted regions on the dot plot.
- Loading annotation bed files (e.g. gene or repeat annotations) as tracks which run parallel to the axes their respective fastas are loaded on.
You can load feature bed files, track bed files, and annotation track bed files on the same plot (or in any combination desired).
HyraxDotPlot is lightweight, and produces a static png, a static svg, and an interactive html file which can be opened in a browser of your choice. In the html output file, you can zoom in down to 1 base-pair resolution, and if the flag --curation_mode is used, you can select sequences of interest by tapping them (using the "Tap" tool) and store their ids in a downloadable list. Additionally, using the "Hover" tool, you can retrieve query sequence ids, subject sequence ids and match nucleotide identity % by hovering over any alignment of interest.
Update: What's new in version 2?
- paf files can now be used, as long as they contain the de tag.
- The --size_threshold parameter allows you to control the minimum alignment size to be plotted. This is crucial if your alignment of choice does not carry out any chaining, or you are plotting the alignment of two large genomes. I recommend using a minimum of --size_threshold 1000 for rapid execution (this is set by default).
- --x_annotation_bed_file and --y_annotation_bed_file allow you to load gene/repeat/other annotation bed files as tracks which run parallel to the axes their respective fastas are loaded on.
- A colour bar is added at the bottom of all plots automatically to show the identity of the different alignments plotted.
- The ability to select contigs/scaffolds/chromosomes/sequences and add them to a downloadable list used to be automatic in the previous version, but now can only be enabled using --curation_mode. This was done to increase efficiency.
- By default, both a static png and a static svg are generated in addition to the interactive html output file. Even if --curation_mode is enabled, the relevant selection and download widgets are not displayed in the static outputs.
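To make the effect of the identity and size thresholds concrete, here is a minimal sketch of filtering paf records before plotting. It is not taken from hyraxdotplot.py: the function name `filter_paf_lines` is hypothetical, identity is assumed to be 100 × (1 − de), and alignment size is assumed to be the query span (query end minus query start).

```python
def filter_paf_lines(lines, min_identity=90.0, min_size=1000):
    """Yield (query, target, identity) for paf records passing both thresholds."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete paf record (12 mandatory columns)
        # Optional tags follow the mandatory columns, e.g. "de:f:0.0123".
        de = None
        for tag in fields[12:]:
            if tag.startswith("de:f:"):
                de = float(tag[5:])
                break
        if de is None:
            continue  # without a de tag, identity cannot be derived here
        identity = 100.0 * (1.0 - de)           # assumption: identity from de
        size = int(fields[3]) - int(fields[2])  # assumption: query span
        if identity >= min_identity and size >= min_size:
            yield fields[0], fields[5], identity


# Three made-up records: one passes, one is too short, one is too divergent.
example = [
    "q1\t5000\t0\t2000\t+\tt1\t6000\t100\t2100\t1950\t2000\t60\tde:f:0.02",
    "q2\t5000\t0\t500\t+\tt1\t6000\t100\t600\t495\t500\t60\tde:f:0.01",
    "q3\t5000\t0\t3000\t-\tt2\t7000\t0\t3000\t2400\t3000\t60\tde:f:0.20",
]
kept = list(filter_paf_lines(example))
for query, target, identity in kept:
    print(query, target, round(identity, 2))
```

Raising the size threshold shrinks the number of records that reach the plot, which is why it matters most for unchained alignments and large genomes.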
Installation
<div align="justify">To run HyraxDotPlot, you will need a virtual environment with Python3 and the following installed.
holoviews==1.19.1
bokeh==3.6.0
matplotlib==3.9.2
selenium==4.25.0
chromedriver==2.24.1
webdriver-manager==4.0.2
firefox==131.0
geckodriver==0.35.0
<div align="justify">
Installation is simple, and running the tool requires the script hyraxdotplot.py only.
git clone https://github.com/Amjad-Khalaf/HyraxDotPlot.git
cd HyraxDotPlot
python hyraxdotplot.py -h ##check if it's running
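If the help message does not appear, a quick way to rule out a version mismatch is to compare the installed Python packages against the pinned list above. The sketch below is an illustration only: `check_environment` is a hypothetical helper, it covers just the Python packages, and it does not check the browsers or drivers (firefox, geckodriver, chromedriver).

```python
from importlib.metadata import PackageNotFoundError, version

# Python packages and versions copied from the requirements listed above.
PINNED = {
    "holoviews": "1.19.1",
    "bokeh": "3.6.0",
    "matplotlib": "3.9.2",
    "selenium": "4.25.0",
    "webdriver-manager": "4.0.2",
}


def check_environment(pinned=PINNED):
    """Return {package: (wanted, found, matches)}; found is None if missing."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_environment().items():
    status = "ok" if ok else "check"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```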
Usage
1. Base run
<div align="justify">To run HyraxDotPlot with base features, generate an alignment file (a nucmer coords file produced with the -T and -l flags, or a paf file carrying the de tag) and an index file (using samtools faidx) for each of your fasta files.
python hyraxdotplot.py --coords_file $FILE --x_index_file $FILE --y_index_file $FILE
options:
-h, --help show this help message and exit
--coords_file FILE nucmer coordinates file generated with -T and -l flags; x axis fasta
should be the query sequence and y axis fasta should be the subject
sequence (i.e. nucmer x.fasta y.fasta)
--paf_file FILE minimap2 paf file generated with -c flag or FastGA paf file with the
"de" tag available; x-axis fasta should be the query sequence and y-axis
fasta should be the subject sequence (i.e. minimap2/FastGA x.fasta y.fasta)
--x_index_file FILE samtools .fai file of fasta to be on x-axis of dotplot
--y_index_file FILE samtools .fai file of fasta to be on y-axis of dotplot
--threshold FLOAT (optional) minimum nucleotide identity of matches to be plotted; default is 90
--size_threshold FLOAT (optional) minimum length of matches to be plotted; default is 1000
--plot_title STR (optional) plot title; default is 'HyraxDotPlot'
--output_prefix STR (optional) output file name prefix; default is hyraxdotplot
--plot_width INT (optional) plot width; default is 800
--plot_height INT (optional) plot height; default is 600
--curation_mode (optional) flag that adds tap tool and allows you to select sequences of interest
<p align="center">
<img src="https://github.com/user-attachments/assets/8f817163-162d-4d5c-ad0f-f061a5c37a13">
</p>
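The index files supply the sequence names and lengths for each axis. As a rough illustration of how such a file can be turned into axis coordinates, the sketch below reads the first two columns of a .fai file (name and length) and lays the sequences end to end. Whether HyraxDotPlot places sequences exactly this way is an assumption; the helper names and the example lengths are hypothetical.

```python
def read_fai(lines):
    """Return [(name, length), ...] from the first two columns of a .fai file."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        name, length = line.rstrip("\n").split("\t")[:2]
        records.append((name, int(length)))
    return records


def cumulative_offsets(records):
    """Map each sequence to the axis position where it starts."""
    offsets = {}
    total = 0
    for name, length in records:
        offsets[name] = total
        total += length
    return offsets, total


# Made-up .fai content: name, length, offset, linebases, linewidth.
fai = [
    "ptg000001l\t1000000\t12\t60\t61\n",
    "ptg000005l\t250000\t1016691\t60\t61\n",
]
offsets, axis_length = cumulative_offsets(read_fai(fai))
print(offsets, axis_length)
```

With offsets like these, a position on a given sequence maps to `offsets[name] + position` on the shared axis.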
2. Load feature bed files
<div align="justify">HyraxDotPlot also supports the loading of "qualitative" bed files that outline the location of genomic features (presence/absence bed files, e.g. for genes of interest or telomere sequence motifs), referred to as "feature bed files". These bed files will be loaded as highlighted regions on the dot plot, with a button to toggle them on or off, and a drop down menu to choose colours from. You may load a single bed file for one of the fastas, or a bed file for each fasta. Below is an example of a bed file which can be loaded in this manner. You can load feature bed files, track bed files, and annotation track bed files on the same plot (or in any combination desired).
</div>
##the fourth column will be ignored for feature bed files
ptg000001l 980000 990000 309
ptg000005l 0 10000 270
ptg000009l 0 10000 202
<br>
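Before loading a feature bed file, it can help to confirm that every interval refers to a sequence in the matching index file and lies within its length. The following is a minimal sketch of such a check, not part of HyraxDotPlot; `check_bed` is a hypothetical helper and the sequence lengths in the example are made up.

```python
def check_bed(bed_lines, lengths):
    """Return a list of problems; an empty list means the intervals look usable."""
    problems = []
    for n, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {n}: fewer than 3 columns")
            continue
        name, start, end = fields[0], fields[1], fields[2]
        if name not in lengths:
            problems.append(f"line {n}: unknown sequence {name}")
            continue
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {n}: start/end are not integers")
            continue
        if not (0 <= int(start) < int(end) <= lengths[name]):
            problems.append(f"line {n}: interval outside 0..{lengths[name]}")
    return problems


# Hypothetical sequence lengths, as they would come from a .fai file.
lengths = {"ptg000001l": 1000000, "ptg000005l": 500000, "ptg000009l": 300000}
bed = [
    "##the fourth column will be ignored for feature bed files",
    "ptg000001l 980000 990000 309",
    "ptg000005l 0 10000 270",
    "ptg000009l 0 10000 202",
]
print(check_bed(bed, lengths))
```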
python hyraxdotplot.py --coords_file $FILE --x_index_file $FILE --y_index_file $FILE \
    --x_feature_bed_file $FILE --y_feature_bed_file $FILE
options:
-h, --help show this help message and exit
--coords_file FILE nucmer coordinates file generated with -T and -l flags; x axis fasta
should be the query sequence and y axis fasta should be the subject
sequence (i.e. nucmer x.fasta y.fasta)
--paf_file FILE minimap2 paf file generated with -c flag or FastGA paf file with the
"de" tag available; x-axis fasta should be the query sequence and y-axis
fasta should be the subject sequence (i.e. minimap2/FastGA x.fasta y.fasta)
--x_index_file FILE samtools .fai file of fasta to be on x-axis of dotplot
--y_index_file FILE samtools .fai file of fasta to be on y-axis of dotplot
--threshold FLOAT (optional) minimum nucleotide identity of matches to be plotted; default is 90
--size_threshold FLOAT (optional) minimum length of matches to be plotted; default is 1000
--plot_title STR (optional) plot title; default is 'HyraxDotPlot'
--output_prefix STR (optional) output file name prefix; default is hyraxdotplot
--plot_width INT (optional) plot width; default is 800
--plot_height INT (optional) plot height; default is 600
--x_feature_bed_file FILE (optional) qualitative bed file for x-axis fasta to be loaded as
highlighted regions on dot plot
--y_feature_bed_file FILE (optional) qualitative bed file for y-axis fasta to be loaded as
highlighted regions on dot plot
--curation_mode (optional) flag that adds tap tool and allows you to select sequences of interest
<p align="center">
<img src="https://github.com/user-attachments/assets/4639984e-d080-4a25-ba06-2b6e945d3324">
</p>
3. Load track bed files
<div align="justify">In addition to loading bed files as highlighted regions on the dot plot, HyraxDotPlot can also load "quantitative" bed files as tracks which run parallel to the axes their respective fastas are loaded on, referred to as "track bed files". You may load a single bed file for one of the fastas, or a bed file for each fasta. Below is an example of a bed file which can be loaded in this manner. You can load feature bed files, track bed files, and annotation track bed files on the same plot (or in any combination desired).
</div>
##the fourth column is used for track bed files, and needs to be numerical
ptg000001l 980000 990000 309
ptg000005l 0 10000 270
ptg000009l 0 10000 202
<br>
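Because the fourth column must be numerical for track bed files, a short pre-flight check can catch stray text values before plotting. The sketch below is an illustration under that single requirement; `read_track_values` is a hypothetical helper and is not part of HyraxDotPlot.

```python
def read_track_values(bed_lines):
    """Return [(name, start, end, value), ...]; raise ValueError on bad rows."""
    rows = []
    for n, line in enumerate(bed_lines, start=1):
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) < 4:
            raise ValueError(f"line {n}: track bed files need a fourth column")
        try:
            value = float(fields[3])
        except ValueError:
            raise ValueError(
                f"line {n}: fourth column {fields[3]!r} is not numerical"
            ) from None
        rows.append((fields[0], int(fields[1]), int(fields[2]), value))
    return rows


track = [
    "ptg000001l 980000 990000 309",
    "ptg000005l 0 10000 270",
    "ptg000009l 0 10000 202",
]
rows = read_track_values(track)
values = [row[3] for row in rows]
print(min(values), max(values))
```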
python hyraxdotplot.py --coords_file $FILE --x_index_file $FILE --y_index_file $FILE \
    --x_track_bed_file $FILE --x_track_title $STR --x_track_feature_name $STR --x_track_colour $STR \
    --y_track_bed_file $FILE --y_track_title $STR --y_track_feature_name $STR --y_track_colour $STR
options:
-h, --help show this help message and exit
--coords_file FILE nucmer coordinates file generated with -T and -l flags; x axis fasta
should be the query sequence and y axis fasta should be the subject
sequence (i.e. nucmer x.fasta y.fasta)
--paf_file FILE minimap2 paf file generated with -c flag or FastGA paf file with the
"de" tag available; x-ax
