SyntenyQC
Want to make a neighbourhood-scale synteny plot? Have too many genomic neighbourhoods? SyntenyQC may help!
This app has been tested on Mac, Windows and Linux.
Citation
Timothy D J Kirkwood, Jack A Connolly, Ee Lui Ang, Huimin Zhao, Eriko Takano, Rainer Breitling, Synteny plot quality control with SyntenyQC, Bioinformatics, 2025, btaf626, https://doi.org/10.1093/bioinformatics/btaf626
Motivation:
Synteny plots are widely used for the comparison of genomic neighbourhoods. Whilst synteny plots are often included as part of larger software suites (e.g. the antiSMASH ClusterBlast module), various low-code, stand-alone tools are now available that allow users to source candidate neighbourhoods and build their own synteny plots. However, a gap remains between:
(i) tools that source these candidate neighbourhoods (e.g. cblaster, which can find hundreds of candidates),
(ii) tools that build the synteny plots (e.g. clinker, which struggles as the number of neighbourhoods exceeds 30-50) and
(iii) the synteny plots themselves, which become much harder to analyse/present as the number of neighbourhoods they include increases.
Description:
SyntenyQC is a Python app for the curation of neighbourhoods immediately prior to synteny plot creation. SyntenyQC collect supports the systematic definition and annotation of candidate neighbourhoods through direct integration with cblaster. SyntenyQC sieve offers a flexible method for objectively removing redundant neighbourhoods (sourced using cblaster or any other tool) prior to synteny plot creation. In some cases this is an absolute requirement (e.g. cblaster called via the CAGECAT webserver places a limit of 50 neighbourhoods).
Installation
What do I do?
conda create --name syntenyqc_env pip python=3.12.9
conda activate syntenyqc_env
pip install SyntenyQC
Why do I do it?
You should install SyntenyQC within a virtual environment to make sure it doesn't interfere with any other software you have installed (read more here). There are various options for working with virtual environments, but I use conda - see their tutorial here. SyntenyQC is uploaded to the Python Package Index (PyPI) and not Anaconda (yet - also, how is conda different from Anaconda and what's the relationship between pip and PyPI?). We will therefore set up an environment using conda, install pip in that environment, and then install SyntenyQC using pip (note, Python can be any version between 3.10.0 and 3.12.9 inclusive).
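If you are unsure whether an existing environment falls inside the supported Python range, a quick check along these lines may help. This is only a sketch: the function name python_version_supported is made up for illustration and is not part of SyntenyQC, and the bounds are simply the ones stated above.

```python
import sys

# Supported range as stated in this README (inclusive at both ends).
MIN_VERSION = (3, 10, 0)
MAX_VERSION = (3, 12, 9)


def python_version_supported(version=None):
    """Return True if a (major, minor, micro) tuple lies within the stated range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


# Report on the interpreter that is running this script.
print("Python version supported:", python_version_supported())
```

Run it with the Python interpreter inside syntenyqc_env after activating the environment.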
Software you need to install manually
SyntenyQC depends on DIAMOND, which must be installed by the user (tested with v2.1.12.166 - but should work with other versions unless there are parameter changes). If this is installed correctly, you should be able to see help messages after typing diamond help (with no "-" or "--") in the command line.
Note - after you download the diamond executable file (.exe), you will probably need to add it to your path - this allows your computer to understand what you mean when you type diamond help into the command line. When you add DIAMOND to your path, diamond help becomes equivalent to path/to/diamond.exe help. This is easy to do - on Windows, just go to Start, search for Edit the system environment variables, click Environment Variables, under User variables click Path and then Edit, then finally add the path to the folder containing the .exe (not the path to the .exe file itself) to the Path variable.
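To confirm that DIAMOND is visible on your path before running SyntenyQC, you could use a small check like the one below. It is a sketch, not something SyntenyQC ships: find_diamond and diamond_help_works are hypothetical helper names, and the second function assumes that diamond help exits with status 0 when DIAMOND is working.

```python
import shutil
import subprocess


def find_diamond(executable="diamond"):
    """Return the full path of the executable if it is on PATH, otherwise None."""
    return shutil.which(executable)


def diamond_help_works(executable="diamond"):
    """Return True if '<executable> help' can be run successfully.

    Assumption: a working DIAMOND install exits with status 0 for 'help'.
    """
    path = find_diamond(executable)
    if path is None:
        return False
    result = subprocess.run([path, "help"], capture_output=True, text=True)
    return result.returncode == 0


location = find_diamond()
if location is None:
    print("diamond was not found on PATH - see the note above")
else:
    print("diamond found at:", location)
```

If the script reports that diamond was not found, the folder containing the executable is probably missing from the Path variable.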
Tests
Tests are performed using pytest, but are not distributed with SyntenyQC. To run tests:
- Install pytest.
- Clone the SyntenyQC github repository.
- Update path/to/cloned/repository/tests/email.txt with your email (if left blank, two webscraping tests will fail).
- Navigate to the cloned repository via command line (on Windows, type cd path/to/cloned/repository).
- Type pytest in the command line.
Usage
General help:
>SyntenyQC -h
usage: SyntenyQC [-h] {collect,sieve} ...
options:
-h, --help show this help message and exit
subcommands:
{collect,sieve} Synteny quality control options
Collect subcommand:
>SyntenyQC collect -h
usage: SyntenyQC collect [-h] -bp -ns -em [-fn] [-sp] [-wg]
Write genbank files corresponding to cblaster neighbourhoods from a specified CSV-format binary file located at
BINARY_PATH. For each cblaster hit accession in the binary file:
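1) A record is downloaded from NCBI using the accession. NCBI requires a user EMAIL to search for this record
programmatically. If WRITE_GENOMES is specified, this record is written to a local file according to FILENAMES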
(see final bulletpoint).
2) A neighbourhood of size NEIGHBOURHOOD_SIZE bp is defined, centered on the cblaster hits defined in the binary
file for the target accession.
3) (If STRICT_SPAN is specified:) If the accession's record is too small to contain a neighbourhood of the
desired size, it is discarded. For example, if an accession record is a 25kb contig and NEIGHBOURHOOD_SIZE
is 50000, the record is discarded.
4) If FILENAMES is "organism", the neighbourhood is written to a file called *organism*.gbk. If FILENAMES is
"accession", the neighbourhood is written to *accession*.gbk. Synteny software such as clinker can use these
filenames to label synteny plot samples.
Once COLLECT has been run, a new folder with the same name as the binary file will be seen in the directory
that holds the binary file (i.e. the file "path/to/binary/file.txt" will generate the folder "path/to/binary/file").
This folder will have a subdirectory called "neighbourhood", containing all of the neighbourhood genbank files
(i.e. "path/to/binary/file/neighbourhood"). If WRITE_GENOMES is specified, a second directory ("genome") will also
be present, containing the entire record associated with each cblaster accession (i.e. "path/to/binary/file/genome").
Finally, a log file will be present in the folder "path/to/binary/file", containing all run details.
options:
-h, --help show this help message and exit
-bp, --binary_path
Full filepath to the CSV-format cblaster binary file containing neighbourhoods that should
be extracted
-ns, --neighbourhood_size
Size (basepairs) of neighbourhood to be extracted (centered on middle of CBLASTER-defined
neighbourhood)
-em, --email Email - required for NCBI entrez querying
-fn, --filenames
If "organism", all collected files will be named according to organism. If "accession", all
files will be named by NCBI accession. (default:
organism)
-sp, --strict_span
If set, will discard all neighbourhoods that are smaller than neighbourhood_size bp. For
example, if you set a neighbourhood_size of 50000, a 50kb neighbourhood will be extracted
from the NCBI record associated with each cblaster hit. If the record is too small for this
to be done (i.e. the record is smaller than 50kb) it is discarded.
-wg, --write_genomes
If set, will write entire NCBI record containing a cblaster hit to file (as well as just the
neighbourhood)
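The neighbourhood definition and output layout described in the collect help text can be sketched roughly as follows. This is an illustration under stated assumptions, not SyntenyQC's actual code: neighbourhood_window and collect_output_dirs are hypothetical names, coordinates are treated as 0-based half-open, and clipping the window to the record boundaries when STRICT_SPAN is not set is an assumption rather than documented behaviour.

```python
from pathlib import Path


def neighbourhood_window(hit_start, hit_end, neighbourhood_size,
                         record_length, strict_span=False):
    """Return (start, end) of a window centred on the cblaster hits, or None.

    Assumption: without strict_span the window is clipped to the record;
    with strict_span a window shorter than neighbourhood_size is discarded.
    """
    midpoint = (hit_start + hit_end) // 2
    raw_start = midpoint - neighbourhood_size // 2
    start = max(0, raw_start)
    end = min(record_length, raw_start + neighbourhood_size)
    if strict_span and (end - start) < neighbourhood_size:
        return None  # record cannot hold a full-size neighbourhood
    return start, end


def collect_output_dirs(binary_path, write_genomes=False):
    """Map a binary file path to the folders described in the help text."""
    results = Path(binary_path).with_suffix("")  # file.txt -> file
    dirs = {"results": results, "neighbourhood": results / "neighbourhood"}
    if write_genomes:
        dirs["genome"] = results / "genome"
    return dirs


# The 25kb contig example from the help text, with and without STRICT_SPAN.
print(neighbourhood_window(10000, 12000, 50000, 25000, strict_span=True))
print(neighbourhood_window(10000, 12000, 50000, 25000, strict_span=False))
print(collect_output_dirs("path/to/binary/file.txt", write_genomes=True))
```

With STRICT_SPAN the 25kb contig is discarded (None). Without it, this sketch returns the clipped window covering the whole contig.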
Sieve subcommand:
>SyntenyQC sieve -h
usage: SyntenyQC sieve [-h] -gf [-ev] [-mts] [-mev] [-sf] [-am] [-dmts] [-ex] [-qc] [-sc] [-id] [-ks]
Filter redundant genomic neighbourhoods based on neighbourhood similarity:
- First, an all-vs-all BLASTP is performed with user-specified BLASTP settings and the neighbourhoods in GENBANK_FOLDER.
- Secondly, these are parsed to define reciprocal best hits between every pair of neighbourhoods.
- Thirdly, these reciprocal best hits are used to derive a neighbourhood similarity network, where edges indicate two
neighbourhood nodes that have a similarity > SIMILARITY_FILTER. Similarity = Number of RBHs / Number of proteins in
smallest neighbourhood in pair.
- Finally, this network is pruned to remove neighbourhoods that exceed the user's SIMILARITY_FILTER threshold. Nodes
that remain are copied to the newly created folder "genbank_folder/sieve_results/genbank".
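The similarity measure and the idea of pruning the network can be illustrated with a toy example. The similarity formula follows the description above. Everything else is an assumption made for illustration: the function names, the input dictionaries and, in particular, the greedy pruning strategy, which is a simple stand-in and may differ from what SyntenyQC sieve actually does.

```python
def similarity(n_rbh, n_proteins_a, n_proteins_b):
    """Number of RBHs divided by the protein count of the smaller neighbourhood."""
    return n_rbh / min(n_proteins_a, n_proteins_b)


def build_edges(rbh_counts, protein_counts, similarity_filter=0.7):
    """Return an adjacency dict linking pairs with similarity > similarity_filter.

    rbh_counts maps (neighbourhood_a, neighbourhood_b) to a number of RBHs;
    protein_counts maps each neighbourhood to its number of proteins.
    """
    edges = {}
    for (a, b), n_rbh in rbh_counts.items():
        score = similarity(n_rbh, protein_counts[a], protein_counts[b])
        if score > similarity_filter:
            edges.setdefault(a, set()).add(b)
            edges.setdefault(b, set()).add(a)
    return edges


def greedy_prune(nodes, edges):
    """Keep a node only if none of its neighbours has been kept already.

    Assumption: a simple greedy pass in sorted order, used here only to
    show that no two retained neighbourhoods share an edge.
    """
    kept = []
    for node in sorted(nodes):
        if not any(neighbour in kept for neighbour in edges.get(node, ())):
            kept.append(node)
    return kept


# Toy data: A and B share 9 RBHs, so similarity = 9 / min(10, 12) = 0.9.
protein_counts = {"A": 10, "B": 12, "C": 8}
rbh_counts = {("A", "B"): 9, ("A", "C"): 2, ("B", "C"): 3}
edges = build_edges(rbh_counts, protein_counts, similarity_filter=0.7)
print(greedy_prune(protein_counts, edges))
```

In this toy data only A and B exceed the 0.7 threshold, so one of the pair is dropped and C is retained.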
options:
-h, --help show this help message and exit
-gf, --genbank_folder
Full path to folder containing neighbourhood genbank files requiring de-duplication
-ev, --e_value BLASTP evalue threshold. (default: 1e-05)
-mts, --max_target_seqs
BLASTP -max_target_seqs. Maximum number of aligned sequences to keep. (default: 200)
-mev, --min_edge_view
Minimum similarity between two neighbourhoods for an edge to be drawn between them in the
RBH graph. Purely for visualisation of the graph HTML file - has no impact on the graph pruning
results. (default: --similarity_filter)
-sf, --similarity_filter
Similarity threshold above which two neighbourhoods are considered redundant (default: 0.7)
-am, --alignment_mode
Alignment mode used by DIAMOND (choices: fast, mid-sensitive, sensitive, more-sensitive,
very-sensitive, ultra-sensitive). Without using any sensitivity op
