SOI
Robust identification of orthologous synteny with the Orthology Index
Table of Contents
- Quick start
- Introduction
- Installation
- Subcommands
- Other functions
- Phylogenomics pipeline
- Grid Computing
- Input formats
- Output formats
- Citation
Quick start
git clone https://github.com/zhangrengang/orthoindex.git
cd orthoindex
# install
conda env create -f OrthoIndex.yaml
conda activate OrthoIndex
pip3 install .
# test
cd example_data/
sh example.sh
# example.sh:
# dot plots
# A
soi dotplot -s Populus_trichocarpa-Salix_dunnii.collinearity.gz \
-g Populus_trichocarpa-Salix_dunnii.gff.gz -c Populus_trichocarpa-Salix_dunnii.ctl \
--kaks Populus_trichocarpa-Salix_dunnii.collinearity.ks.gz \
--xlabel '$Populus~trichocarpa$' --ylabel '$Salix~dunnii$' \
--ks-hist --max-ks 1.5 -o Populus_trichocarpa-Salix_dunnii \
--plot-ploidy --gene-axis --number-plots
# B
soi dotplot -s Populus_trichocarpa-Salix_dunnii.orthologs.gz \
-g Populus_trichocarpa-Salix_dunnii.gff.gz -c Populus_trichocarpa-Salix_dunnii.ctl \
--kaks Populus_trichocarpa-Salix_dunnii.collinearity.ks.gz \
--xlabel '$Populus\ trichocarpa$' --ylabel '$Salix\ dunnii$' \
--ks-hist --max-ks 1.5 -o Populus_trichocarpa-Salix_dunnii.o \
--plot-ploidy --gene-axis --number-plots \
# homology input
# C
soi dotplot -s Populus_trichocarpa-Salix_dunnii.collinearity.gz \
-g Populus_trichocarpa-Salix_dunnii.gff.gz -c Populus_trichocarpa-Salix_dunnii.ctl \
--xlabel '$Populus\ trichocarpa$' --ylabel '$Salix\ dunnii$' \
--ks-hist -o Populus_trichocarpa-Salix_dunnii.io \
--plot-ploidy --gene-axis --number-plots \
--ofdir OrthoFinder/OrthoFinder/Results_*/ --of-color # coloring by Orthology Index
# D
soi dotplot -s Populus_trichocarpa-Salix_dunnii.collinearity.gz \
-g Populus_trichocarpa-Salix_dunnii.gff.gz -c Populus_trichocarpa-Salix_dunnii.ctl \
--kaks Populus_trichocarpa-Salix_dunnii.collinearity.ks.gz \
--xlabel '$Populus~trichocarpa$' --ylabel '$Salix~dunnii$' \
--ks-hist --max-ks 1.5 -o Populus_trichocarpa-Salix_dunnii.io \
--plot-ploidy --gene-axis --number-plots \
--ofdir OrthoFinder/OrthoFinder/Results_*/ --of-ratio 0.6 # filtering by Orthology Index
# filter orthologous synteny
soi filter -s Populus_trichocarpa-Salix_dunnii.collinearity.gz -o OrthoFinder/OrthoFinder/Results_*/ \
-c 0.6 > Populus_trichocarpa-Salix_dunnii.collinearity.ortho.test
# or (alternative input format for orthology)
soi filter -s Populus_trichocarpa-Salix_dunnii.collinearity.gz -o Populus_trichocarpa-Salix_dunnii.orthologs.gz \
-c 0.6 > Populus_trichocarpa-Salix_dunnii.collinearity.ortho.test
# compare with the expected output: `diff` should print nothing
diff Populus_trichocarpa-Salix_dunnii.collinearity.ortho Populus_trichocarpa-Salix_dunnii.collinearity.ortho.test
Note: To run the full phylogenomics pipeline of SOI, each gene ID must be prefixed with its species ID (e.g. Angelica_sinensis|AS01G00001) for compatibility.
See the details on how to prepare the data.
At a minimum, the gene and chromosome IDs must be unique and consistent across all input files.
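The labeling convention in the note can be sketched in Python as follows. This is only an illustration, not part of SOI: the function name `label_gene_ids` is hypothetical, and the `|` separator is taken from the example ID above.

```python
def label_gene_ids(gene_ids, species_id, sep="|"):
    """Prefix each gene ID with its species ID, e.g. Angelica_sinensis|AS01G00001."""
    prefix = species_id + sep
    labeled = []
    for gene_id in gene_ids:
        # Leave IDs that already carry the species prefix untouched.
        labeled.append(gene_id if gene_id.startswith(prefix) else prefix + gene_id)
    # Fail fast on duplicates, since IDs are required to be unique.
    if len(set(labeled)) != len(labeled):
        raise ValueError("duplicate gene IDs after labeling")
    return labeled


labeled_ids = label_gene_ids(["AS01G00001", "AS01G00002"], "Angelica_sinensis")
print(labeled_ids)
```

The same prefix would have to be applied consistently wherever the gene IDs appear (sequences, GFF, synteny and orthology files), otherwise the IDs no longer match across inputs.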
Example output dot plots
Figure. The Orthology Index in identifying orthologous synteny: a typical case.
A) Ks-colored dot plots showing synteny detected by WGDI, with an observable distinction of three categories of syntenic blocks derived from three evolutionary events (three peaks: Ks ≈ 1.5, Ks ≈ 0.27, and Ks ≈ 0.13).
B) Ks-colored dot plots illustrating orthology inferred by OrthoFinder2, with an observable high proportion of hidden out-paralogs (Ks ≈ 0.27).
C) Orthology Index (OI)-colored dot plots: integrating synteny (A) and orthology (B), with polarized and scalable distinction of three categories of syntenic blocks (three peaks: OI ≈ 0, OI ≈ 0.1, and OI ≈ 0.9).
D) Ks-colored dot plots of synteny after applying an OI cutoff of 0.6, with clean 1:1 orthology as expected from the evolutionary history.
A-D are plotted using the dotplot subcommand with four subplots:
a) dot plots colored by Ks or OI (x-axis and y-axis, chromosomes of the two genomes; each dot indicates a homologous gene pair between the two genomes),
b) histogram (with the same color map as the dot plots) of Ks or OI (x-axis, Ks or OI; y-axis, number of homologous gene pairs),
c-d) synteny depth (orthologous synteny depth indicating relative ploidy) derived from 50-gene windows (x-axis, synteny depth; y-axis, number of windows).
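To make the window-based depth histogram in c-d) more concrete, here is a rough Python sketch. It is an assumption-laden illustration, not the code behind the dotplot subcommand: it assumes a per-gene depth (number of syntenic blocks covering each gene, in chromosome order) is already available, uses non-overlapping windows, and takes the most common per-gene depth as the depth of a window. The function name `window_depth_histogram` is hypothetical.

```python
from collections import Counter


def window_depth_histogram(gene_depths, window=50):
    """Count windows by synteny depth, using non-overlapping windows of `window` genes."""
    per_window = []
    # Trailing genes that do not fill a whole window are ignored in this sketch.
    for start in range(0, len(gene_depths) - window + 1, window):
        chunk = gene_depths[start:start + window]
        # Assumption: the window's depth is the most common per-gene depth.
        per_window.append(Counter(chunk).most_common(1)[0][0])
    return Counter(per_window)


# Toy input: 100 genes covered once, 50 genes covered twice, 30 leftover genes.
toy_depths = [1] * 100 + [2] * 50 + [1] * 30
histogram = window_depth_histogram(toy_depths)
print(dict(histogram))
```

In the resulting histogram, the keys correspond to the x-axis (synteny depth) and the values to the y-axis (number of windows).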
Introduction
The Orthology Index (OrthoIndex or OI) combines the algorithmic advances of two methods, orthology inference and synteny detection, to determine whether a syntenic block is orthologous. The index is simply the proportion of orthologous gene pairs within a syntenic block.
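As a minimal sketch of this definition (not SOI's actual implementation), the index of one block can be computed as the fraction of its gene pairs that also occur in a set of orthologous pairs. The function name `orthology_index` and the toy gene IDs are hypothetical, and pairs are treated as unordered.

```python
def orthology_index(block_pairs, ortholog_pairs):
    """Return the proportion of gene pairs in a syntenic block that are orthologous."""
    # Treat pairs as unordered so that (a, b) matches (b, a).
    orthologs = {frozenset(pair) for pair in ortholog_pairs}
    if not block_pairs:
        return 0.0
    hits = sum(1 for pair in block_pairs if frozenset(pair) in orthologs)
    return hits / len(block_pairs)


# Toy syntenic block of four gene pairs, three of which are inferred orthologs.
block = [("Pt|g1", "Sd|g1"), ("Pt|g2", "Sd|g2"), ("Pt|g3", "Sd|g3"), ("Pt|g4", "Sd|g4")]
orthologs = [("Sd|g1", "Pt|g1"), ("Pt|g2", "Sd|g2"), ("Pt|g3", "Sd|g3")]
oi = orthology_index(block, orthologs)
print(oi)  # 0.75
```

With the default cutoff of 0.6 this toy block would count as orthologous, whereas a block of out-paralogs would be expected to score close to 0.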
Installation
conda
You can install the environment and the latest version using conda or mamba:
git clone https://github.com/zhangrengang/orthoindex.git
cd orthoindex
mamba env create -f OrthoIndex.yaml
mamba activate OrthoIndex
pip3 install .
soi -h
Sometimes, creating the environment from OrthoIndex.yaml fails because of dependency conflicts. In that case, you can install the dependencies as below:
mamba create -n orthoindex python=3.8.8 -y
mamba install -y -n orthoindex biopython networkx lazy-property drmaa psutil matplotlib \
mafft trimal 'iqtree>=2' newick_utils pal2nal mcl muscle \
wgdi orthofinder aster
mamba activate orthoindex
pip3 install .
soi -h
Alternatively, the released version can be installed through conda or mamba:
mamba create -n OrthoIndex
mamba install -n OrthoIndex -c conda-forge -c bioconda soi
mamba activate OrthoIndex
soi -h
Apptainer/Singularity
To use the container, you need to have installed Apptainer or Singularity. Then you can download the container image and run:
apptainer remote add --no-login SylabsCloud cloud.sylabs.io
apptainer remote use SylabsCloud
apptainer pull orthoindex.sif library://shang-hongyun/collection/orthoindex:1.2.0
./orthoindex.sif soi -h
The image can be found here.
Subcommands
$ soi -h
usage: soi [-h] {dotplot,filter,cluster,outgroup,phylo,stats} ...
Play with Orthology Index
positional arguments:
{dotplot,filter,cluster,outgroup,phylo,stats}
sub-command help
dotplot Generate colored dot plots
filter Filter synteny with Orthology Index (standard output)
cluster Cluster syntenic orthogroups (SOGs)
outgroup Add outgroups for SOGs from synteny
phylo Build gene trees from SOGs
stats Make statistics of SOGs for phylogeny
optional arguments:
-h, --help show this help message and exit
filter
The subcommand filter filters orthologous blocks with a default minimum index of 0.6:
$ soi filter -h
usage: soi filter [-h] -s [FILE [FILE ...]] -o [FOLDER/FILE [FOLDER/FILE ...]] [-c FLOAT] [-u FLOAT] [-n INT] [-g FILE] [-d INT] [-stat OUT_STATS] [-oo]
optional arguments:
-h, --help show this help message and exit
-s [FILE [FILE ...]], -synteny [FILE [FILE ...]]
Collinearity files output from MCscanX, WGDI, or MCscan/JCVI. [required]
-o [FOLDER/FILE [FOLDER/FILE ...]], -orthology [FOLDER/FILE [FOLDER/FILE ...]]
Orthologues output from OrthoFinder (folder), or OrthoMCL (file). [required]
-c FLOAT, -cutoff FLOAT
Cutoff (lower limit) of Orthology Index [default=0.6]
-u FLOAT, -upper FLOAT
Upper limit of Orthology Index [default=1]
-n INT, -min_n INT Minimum gene number in a block [default=0]
-g FILE, -gff FILE Gff file. [required for `-d`]
-d INT, -min_dist INT
Minimum distance to remove a tandem repeated block [default=None]
-stat OUT_STATS Output stats by species pairs. [default=None]
-oo Output retained orthology instead of synteny. [default=False]
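The effect of the `-c`, `-u` and `-n` options can be sketched as a simple threshold filter over blocks whose Orthology Index is already known. This is an illustration only: the function name `filter_blocks` and the dictionary layout are hypothetical, and whether SOI treats the bounds as inclusive is an assumption.

```python
def filter_blocks(blocks, cutoff=0.6, upper=1.0, min_n=0):
    """Keep blocks with cutoff <= OI <= upper and at least min_n gene pairs."""
    kept = []
    for block in blocks:
        # Assumption: both bounds are inclusive.
        if cutoff <= block["oi"] <= upper and block["n"] >= min_n:
            kept.append(block)
    return kept


# Toy blocks mimicking the three OI peaks described for the example data.
blocks = [
    {"id": "b1", "oi": 0.92, "n": 40},
    {"id": "b2", "oi": 0.10, "n": 25},
    {"id": "b3", "oi": 0.00, "n": 12},
]
default_kept = [b["id"] for b in filter_blocks(blocks)]
band_kept = [b["id"] for b in filter_blocks(blocks, cutoff=0.05, upper=0.5)]
print(default_kept)  # ['b1']
print(band_kept)     # ['b2']
```

Combining a lower and an upper limit in this way selects an intermediate band of index values rather than only the orthologous blocks.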
Usage examples:
# from outputs of WGDI and OrthoFinder
soi filter -s wgdi/*.collinearity -o OrthoFinder/OrthoFinder/Result*/ > collinearity.ortho
# from outputs of MCscanX and OrthoMCL
soi filter -s mcscanx/*.collinearity -o pairs/orthologs.txt > collinearity.ortho
# from a list file and decrease the cutoff
ls wgdi/*.collinearity > collinearity.list
soi filter -s collinearity.list -o OrthoFinder/OrthoFinder/Result*/ -c 0.5 > collinearity.ortho
# filter an out-paralogous peak
soi filter -s wgdi/*.collinearity -o OrthoFinder/OrthoFinder/Result*/