SynGAP

A toolkit for comparative genomics and transcriptomics research of related species.

SynGAP (Synteny-based Gene Structure Annotation Polisher) is a command-line tool written in Python3 for Linux operating systems. A Docker image is also provided for other operating systems such as macOS and Windows. It supports two main workflows: (1) genome annotation polishing for related species (dual, master, triple, and custom); (2) differential gene expression analysis of related species (genepair, evi, and eviplot).

Find the source code and documentation at https://github.com/yanyew/SynGAP

Find detailed documentation at https://www.yuque.com/yanyew/gc786d

For any questions about SynGAP, please contact 360875601w@gmail.com

If you use SynGAP, please cite: Wu F, Mai Y, Chen C, et al. SynGAP: a synteny-based toolkit for gene structure annotation polishing[J]. Genome Biology, 2024, 25(1): 218. https://doi.org/10.1186/s13059-024-03359-8

Installation

conda (recommended)

conda install -c conda-forge -c bioconda syngap

manually

cd ~/code  # or any directory of your choice
git clone https://github.com/yanyew/SynGAP.git
cd ~/code/SynGAP
conda env create -f SynGAP.environment.yaml -c conda-forge -c bioconda
export PATH=~/code/SynGAP:$PATH

Docker image

docker pull yanyew/syngap:1.2.5
docker run -it yanyew/syngap:1.2.5
conda activate syngap # activate the conda environment for SynGAP

Dependencies

python >=3.10
biopython >=1.81
jcvi >=1.3.6
bedtools >=2.31.0
last >=1454
emboss >=6.6.0
gffread >=0.12.7
seqkit >=2.4.0
diamond >=2.1.8
perl-bioperl >=1.7.8
kneed >=0.8.3
numpy >=1.26.0
pandas >=2.1.1
matplotlib-base >=3.8.0
scikit-image >=0.22.0
pybedtools >=0.9.0
deap >=1.4.1
more-itertools
crossmap
graphviz
webcolors
ortools-python
ftpretty
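
After a manual installation, it can help to confirm that the external command-line tools are reachable on PATH. The following is a minimal sketch using only the Python standard library. The executable names (`bedtools`, `gffread`, `seqkit`, `diamond`, `lastal`) are assumptions about how these packages expose their binaries, and `missing_tools` is a hypothetical helper, not part of SynGAP.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ["bedtools", "gffread", "seqkit", "diamond", "lastal"]


def missing_tools(tools=TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it inside the activated conda environment, since the tools are only on PATH there.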

Usage

genome annotation polishing

dual

SynGAP dual is a module for mutually correcting the gene structure annotations of two species. It takes the genome sequences and genome annotations of both species as input. For example:

syngap dual \
--sp1fa=Athaliana_167_TAIR9.fa \
--sp1gff=Athaliana_167_TAIR10.gene.gff3 \
--sp2fa=Arabidopsis_halleri.Ahal2.2.dna.toplevel.fa \
--sp2gff=Arabidopsis_halleri.Ahal2.2.52.gff3 \
--sp1=Ath \
--sp2=Aha

In the results directory, there are several key output files:

| Result File | Description |
| --- | --- |
| *.SynGAP.gff3 | the full polished genome annotation file (original + polished) |
| *.SynGAP.clean.gff3 | the polished genome annotation file (only polished) |
| *.SynGAP.clean.miss_annotated.gff3 | only the polished annotations that are miss-annotated in the original genome annotation |
| *.SynGAP.clean.mis_annotated.gff3 | only the polished annotations that are mis-annotated in the original genome annotation |
| *.anchors.gap | the gaps where mis-annotation or miss-annotation of gene models (MAGs) may exist |
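
To get a quick overview of one of the GFF3 outputs listed above, the feature types can be tallied. This is a minimal sketch that relies only on the standard nine-column GFF3 layout. `count_feature_types` is a hypothetical helper, and the records in `example` are invented for illustration rather than taken from real SynGAP output.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF3 record
        counts[fields[2]] += 1
    return counts


# Tiny in-memory example; these records are made up.
example = [
    "##gff-version 3",
    "Chr1\texample\tgene\t100\t900\t.\t+\t.\tID=g1",
    "Chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "Chr1\texample\texon\t100\t400\t.\t+\t.\tParent=g1.t1",
    "Chr1\texample\texon\t600\t900\t.\t+\t.\tParent=g1.t1",
]
print(count_feature_types(example))
```

The function accepts any iterable of lines, so an open file handle for a `*.SynGAP.clean.gff3` file can be passed in directly.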

master

You can also choose to polish the gene structure annotations of a single species against the Core set that we curated. The Core set includes several plant and animal species with high-quality genome annotations:

| plant | animal |
| --- | --- |
| Aristolochia fimbriata | Bos taurus |
| Arabidopsis thaliana | Caenorhabditis elegans |
| Brachypodium distachyon | Canis lupus familiaris |
| Cucumis sativus | Drosophila melanogaster |
| Citrus sinensis | Danio rerio |
| Fragaria vesca | Felis catus |
| Glycine max | Gallus gallus |
| Musa acuminata | Homo sapiens |
| Oryza sativa | Mus musculus |
| Solanum lycopersicum | Ovis aries |
| Vitis vinifera | Pan troglodytes |
| Zea mays | Sus scrofa |
| | Xenopus tropicalis |

To use SynGAP master, first download the database from the link below, which includes plant.tar.gz and animal.tar.gz. Choose the one you need.

https://mega.nz/folder/Fw4gHDyY#LyPPhLheFLHCIAGWN4NsQg

Then import the downloaded database:

syngap initdb \
--sp=plant \
--file=plant.tar.gz

After importing the database, run SynGAP master:

syngap master \
--sp=plant \
--sp1fa=Brassica_rapa_ro18.SCU_BraROA_2.3.dna.toplevel.fa \
--sp1gff=Brassica_rapa_ro18.SCU_BraROA_2.3.53.chr.gff3 \
--sp1=Bra

triple

To polish the annotations of three species together, use SynGAP triple.

syngap triple \
--sp1fa=Athaliana_167_TAIR9.fa \
--sp1gff=Athaliana_167_TAIR10.gene.gff3 \
--sp2fa=Arabidopsis_halleri.Ahal2.2.dna.toplevel.fa \
--sp2gff=Arabidopsis_halleri.Ahal2.2.52.gff3 \
--sp3fa=Brassica_rapa_ro18.SCU_BraROA_2.3.dna.toplevel.fa \
--sp3gff=Brassica_rapa_ro18.SCU_BraROA_2.3.53.chr.gff3 \
--sp1=Ath \
--sp2=Aha \
--sp3=Bra

custom

If you want to polish annotations only in a specific synteny block, or prefer to use synteny results from software other than jcvi, you can provide a *.anchors file that contains the block and use SynGAP custom.

syngap custom \
--sp1fa=Athaliana_167_TAIR9.fa \
--sp1gff=Athaliana_167_TAIR10.gene.gff3 \
--sp2fa=Arabidopsis_halleri.Ahal2.2.dna.toplevel.fa \
--sp2gff=Arabidopsis_halleri.Ahal2.2.52.gff3 \
--custom_anchors=Ath.Aha.originalid.anchors \
--sp1=Ath \
--sp2=Aha
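
Before passing a custom anchors file, it may be worth checking how many blocks and gene pairs it contains. The sketch below assumes a jcvi-style layout, in which a line starting with `###` opens a new block and each remaining line holds two gene IDs followed by a score. That layout is an assumption here, `parse_anchors` is a hypothetical helper, and the gene IDs are invented.

```python
def parse_anchors(lines):
    """Group anchor pairs into synteny blocks.

    Assumes a jcvi-style layout: a line starting with '###' opens a new
    block, and every other non-empty line is 'gene1 gene2 score'
    separated by whitespace.
    """
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("###"):
            blocks.append([])
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        if not blocks:
            blocks.append([])  # tolerate files without a leading '###'
        blocks[-1].append((fields[0], fields[1]))
    # Drop blocks that ended up empty (e.g. consecutive '###' lines).
    return [block for block in blocks if block]


# Gene IDs and scores below are invented for illustration.
example = """###
AT1G01010\tAha_g0001\t1200
AT1G01020\tAha_g0002\t950
###
AT2G01008\tAha_g0500\t700
""".splitlines()

blocks = parse_anchors(example)
print(len(blocks), [len(block) for block in blocks])
```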

gene differential expression analysis of related species

SynGAP includes another module, genepair, which generates high-confidence cross-species homologous gene pairs by combining the improved synteny (from SynGAP dual or triple) with best two-way BLAST. SynGAP evi then uses an additional measure, the expression variation index (EVI), which is calculated from the gene expression level, the difference in expression level, and the difference in expression trend across time-series transcriptome data.
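
Best two-way BLAST is commonly understood as keeping only reciprocal best hits: gene A's top hit in the other species is gene B, and gene B's top hit in the reverse search is gene A. The following sketch illustrates that general idea on in-memory score tuples. It is not SynGAP's implementation; the function names and scores are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(sp1_vs_sp2, sp2_vs_sp1):
    """Return sorted (sp1_gene, sp2_gene) pairs that are each other's best hit."""
    forward = best_hits(sp1_vs_sp2)
    backward = best_hits(sp2_vs_sp1)
    return sorted(
        (a, b) for a, b in forward.items() if backward.get(b) == a
    )


# Invented scores for illustration only.
sp1_vs_sp2 = [("a1", "b1", 500), ("a1", "b2", 300), ("a2", "b2", 410)]
sp2_vs_sp1 = [("b1", "a1", 480), ("b2", "a1", 320), ("b2", "a2", 400)]
print(reciprocal_best_hits(sp1_vs_sp2, sp2_vs_sp1))
```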

genepair

SynGAP genepair takes the genome sequences and genome annotations of the paired objects as input.

syngap genepair \
--sp1fa=Can.fa \
--sp1gff=Can.SynGAP.gff3 \
--sp2fa=Sly.fa \
--sp2gff=Sly.SynGAP.gff3 \
--sp1=Can \
--sp2=Sly

SynGAP genepair generates several key output files (see below). The *.final.genepair file is used as input for SynGAP evi.

| Result File | Description |
| --- | --- |
| *.final.genepair | the full gene pairs file (syntenic + best two-way BLAST) |
| *.Synteny.genepair | the syntenic gene pairs |
| *.2wayblast.genepair | the best two-way BLAST gene pairs |

evi

Based on the gene pairs between two species and the time-series transcriptome data, evi calculates the EVI for each gene pair. The input expression file should be a tab-delimited text file with normalized expression values such as FPKM, RPKM, or TPM (we recommend TPM).

syngap evi \
--genepair=Can.Sly.final.genepair \
--sp1exp=Can.S1_S7.transcript.TPM.xls \
--sp2exp=Sly.S1_S7.transcript.TPM.xls
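
A quick sanity check of the expression tables before running evi can catch formatting problems early. The sketch below assumes a header row, an ID in the first column, and one numeric column per time point. This layout is an assumption, not something stated above, and `check_expression_table` is a hypothetical helper.

```python
import csv
import io


def check_expression_table(handle):
    """Return a list of problems found in a tab-delimited expression table.

    Assumes a header row, an ID in the first column, and numeric,
    non-negative values in every remaining column.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {row_number}: expected {len(header)} columns, found {len(row)}"
            )
            continue
        for value in row[1:]:
            try:
                if float(value) < 0:
                    problems.append(f"line {row_number}: negative value {value}")
            except ValueError:
                problems.append(f"line {row_number}: non-numeric value {value!r}")
    return problems


# Invented IDs and values for illustration.
table = "ID\tS1\tS2\tS3\ng1\t10.5\t12.0\t8.1\ng2\t0.0\tNA\t3.2\n"
print(check_expression_table(io.StringIO(table)))
```

An empty list means no problems were detected under these assumptions. For a real file, pass an open handle (opened with `newline=""`) instead of the `io.StringIO` object.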

There are several key output files:

| Result File | Description |
| --- | --- |
| *.final.genepair.EVI.xls | the final EVI result file, in which the gene pairs are ranked by EVI |
| *.final.genepair.EVI.threshold.txt | the EVI threshold; gene pairs with an EVI above the threshold are considered to show marked differential expression signals |
| *.final.genepair.EVI.pdf | the ranked dot plot of EVI for all gene pairs |
| *.final.genepair.EVI.indexweight.pdf | the stacked bar plot of the three indexes contributing to EVI, which can help in adjusting the weights of the three indexes |
| *.final.genepair.EVI.indexweightratio.pdf | the percentage stacked bar plot of the three indexes contributing to EVI, which can help in adjusting the weights of the three indexes |

eviplot

If you are interested in specific gene pairs, you can highlight them using eviplot.

syngap eviplot \
--EVI=Can.Sly.final.genepair.EVI.xls \
--highlightid=highlight.id \
--outgraph=Can.Sly.highlight.EVI.pdf

The format of highlight.id is as follows:

| GeneID1 | GeneID2 | Label |
| --- | --- | --- |
| Capana06g001783 | transcript:Solyc06g059840.3.1 | CaBCKDH |
| Capana02g002339 | transcript:Solyc02g081745.1.1 | CaAT3 |
| Capana04g000751 | transcript:Solyc04g077240.3.1 | CaBCAT |
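
A file like this can be produced from a list of gene pairs with a few lines of Python. The sketch below assumes the file is tab-delimited with three columns. Neither the delimiter nor whether a header line is expected is stated here, so both are assumptions to verify against the detailed documentation. `write_highlight_ids` is a hypothetical helper.

```python
import csv
import io

# Rows taken from the example table above.
ROWS = [
    ("Capana06g001783", "transcript:Solyc06g059840.3.1", "CaBCKDH"),
    ("Capana02g002339", "transcript:Solyc02g081745.1.1", "CaAT3"),
    ("Capana04g000751", "transcript:Solyc04g077240.3.1", "CaBCAT"),
]


def write_highlight_ids(handle, rows, include_header=False):
    """Write (GeneID1, GeneID2, Label) rows as tab-delimited text.

    The tab delimiter and the optional header line are assumptions.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(("GeneID1", "GeneID2", "Label"))
    writer.writerows(rows)


buffer = io.StringIO()
write_highlight_ids(buffer, ROWS)
print(buffer.getvalue(), end="")
```

To write to disk, replace the `io.StringIO` buffer with `open("highlight.id", "w", newline="")`.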
