oggmap

orthologous maps - evolutionary age index

oggmap is a python package to extract orthologous maps (short: orthomap or in other words the evolutionary age of a given orthologous group) from OrthoFinder or eggNOG results. Oggmap results (gene ages per orthologous group) can be further used to calculate and visualize weighted expression data (transcriptome evolutionary index) from scRNA sequencing objects.

[Figures: oggmap steps; zebrafish example; nematode example]

Documentation

Online documentation can be found here.

When using oggmap in published research, please cite:

Ullrich KK, Glytnasi NE, "oggmap: a Python package to extract gene ages per orthogroup and link them with single-cell RNA data", Bioinformatics, 2023, 39(11). https://doi.org/10.1093/bioinformatics/btad657

Installing `oggmap`

More installation options can be found here.

oggmap installation using conda and pip

We recommend installing oggmap in a dedicated conda environment to avoid dependency conflicts. Create a new Python environment for oggmap and install its dependencies there.

If you do not have a working installation of Python 3.10 (or later), consider installing Anaconda or Miniconda.

To create and activate the environment run:

$ git clone https://github.com/kullrich/oggmap.git
$ cd oggmap
$ conda env create --file environment.yml
$ conda activate oggmap_env

Then to install oggmap via PyPI:

$ pip install oggmap

Quick usage

Detailed tutorials on how to use oggmap can be found here.

Update/download the local NCBI taxonomy database:

The following command downloads or updates your local copy of the NCBI taxonomy database (~150MB). The database is saved to the file given by -dbname (default: taxadb.sqlite).

$ oggmap ncbitax -u -outdir taxadb -type taxa -dbname taxadb.sqlite
$ rm -rf taxadb

>>> from oggmap import ncbitax
>>> update_parser = ncbitax.define_parser()
>>> update_args = update_parser.parse_args()
>>> update_args.outdir = 'taxadb'
>>> update_args.dbname = 'taxadb.sqlite'
>>> ncbitax.update_ncbi(update_args)

Step 1 - Get query species taxonomic lineage information:

You can query the lineage information of a species by its name or by its taxID. For example, Danio rerio has taxID 7955:

$ oggmap qlin -q "Danio rerio" -dbname taxadb.sqlite
$ oggmap qlin -qt 7955 -dbname taxadb.sqlite

>>> from oggmap import qlin
>>> qlin.get_qlin(q='Danio rerio',
...     dbname = 'taxadb.sqlite')
>>> qlin.get_qlin(qt='7955',
...     dbname = 'taxadb.sqlite')

You can also get the lineage topology of the query species as a tree. For example, for Danio rerio with taxID 7955:

>>> from io import StringIO
>>> from Bio import Phylo
>>> from oggmap import qlin
>>> query_topology = qlin.get_lineage_topo(qt='7955',
...     dbname='taxadb.sqlite')
>>> output = StringIO()
>>> Phylo.write(query_topology, output, "newick")
>>> output.getvalue().strip()

Step 2 - Get query species orthomap from OrthoFinder results:

The following code extracts the orthomap for Danio rerio from pre-calculated OrthoFinder results based on Ensembl release-113:

The OrthoFinder results (-S diamond_ultra_sens), computed on translated longest-isoform coding sequences from Ensembl release-113, have been archived and can be found here.

# download OrthoFinder example:
$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip
$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.tsv.zip
$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_species_list.tsv

# extract orthomap:
$ oggmap of2orthomap -seqname 7955.danio_rerio.pep -qt 7955 \
  -sl ensembl_113_orthofinder_last_species_list.tsv \
  -oc ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip \
  -og ensembl_113_orthofinder_last_Orthogroups.tsv.zip \
  -dbname taxadb.sqlite

>>> from oggmap import datasets, of2orthomap, qlin
>>> datasets.ensembl113_last(datapath='.')
>>> query_orthomap, orthofinder_species_list, of_species_abundance = of2orthomap.get_orthomap(
...     seqname='7955.danio_rerio.pep',
...     qt='7955',
...     sl='ensembl_113_orthofinder_last_species_list.tsv',
...     oc='ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip',
...     og='ensembl_113_orthofinder_last_Orthogroups.tsv.zip',
...     out=None,
...     quiet=False,
...     continuity=True,
...     overwrite=True,
...     dbname='taxadb.sqlite')
>>> query_orthomap
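
To illustrate the idea behind an orthomap, here is a minimal sketch in plain Python. It assumes that the age of an orthogroup is set by the most distantly related species sharing the group with the query, i.e. the oldest common ancestor node on the query lineage. The function names and the toy lineages are hypothetical and are not part of the oggmap API; oggmap's own implementation may differ.

```python
# Toy sketch: assign an age (phylostratum) to one orthogroup.
# All lineages and names below are made-up illustration data, not oggmap output.
# Assumption: every lineage starts at the same root node.

def shared_depth(query_lineage, other_lineage):
    """Number of leading lineage nodes shared by both species."""
    depth = 0
    for q_node, o_node in zip(query_lineage, other_lineage):
        if q_node != o_node:
            break
        depth += 1
    return depth


def orthogroup_age(query_lineage, member_lineages):
    """Return the 1-based index of the oldest shared ancestor node.

    A small number means an old group; len(query_lineage) means
    the group is restricted to the query species.
    """
    oldest = len(query_lineage)
    for lineage in member_lineages:
        oldest = min(oldest, shared_depth(query_lineage, lineage))
    return oldest


query = ["root", "cladeA", "cladeB", "query_species"]
members = [
    ["root", "cladeA", "cladeB", "sister_species"],  # shares root, cladeA, cladeB
    ["root", "cladeA", "other_species"],             # shares root, cladeA
]
age = orthogroup_age(query, members)
print(age, query[age - 1])
```

In this toy case the group is as old as the deepest split among its members, so it is assigned to the second node of the query lineage.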

Step 3 - Map OrthoFinder gene names and scRNA gene/transcript names:

The following code extracts the gene-to-transcript table (t2g) from a GTF file. The command line example uses Mus musculus, the Python example uses Danio rerio:

GTF file obtained from here.

# to get GTF from Mus musculus on Linux run:
$ wget https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz
# on Mac:
$ curl https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz --remote-name

# create t2g from GTF:
$ oggmap gtf2t2g -i Mus_musculus.GRCm39.113.chr.gtf.gz \
  -o Mus_musculus.GRCm39.113.chr.gtf.t2g.tsv \
  -g -b -p -v -s

>>> from oggmap import datasets, gtf2t2g
>>> gtf_file = datasets.zebrafish_ensembl113_gtf(datapath='.')
>>> query_species_t2g = gtf2t2g.parse_gtf(
...     gtf=gtf_file,
...     g=True, b=True, p=True, v=True, s=True, q=True)
>>> query_species_t2g
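
For readers unfamiliar with the GTF layout, the sketch below shows how a gene-to-transcript table can be derived from the ninth (attribute) column of GTF transcript records. It only relies on the general GTF format (nine tab-separated fields, attributes written as key "value";). The function gtf_to_t2g and the toy records are hypothetical; this is not how oggmap's gtf2t2g is implemented and it ignores the version, biotype and protein options shown above.

```python
import re

# Matches GTF attribute pairs such as: gene_id "G1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gtf_to_t2g(lines):
    """Collect unique (gene_id, transcript_id) pairs from transcript records."""
    pairs = []
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue  # keep only well-formed transcript features
        attrs = dict(ATTR_RE.findall(fields[8]))
        pair = (attrs.get("gene_id"), attrs.get("transcript_id"))
        if None not in pair and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Made-up records for illustration only.
toy_gtf = [
    "#!genome-build toy",
    'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "G1";',
    'chr1\ttoy\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\ttranscript\t1\t80\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\ttoy\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
t2g = gtf_to_t2g(toy_gtf)
print(t2g)
```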

Now import the scRNA dataset of the query species.

example: Danio rerio - http://tome.gs.washington.edu (Qiu et al. 2022)

AnnData file can be found here.

>>> import scanpy as sc
>>> from oggmap import datasets, orthomap2tei
>>> # download zebrafish scRNA data here: https://doi.org/10.5281/zenodo.7243602
>>> # or download with datasets.qiu22_zebrafish(datapath='.')
>>> zebrafish_data = datasets.qiu22_zebrafish(datapath='.')
>>> zebrafish_data
>>> # check overlap of transcript table <gene_id> and scRNA data <var_names>
>>> orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_species_t2g['gene_id'])
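
As a rough picture of what such an overlap check measures, the following sketch counts the identifiers shared between two collections and reports them as a fraction of each. The function overlap_summary and its output keys are hypothetical; the actual return value of orthomap2tei.geneset_overlap may look different.

```python
def overlap_summary(var_names, gene_ids):
    """Count shared identifiers and report them as a fraction of each set."""
    a, b = set(var_names), set(gene_ids)
    shared = a & b
    return {
        "shared": len(shared),
        "fraction_of_var_names": len(shared) / len(a) if a else 0.0,
        "fraction_of_gene_ids": len(shared) / len(b) if b else 0.0,
    }


# Made-up identifiers for illustration only.
toy_var_names = ["G1", "G2", "G3", "G4"]
toy_gene_ids = ["G2", "G3", "G5"]
summary = overlap_summary(toy_var_names, toy_gene_ids)
print(summary)
```

A low shared fraction usually indicates that the two tables use different identifier types (for example transcript IDs versus gene IDs) and need to be mapped first.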

The replace_by helper function can be used to add a new column to the orthomap dataframe by matching e.g. gene isoform names and their corresponding gene names.

>>> # convert orthomap transcript IDs into GeneIDs and add them to orthomap
>>> query_orthomap['geneID'] = orthomap2tei.replace_by(
...    x_orig = query_orthomap['seqID'],
...    xmatch = query_species_t2g['transcript_id_version'],
...    xreplace = query_species_t2g['gene_id'])
>>> # check overlap of orthomap <geneID> and scRNA data
>>> orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_orthomap['geneID'])
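
Conceptually, this kind of replacement is a lookup from one identifier column to another. The sketch below mimics it with a dictionary. The name replace_by_sketch is hypothetical, and returning None for unmatched identifiers is an assumption; oggmap's replace_by may handle missing matches differently.

```python
def replace_by_sketch(x_orig, xmatch, xreplace):
    """Map each value in x_orig via the pairing xmatch -> xreplace.

    Assumption: identifiers without a match become None.
    """
    lookup = dict(zip(xmatch, xreplace))
    return [lookup.get(x) for x in x_orig]


# Made-up identifiers for illustration only.
seq_ids = ["T1.1", "T3.2", "T9.1"]         # e.g. orthomap sequence IDs
transcript_ids = ["T1.1", "T2.1", "T3.2"]  # transcript column of a t2g table
gene_ids = ["G1", "G1", "G2"]              # gene column of the same table
mapped = replace_by_sketch(seq_ids, transcript_ids, gene_ids)
print(mapped)
```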

Step 4 - Get transcriptome evolutionary index (TEI) values and add them to scRNA dataset:

Now that the gene names in the orthomap match those in the scRNA adata object, one can calculate the transcriptome evolutionary index (TEI) and add it to the scRNA dataset (adata object).

>>> # add TEI values to existing adata object
>>> orthomap2tei.get_tei(adata = zebrafish_data,
...    gene_id = query_orthomap['geneID'],
...    gene_age = query_orthomap['PSnum'],
...    keep = 'min',
...    layer = None,
...    add = True,
...    obs_name = 'tei',
...    boot = False,
...    bt = 10,
...    normalize_total = False,
...    log1p = False,
...    target_sum = 1e6)
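
To make the quantity concrete, here is a minimal sketch of a TEI value for a single cell, assuming the usual definition of a transcriptome age index: the expression-weighted mean of gene ages, sum(age_i * expr_i) / sum(expr_i). The helper names are hypothetical, and the reading of keep='min' as "use the smallest age when a gene ID occurs more than once" is an assumption. oggmap's get_tei operates on whole AnnData objects and offers further options (normalization, bootstrapping) that are not covered here.

```python
def collapse_ages(gene_ids, gene_ages, keep="min"):
    """Resolve duplicated gene IDs by keeping the min or max age (assumed semantics)."""
    pick = min if keep == "min" else max
    ages = {}
    for gene, age in zip(gene_ids, gene_ages):
        ages[gene] = pick(ages[gene], age) if gene in ages else age
    return ages


def tei(expression, ages):
    """Expression-weighted mean gene age for one cell.

    expression: dict mapping gene -> expression value.
    Genes without an assigned age are ignored.
    """
    weighted = sum(expression[g] * ages[g] for g in expression if g in ages)
    total = sum(expression[g] for g in expression if g in ages)
    return weighted / total if total else float("nan")


# Made-up values for illustration only.
ages = collapse_ages(["G1", "G2", "G2"], [1, 5, 3], keep="min")
cell = {"G1": 30.0, "G2": 10.0, "G9": 4.0}  # G9 has no age and is skipped
value = tei(cell, ages)
print(ages, value)
```

A cell dominated by old genes (small age values) gets a low TEI, while strong expression of young genes pushes the value up.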

Step 5 - Downstream analysis

Once the gene age data has been added to the scRNA dataset, one can e.g. plot the corresponding transcriptome evolutionary index (TEI) values by any given observation pre-defined in the scRNA dataset.

Violin plot (with inner boxplot) of TEI per stage:

>>> sc.pl.violin(adata = zebrafish_data,
...     keys = ['tei'],
...     groupby = 'stage',
...     rotation = 90,
...     palette = 'Paired',
...     stripplot = False,
...     inner = 'box')
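
If a numeric summary is preferred over a plot, the per-group statistic can be computed directly. The sketch below uses plain Python lists as stand-ins for the 'tei' and 'stage' observation columns; the values and stage labels are made up, and median_by_group is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def median_by_group(values, groups):
    """Return the median of values within each group label."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: median(vals) for group, vals in buckets.items()}


# Made-up values for illustration only.
toy_tei = [2.0, 2.5, 3.0, 3.0, 3.5]
toy_stage = ["stage_a", "stage_a", "stage_a", "stage_b", "stage_b"]
medians = median_by_group(toy_tei, toy_stage)
print(medians)
```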

oggmap via Command Line

oggmap can also be used via the command line.

Command line documentation can be found here.

$ oggmap -h

usage: oggmap <sub-command>

oggmap

options:
  -h, --help            show this help message and exit

sub-commands:
  {cds2aa,gtf2t2g,ncbitax,of2orthomap,orthomcl2orthomap,plaza2orthomap,qlin}
                        sub-commands help
    cds2aa              translate CDS to AA and optional reta
