|logo|

|pip| |downloads|

|github_action| |ABRA|

What is PheKnowLator?

PheKnowLator (Phenotype Knowledge Translator), or pkt_kg, is the first fully customizable knowledge graph (KG) construction framework. It enables users to build complex KGs that are Semantic Web compliant and amenable to automatic Web Ontology Language (OWL) reasoning, to generate contemporary property graphs, and to produce output that is importable by today’s popular graph toolkits. Please see the project `Wiki <https://github.com/callahantiff/PheKnowLator/wiki>`__ for additional information.

📢 Please see our preprint 👉 https://arxiv.org/abs/2307.05727

What Does This Repository Provide?

- A Knowledge Graph Sharing Hub: Prebuilt KGs and associated metadata. Each KG is provided as triple edge lists, OWL API-formatted RDF/XML, and NetworkX graph-pickled MultiDiGraphs. We also make text files available containing node and relation metadata.
- A Knowledge Graph Building Framework: An automated Python 3 library designed for optimized construction of semantically-rich, large-scale biomedical KGs from complex heterogeneous data. The framework also includes Jupyter Notebooks to greatly simplify the generation of required input dependencies.

NOTE. A table listing and describing all output files generated for each build, along with example output from each file, can be found `here <https://github.com/callahantiff/PheKnowLator/wiki/KG-Construction#table-knowledge-graph-build-output>`__.

How do I Learn More?

- Join and/or start a Discussion_
- The `Project Wiki`_ for available knowledge graphs, pkt_kg data sources, and the knowledge graph construction process
- A `Zenodo Community <https://zenodo.org/communities/pheknowlator-ecosystem>`__ has been established to provide access to software releases, presentations, and preprints related to this project

Releases

- `Data Access <https://github.com/callahantiff/PheKnowLator/wiki/Archived-Builds>`__
- `Build Documentation <https://github.com/callahantiff/PheKnowLator/wiki/Benchmarks-and-Builds>`__

Getting Started

Install Library

This program requires Python version 3.6. To install the library from `PyPI <https://pypi.org/project/pkt-kg/>`_, run:

.. code:: shell

    pip install pkt_kg

You can also clone the repository directly from GitHub by running:

.. code:: shell

    git clone https://github.com/callahantiff/PheKnowLator.git

Note. OWLTools, which comes with the cloned/forked repository (./pkt_kg/libs/owltools), sometimes loses its "executable" permission. To avoid any potential issues, I recommend running the following in the terminal from the PheKnowLator directory:

.. code:: shell

    chmod +x pkt_kg/libs/owltools
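The same permission fix can be scripted, for example as part of a set-up step. The following is a minimal sketch using only the Python standard library; the helper name ``ensure_executable`` is hypothetical and not part of pkt_kg.

```python
import os
import stat
from pathlib import Path


def ensure_executable(path):
    """Add the execute bits to `path` if the current user cannot execute it.

    Returns True when the mode was changed and False when no change was needed.
    """
    target = Path(path)
    if os.access(target, os.X_OK):
        return False
    mode = target.stat().st_mode
    # roughly what `chmod +x` does: set execute for user, group, and others
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

Calling ``ensure_executable("pkt_kg/libs/owltools")`` from the PheKnowLator directory would mirror the shell command above; the call raises ``FileNotFoundError`` if the path does not exist.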

Set-Up Environment

The pkt_kg library requires a specific project directory structure.

- If you plan to run the code from a cloned version of this repository, then no additional steps are needed.
- If you plan to use the library without cloning the repository, please make sure that your project directory matches the following:

.. code:: shell

    PheKnowLator/
        |
        |---- resources/
        |         |
        |     construction_approach/
        |         |
        |     edge_data/
        |         |
        |     knowledge_graphs/
        |         |
        |     node_data/
        |         |
        |     ontologies/
        |         |
        |     owl_decoding/
        |         |
        |     relations_data/
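If you are not cloning the repository, the layout above can be created programmatically. This is a minimal sketch with ``pathlib``; the function name ``create_project_layout`` is hypothetical, and the directory names are copied from the tree above.

```python
from pathlib import Path

# sub-directories of resources/ shown in the layout above
RESOURCE_SUBDIRS = [
    "construction_approach",
    "edge_data",
    "knowledge_graphs",
    "node_data",
    "ontologies",
    "owl_decoding",
    "relations_data",
]


def create_project_layout(root="PheKnowLator"):
    """Create <root>/resources/<subdir> for every expected sub-directory."""
    resources = Path(root) / "resources"
    for name in RESOURCE_SUBDIRS:
        # exist_ok makes the call safe to repeat on an existing project
        (resources / name).mkdir(parents=True, exist_ok=True)
    return resources
```

The function only creates empty directories; the input documents described in the next section still have to be added.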

Dependencies

Several input documents must be created before the pkt_kg library can be used. They are listed below by knowledge graph build step:

DOWNLOAD DATA
^^^^^^^^^^^^^^^^

This code requires three documents within the resources directory to run successfully. For more information on these documents, see `Document Dependencies`_:

- `resources/resource_info.txt`_
- `resources/ontology_source_list.txt`_
- `resources/edge_source_list.txt`_

For assistance in creating these documents, please run the following from the root directory:

.. code:: bash

    python3 generates_dependency_documents.py

Prior to running this step, make sure that all mapping and filtering data referenced in resources/resource_info.txt_ have been created. To generate these data yourself, please see the Data_Preparation.ipynb_ Jupyter Notebook for detailed examples of the steps used to build the v2.0.0 knowledge graph <https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0>__.

Note. To ensure reproducibility, after downloading data, a metadata file is output for the ontologies (ontology_source_metadata.txt) and edge data sources (edge_source_metadata.txt).
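Before starting the download step, it can help to confirm that the three required documents are in place. The sketch below is one way to do that; ``missing_documents`` is a hypothetical helper, and the paths are the ones listed in this section.

```python
from pathlib import Path

# the three documents the download step expects, relative to the project root
REQUIRED_DOCUMENTS = (
    "resources/resource_info.txt",
    "resources/ontology_source_list.txt",
    "resources/edge_source_list.txt",
)


def missing_documents(project_root="."):
    """Return the required documents that do not exist under `project_root`."""
    root = Path(project_root)
    return [doc for doc in REQUIRED_DOCUMENTS if not (root / doc).is_file()]
```

An empty list means all three documents were found; the check does not validate their contents.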

CONSTRUCT KNOWLEDGE GRAPH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `KG Construction`_ Wiki page provides a detailed description of the knowledge graph construction process (please see the knowledge graph README_ for more information). Please make sure the documents listed below are present in the specified locations prior to constructing a knowledge graph. Click on each document for additional information. Note that cloning this library will include a version of these documents that points to the current build. If you use this version, then there is no need to download anything prior to running the program.

- `resources/construction_approach/subclass_construction_map.pkl`_
- `resources/Master_Edge_List_Dict.json`_ ➞ automatically created after edge list construction
- `resources/node_data/node_metadata_dict.pkl <https://github.com/callahantiff/PheKnowLator/blob/master/resources/node_data/README.md>`__ ➞ if adding metadata for new edges to the knowledge graph
- `resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl`_ ➞ see ontology README_ for information
- `resources/relations_data/RELATIONS_LABELS.txt`_
- `resources/relations_data/INVERSE_RELATIONS.txt`_ ➞ if including inverse relations
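Because some of these documents are conditional and the merged ontology file name contains a wildcard, a glob-based check is convenient. The following is a sketch only: ``construction_inputs_missing`` and its two boolean parameters are hypothetical, and ``Master_Edge_List_Dict.json`` is left out because it is created automatically.

```python
from pathlib import Path


def construction_inputs_missing(project_root=".", node_metadata=False,
                                inverse_relations=False):
    """List construction inputs that are absent under `project_root`.

    Entries are treated as glob patterns so that the merged ontology file
    (PheKnowLator_MergedOntologies*.owl) matches whatever its suffix is.
    """
    patterns = [
        "resources/construction_approach/subclass_construction_map.pkl",
        "resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl",
        "resources/relations_data/RELATIONS_LABELS.txt",
    ]
    # conditional inputs, following the notes next to each document above
    if node_metadata:
        patterns.append("resources/node_data/node_metadata_dict.pkl")
    if inverse_relations:
        patterns.append("resources/relations_data/INVERSE_RELATIONS.txt")

    root = Path(project_root)
    # a pattern counts as satisfied when at least one path matches it
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```

The conditional flags only decide which files are checked; they are not connected to the build options passed to pkt_kg itself.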

Running the pkt Library

pkt_kg can be run via the provided main.py_ script, the main.ipynb_ Jupyter Notebook, or a Docker container.

Main Script or Jupyter Notebook

The program can be run locally using the main.py_ script or using the main.ipynb_ Jupyter Notebook. An example of the workflow used in both of these approaches is shown below.

.. code:: python

    import psutil
    import ray
    # classes are accessed through their modules; which module holds which
    # class is assumed from the module names, so adjust if your version differs
    from pkt_kg import downloads, edge_list, knowledge_graph

    # initialize ray
    ray.init()

    # determine number of physical cpus available
    available_cpus = psutil.cpu_count(logical=False)

    # DOWNLOAD DATA
    # ontology data
    ont = downloads.OntData('resources/ontology_source_list.txt')
    ont.downloads_data_from_url()
    ont.writes_source_metadata_locally()

    # edge data sources
    edges = downloads.LinkedData('resources/edge_source_list.txt')
    edges.downloads_data_from_url()
    edges.writes_source_metadata_locally()

    # CREATE MASTER EDGE LIST
    combined_edges = dict(edges.data_files, **ont.data_files)

    # initialize edge dictionary class
    master_edges = edge_list.CreatesEdgeList(data_files=combined_edges,
                                             source_file='./resources/resource_info.txt')
    master_edges.runs_creates_knowledge_graph_edges(source_file='./resources/resource_info.txt',
                                                    data_files=combined_edges,
                                                    cpus=available_cpus)

    # BUILD KNOWLEDGE GRAPH
    # full build, subclass construction approach, with inverse relations and node metadata, and decode owl
    # (knowledge_graph.PartialBuild is the partial-build counterpart)
    kg = knowledge_graph.FullBuild(kg_version='v2.0.0',
                                   write_location='./resources/knowledge_graphs',
                                   construction='subclass',
                                   node_data='yes',
                                   inverse_relations='yes',
                                   cpus=available_cpus,
                                   decode_owl='yes')

    kg.construct_knowledge_graph()

    # release the resources held by ray
    ray.shutdown()
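The ``dict(edges.data_files, **ont.data_files)`` step merges the two download results into a single mapping. The sketch below illustrates that idiom with made-up stand-ins; the keys and paths are hypothetical, and it assumes both ``data_files`` attributes are dictionaries with string keys.

```python
# hypothetical stand-ins for edges.data_files and ont.data_files
edge_files = {
    "chemical-gene": "edge_data/chemical-gene.txt",
    "shared": "edge_data/shared.txt",
}
ont_files = {
    "phenotype": "ontologies/phenotype.owl",
    "shared": "ontologies/shared.owl",
}

# same idiom as in the workflow: copy the first dict, then apply the second
combined_edges = dict(edge_files, **ont_files)

# entries from both inputs are kept; on a key clash the keyword side wins
assert combined_edges["shared"] == "ontologies/shared.owl"
# the unpacking form builds the same mapping
assert combined_edges == {**edge_files, **ont_files}
```

Neither input dictionary is modified, so the individual download results remain available after the merge.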

`main.py`

The example below provides the details needed to run pkt_kg using ./main.py.

.. code:: bash

    python3 main.py -h
    usage: main.py [-h] [-p CPUS] -g ONTS -e EDG -a APP -t RES -b KG -o OUT -n NDE -r REL -s OWL -m KGM

    PheKnowLator: This program builds a biomedical knowledge graph using Open Biomedical Ontologies
    and linked open data. The program takes the following arguments:

    optional arguments:
    -h, --help            show this help message and exit
    -p CPUS, --cpus CPUS  # workers to use; defaults to use all available cores
    -g ONTS, --onts ONTS  name/path to text file containing ontologies
    -e EDG,  --edg EDG    name/path to text file containing edge sources
    -a APP,  --app APP    construction approach to use (i.e. instance or subclass)
    -t RES,  --res RES    name/path to text file containing resource_info
    -b KG,   --kg KG      the build, can be "partial", "full", or "post-closure"
    -o OUT,  --out OUT    name/path to directory where to write knowledge graph
    -r REL,  --rel REL    yes/no - adding inverse relations to knowledge graph
    -s OWL,  --owl OWL    yes/no - removing OWL Semantics from knowledge graph
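Since most of these flags are required, it can be handy to assemble the command in code and fail early when one is missing. The following is a sketch only: ``build_main_command`` is a hypothetical helper, the required flags are read off the usage line above, and the values given for ``-n`` and ``-m`` are assumptions because the help text above does not describe those two options.

```python
import shlex

# flags shown without brackets in the usage line, i.e. required by main.py
REQUIRED_FLAGS = ("-g", "-e", "-a", "-t", "-b", "-o", "-n", "-r", "-s", "-m")


def build_main_command(options, cpus=None):
    """Return the argument list for main.py, checking that no flag is missing."""
    missing = [flag for flag in REQUIRED_FLAGS if flag not in options]
    if missing:
        raise ValueError("missing required flags: " + ", ".join(missing))
    command = ["python3", "main.py"]
    if cpus is not None:
        command += ["-p", str(cpus)]  # optional worker count
    for flag in REQUIRED_FLAGS:
        command += [flag, str(options[flag])]
    return command


example = build_main_command({
    "-g": "resources/ontology_source_list.txt",
    "-e": "resources/edge_source_list.txt",
    "-a": "subclass",
    "-t": "resources/resource_info.txt",
    "-b": "full",
    "-o": "resources/knowledge_graphs",
    "-n": "yes",  # assumed value: -n appears in the usage line only
    "-r": "yes",
    "-s": "no",
    "-m": "no",   # assumed value: -m appears in the usage line only
})
# print a shell-quoted version of the command for inspection
print(" ".join(shlex.quote(part) for part in example))
```

The returned list can be passed to ``subprocess.run`` from the repository root; check ``python3 main.py -h`` in your installed version for the meaning of ``-n`` and ``-m`` before relying on the values used here.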

`main.ipynb`

The ./main.ipynb Jupyter notebook provides detailed instructions for how to run the pkt_kg algorithm and build a knowledge graph from scratch.

Docker Container

pkt_kg can be run using a Docker instance. In order to utilize the Dockerized version of the code, please make sure that you have downloaded the newest version of Docker <https://docs.docker.com/get-docker/>__. There are two ways to utilize Docker with this repository:
