<img src="docs/source/logo.png" height="150">

<h1 align="center"> PyKEEN </h1>

<a href="https://github.com/pykeen/pykeen/actions/workflows/common.yml"> <img src="https://github.com/pykeen/pykeen/actions/workflows/common.yml/badge.svg" alt="GitHub Actions"> </a>
<a href='https://opensource.org/licenses/MIT'> <img src='https://img.shields.io/badge/License-MIT-blue.svg' alt='License'/> </a>
<a href="https://zenodo.org/badge/latestdoi/242672435"> <img src="https://zenodo.org/badge/242672435.svg" alt="DOI"> </a>
<a href="https://optuna.org"> <img src="https://img.shields.io/badge/Optuna-integrated-blue" alt="Optuna integrated" height="20"> </a>
<a href="https://pytorchlightning.ai"> <img src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white" alt="PyTorch Lightning"> </a>
<a href="https://github.com/astral-sh/ruff"> <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff" style="max-width:100%;"> </a>
<a href=".github/CODE_OF_CONDUCT.md"> <img src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg" alt="Contributor Covenant"> </a>

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

<a href="#installation">Installation</a> • <a href="#quickstart">Quickstart</a> • <a href="#datasets">Datasets (37)</a> • <a href="#inductive-datasets">Inductive Datasets (5)</a> • <a href="#models">Models (40)</a> • <a href="#supporters">Support</a> • <a href="#citation">Citation</a>

Installation

The latest stable version of PyKEEN requires Python 3.9+. It can be downloaded and installed from PyPI with:

pip install pykeen

The latest development version of PyKEEN can be installed directly from the source code on GitHub with:

pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.

Quickstart

This example shows how to train a model on a dataset's training split and evaluate it on the held-out test split.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
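To make "rank-based evaluation" more concrete, here is a minimal sketch in plain Python of two commonly reported rank-based metrics: mean reciprocal rank (MRR) and Hits@k. It is independent of PyKEEN and is not its implementation. The function names and the example ranks are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all evaluated triples (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of evaluated triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the true entity for four test triples (1 is best).
ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(ranks)
hits10 = hits_at_k(ranks, k=10)
print(f"MRR={mrr:.4f}  Hits@10={hits10:.2f}")
```

Both metrics depend only on the rank that the model assigns to the true entity among all candidates, which is why they can be compared across models that produce scores on different scales.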

PyKEEN is extensible such that:

Each model has the same API, so anything from pykeen.models can be dropped in
Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
Triples factories can be generated by the user with pykeen.triples.TriplesFactory
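The list above mentions an LCWA training loop as a drop-in alternative to the default sLCWA approach. The conceptual difference between the two can be sketched in plain Python. This is an illustration under simplifying assumptions, not PyKEEN's implementation: triples are integer ID tuples, the helper names `slcwa_negatives` and `lcwa_negative_tails` are hypothetical, and known positives are filtered out of the sampled negatives here for clarity.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail) as integer IDs


def slcwa_negatives(
    triple: Triple,
    num_entities: int,
    known: Set[Triple],
    num_negatives: int,
    rng: random.Random,
) -> List[Triple]:
    """sLCWA idea: sample a few negatives by corrupting the head or the tail."""
    h, r, t = triple
    negatives: List[Triple] = []
    # Note: this loop assumes enough unobserved candidates exist to terminate.
    while len(negatives) < num_negatives:
        e = rng.randrange(num_entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        # Filtering known positives is a choice made in this sketch.
        if candidate not in known:
            negatives.append(candidate)
    return negatives


def lcwa_negative_tails(
    h: int, r: int, num_entities: int, known: Set[Triple]
) -> List[int]:
    """LCWA idea: every tail not observed with (h, r) counts as a negative."""
    return [t for t in range(num_entities) if (h, r, t) not in known]


# Toy knowledge graph with 4 entities and 2 relations (hypothetical data).
known = {(0, 0, 1), (0, 0, 2), (1, 1, 3)}
rng = random.Random(0)
sampled = slcwa_negatives((0, 0, 1), num_entities=4, known=known,
                          num_negatives=3, rng=rng)
all_negative_tails = lcwa_negative_tails(0, 0, num_entities=4, known=known)
print(sampled)
print(all_negative_tails)
```

The sampled variant scales with the number of negatives drawn per positive, while the exhaustive variant scores every entity as a candidate tail for each observed (head, relation) pair.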

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in pykeen.

Datasets

The following 37 datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know here.

| Name | Documentation | Citation | Entities | Relations | Triples |
|------|---------------|----------|----------|-----------|---------|
| Aristo-v4 | pykeen.datasets.AristoV4 | Chen et al., 2021 | 42016 | 1593 | 279425 |
| BioKG | pykeen.datasets.BioKG | Walsh et al., 2019 | 105524 | 17 | 2067997 |
| Clinical Knowledge Graph | pykeen.datasets.CKG | Santos et al., 2020 | 7617419 | 11 | 26691525 |
| CN3l Family | pykeen.datasets.CN3l | Chen et al., 2017 | 3206 | 42 | 21777 |
| CoDEx (large) | pykeen.datasets.CoDExLarge | Safavi et al., 2020 | 77951 | 69 | 612437 |
| CoDEx (medium) | pykeen.datasets.CoDExMedium | Safavi et al., 2020 | 17050 | 51 | 206205 |
| CoDEx (small) | pykeen.datasets.CoDExSmall | Safavi et al., 2020 | 2034 | 42 | 36543 |
| ConceptNet | pykeen.datasets.ConceptNet | Speer et al., 2017 | 28370083 | 50 | 34074917 |
| Countries | pykeen.datasets.Countries | Bouchard et al., 2015 | 271 | 2 | 1158 |
| Commonsense Knowledge Graph | pykeen.datasets.CSKG | Ilievski et al., 2020 | 2087833 | 58 | 4598728 |
| DB100K | pykeen.datasets.DB100K | Ding et al., 2018 | 99604 | 470 | 697479 |
| DBpedia50 | pykeen.datasets.DBpedia50 | Shi et al., 2017 | 24624 | 351 | 34421 |
| Drug Repositioning Knowledge Graph | [`py
