PyKEEN
🤖 A Python library for learning and evaluating knowledge graph embeddings
Installation

The latest stable version of PyKEEN requires Python 3.9+. It can be downloaded and installed from PyPI with:
```shell
pip install pykeen
```
The latest development version of PyKEEN can be installed directly from the source code on GitHub with:
```shell
pip install git+https://github.com/pykeen/pykeen.git
```
More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.
Quickstart 
This example shows how to train a model on the training split of a dataset and evaluate it on that dataset's held-out test split.
The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.
```python
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)
```
The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
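To make the quickstart's "rank-based evaluation" more concrete, here is a minimal pure-Python sketch of how a TransE score can be turned into the rank of the true tail entity, and how ranks are then summarized as the mean reciprocal rank (MRR) and Hits@k. It does not use or reproduce PyKEEN's implementation: the toy entity and relation names, the hand-picked 2-dimensional embeddings, and the helpers (`transe_score`, `tail_rank`, `mean_reciprocal_rank`, `hits_at_k`) are all hypothetical, and ties are resolved here by averaging the optimistic and pessimistic rank.

```python
import math

# Hypothetical toy embeddings (2-dimensional, hand-picked); not trained by PyKEEN.
ENTITY_EMB = {
    "brazil": [0.0, 0.0],
    "usa": [1.0, 0.0],
    "china": [0.0, 1.0],
}
RELATION_EMB = {"treaties": [1.0, 0.0]}


def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance ||h + r - t|| (higher is better)."""
    translated = [a + b for a, b in zip(ENTITY_EMB[h], RELATION_EMB[r])]
    return -math.dist(translated, ENTITY_EMB[t])


def tail_rank(h, r, t, known=frozenset()):
    """Rank of the true tail among all entities (1 = best).

    Candidates forming other known true triples are skipped (filtered setting);
    ties are resolved by averaging the optimistic and pessimistic rank.
    """
    true_score = transe_score(h, r, t)
    others = [
        transe_score(h, r, e)
        for e in ENTITY_EMB
        if e != t and (h, r, e) not in known
    ]
    better = sum(s > true_score for s in others)
    ties = sum(s == true_score for s in others)
    optimistic, pessimistic = better + 1, better + ties + 1
    return (optimistic + pessimistic) / 2


def mean_reciprocal_rank(ranks):
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    return sum(rank <= k for rank in ranks) / len(ranks)


# Usage: rank two hypothetical test triples (unfiltered).
test_triples = [("brazil", "treaties", "usa"), ("brazil", "treaties", "china")]
ranks = [tail_rank(h, r, t) for h, r, t in test_triples]
print(ranks)                                     # [1.0, 3.0]
print(mean_reciprocal_rank(ranks))               # about 0.667
print(hits_at_k(ranks, 1), hits_at_k(ranks, 3))  # 0.5 1.0
```

Passing the set of known true triples as `known` gives the filtered setting, in which other correct answers are not counted against the model; with the default empty set the ranking is unfiltered.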
PyKEEN is extensible such that:
- Each model has the same API, so anything from `pykeen.models` can be dropped in
- Each training loop has the same API, so `pykeen.training.LCWATrainingLoop` can be dropped in
- Triples factories can be generated by the user with `pykeen.triples.TriplesFactory`
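The list above mentions triples factories and interchangeable training loops. The sketch below illustrates, in plain Python rather than with PyKEEN's own classes, the two ideas behind them: mapping labeled triples to integer IDs, and how the sLCWA and LCWA approaches derive training examples from the same triples. The function names (`map_triples`, `slcwa_negatives`, `lcwa_targets`) are hypothetical, and uniform head-or-tail replacement is just one simple negative-sampling scheme.

```python
import random
from collections import defaultdict


def map_triples(labeled):
    """Assign integer IDs to entity and relation labels (sorted for determinism)."""
    entities = sorted({h for h, _, _ in labeled} | {t for _, _, t in labeled})
    relations = sorted({r for _, r, _ in labeled})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    mapped = [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in labeled]
    return entity_to_id, relation_to_id, mapped


def slcwa_negatives(mapped, num_entities, num_negatives, rng):
    """sLCWA-style sampling: replace the head or the tail with a uniformly random entity."""
    negatives = []
    for h, r, t in mapped:
        for _ in range(num_negatives):
            replacement = rng.randrange(num_entities)
            if rng.random() < 0.5:
                negatives.append((replacement, r, t))  # replace the head
            else:
                negatives.append((h, r, replacement))  # replace the tail
    return negatives


def lcwa_targets(mapped, num_entities):
    """LCWA-style targets: per (head, relation), 1 for observed tails, 0 otherwise."""
    tails = defaultdict(set)
    for h, r, t in mapped:
        tails[(h, r)].add(t)
    return {
        pair: [1 if e in observed else 0 for e in range(num_entities)]
        for pair, observed in tails.items()
    }


labeled = [("a", "likes", "b"), ("a", "likes", "c"), ("b", "knows", "c")]
entity_to_id, relation_to_id, mapped = map_triples(labeled)
print(mapped)  # [(0, 1, 1), (0, 1, 2), (1, 0, 2)]
print(lcwa_targets(mapped, len(entity_to_id)))  # {(0, 1): [0, 1, 1], (1, 0): [0, 0, 1]}
negatives = slcwa_negatives(mapped, len(entity_to_id), num_negatives=2, rng=random.Random(0))
print(len(negatives))  # 6
```

The sLCWA sketch can occasionally produce a "negative" that is actually a known true triple (or the original triple itself). The LCWA variant instead treats every tail not observed for a given (head, relation) pair as negative, which requires scoring every entity for each pair.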
The full documentation can be found at https://pykeen.readthedocs.io.
Implementation
Below are the models, datasets, training modes, evaluators, and metrics implemented in PyKEEN.
Datasets
The following 37 datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know.
| Name | Documentation | Citation | Entities | Relations | Triples |
|------------------------------------|---------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|------------|-------------|-----------|
| Aristo-v4 | pykeen.datasets.AristoV4 | Chen et al., 2021 | 42016 | 1593 | 279425 |
| BioKG | pykeen.datasets.BioKG | Walsh et al., 2019 | 105524 | 17 | 2067997 |
| Clinical Knowledge Graph | pykeen.datasets.CKG | Santos et al., 2020 | 7617419 | 11 | 26691525 |
| CN3l Family | pykeen.datasets.CN3l | Chen et al., 2017 | 3206 | 42 | 21777 |
| CoDEx (large) | pykeen.datasets.CoDExLarge | Safavi et al., 2020 | 77951 | 69 | 612437 |
| CoDEx (medium) | pykeen.datasets.CoDExMedium | Safavi et al., 2020 | 17050 | 51 | 206205 |
| CoDEx (small) | pykeen.datasets.CoDExSmall | Safavi et al., 2020 | 2034 | 42 | 36543 |
| ConceptNet | pykeen.datasets.ConceptNet | Speer et al., 2017 | 28370083 | 50 | 34074917 |
| Countries | pykeen.datasets.Countries | Bouchard et al., 2015 | 271 | 2 | 1158 |
| Commonsense Knowledge Graph | pykeen.datasets.CSKG | Ilievski et al., 2020 | 2087833 | 58 | 4598728 |
| DB100K | pykeen.datasets.DB100K | Ding et al., 2018 | 99604 | 470 | 697479 |
| DBpedia50 | pykeen.datasets.DBpedia50 | Shi et al., 2017 | 24624 | 351 | 34421 |
| Drug Repositioning Knowledge Graph | [`py
