OpenEA
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs, VLDB 2020
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets, to understand their strengths and limitations. Additionally, for several directions that have not been explored in current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.
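As a rough illustration of the idea described above (entities encoded as vectors, with alignment decided by embedding similarity), here is a minimal sketch in plain Python. It is not OpenEA's API: the function names `cosine`, `align_nearest`, and `hits_at_1`, the entity identifiers, and the hand-written 2-d vectors are all assumptions made for this example. Real approaches learn the embeddings and typically use more elaborate inference strategies than greedy nearest-neighbor search.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_nearest(source_emb, target_emb):
    """Map each source entity to its most similar target entity (greedy)."""
    alignment = {}
    for s_name, s_vec in source_emb.items():
        alignment[s_name] = max(
            target_emb, key=lambda t_name: cosine(s_vec, target_emb[t_name])
        )
    return alignment


def hits_at_1(predicted, gold):
    """Fraction of gold pairs whose top-ranked candidate is the correct one."""
    correct = sum(1 for s, t in gold.items() if predicted.get(s) == t)
    return correct / len(gold)


# Toy, hand-made embeddings (not learned); all identifiers are illustrative.
kg1 = {"dbp:Paris": [0.9, 0.1], "dbp:Berlin": [0.1, 0.9]}
kg2 = {"wd:Q90": [0.8, 0.2], "wd:Q64": [0.2, 0.8]}
gold = {"dbp:Paris": "wd:Q90", "dbp:Berlin": "wd:Q64"}

predicted = align_nearest(kg1, kg2)
print(predicted)
print(hits_at_1(predicted, gold))
```

Greedy nearest-neighbor search is used here only because it is the simplest inference strategy to read; it allows several source entities to pick the same target, which stricter matching strategies avoid.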
Key contributors ✨
<table> <tbody> <tr> <td align="center" valign="top" width="14.28%"><a href="https://sunzequn.github.io/"><img src="https://sunzequn.github.io/homepage/profile.jpg" width="100px;" alt="Zequn Sun"/><br /><b>Zequn Sun (NJU)</b></a><br /></td> <td align="center" valign="top" width="14.28%"><a href="http://ws.nju.edu.cn/~whu"><img src="http://ws.nju.edu.cn/wiki/attach/Wei%20Hu/me5.jpeg" width="100px;" alt="Wei Hu (NJU)"/><br /><b>Wei Hu (NJU)</b></a><br /></td> <td align="center" valign="top" width="14.28%"><a href="https://muhaochen.github.io/"><img src="https://muhaochen.github.io/index_files/kemper_courtyard.png" width="100px;" alt="Muhao Chen (UC Davis)"/><br /><b>Muhao Chen (UC Davis)</b></a><br /></td> <td align="center" valign="top" width="14.28%"><a href="https://tjdi.tongji.edu.cn/TeacherDetail.do?id=4991&lang=_en"><img src="https://tjdi.tongji.edu.cn/uploadfile/201909/16/1222111845.png" width="90px;" alt="Haofen Wang (TONGJI)"/><br /><b>Haofen Wang (TONGJI)</b></a><br /></td> </tr> </tbody> </table>

*** UPDATE ***
- Aug. 1, 2021: We release the source code for entity alignment with dangling cases.
- June 29, 2021: We release the DBP2.0 dataset for entity alignment with dangling cases.
- Jan. 8, 2021: The results of AliNet on OpenEA datasets are available at Google Docs.
- Nov. 30, 2020: We release a new version (v2.0) of the OpenEA dataset, where the URIs of DBpedia and YAGO entities are encoded to resolve the name bias issue. It is strongly recommended to use the v2.0 dataset for evaluating attribute-based entity alignment methods, so that the results better reflect the robustness of these methods in real-world situations.
- Sep. 24, 2020: We add AliNet.
Table of contents
- Library for Embedding-based Entity Alignment
- KG Sampling Method and Datasets
- Experiment and Results
- License
- Citation
Library for Embedding-based Entity Alignment
Overview
We use Python and TensorFlow to develop an open-source library, namely OpenEA, for embedding-based entity alignment. The software architecture is illustrated in the following figure.
<p> <img width="70%" src="https://github.com/nju-websoft/OpenEA/blob/master/docs/stack.png" /> </p>

The design goals and features of OpenEA include three aspects, i.e., loose coupling, functionality and extensibility, and off-the-shelf solutions.
- Loose coupling. The implementations of the embedding and alignment modules are independent of each other. OpenEA provides a framework template with pre-defined input and output data structures to make the three modules work as an integral pipeline. Users can freely call and combine different techniques in these modules.
- Functionality and extensibility. OpenEA implements a set of necessary functions as its underlying components, including initialization functions, loss functions and negative sampling methods in the embedding module; combination and learning strategies in the interaction mode; as well as distance metrics and alignment inference strategies in the alignment module. On top of these, OpenEA provides a set of flexible, high-level functions with configuration options to call the underlying components. In this way, new functions can be easily integrated by adding new configuration options.
- Off-the-shelf solutions. To facilitate the use of OpenEA in diverse scenarios, we try our best to integrate or rebuild a majority of existing embedding-based entity alignment approaches. Currently, OpenEA has integrated the following embedding-based entity alignment approaches:
    - MTransE: Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment. IJCAI 2017.
    - IPTransE: Iterative Entity Alignment via Joint Knowledge Embeddings. IJCAI 2017.
    - JAPE: Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding. ISWC 2017.
    - KDCoE: Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment. IJCAI 2018.
    - BootEA: Bootstrapping Entity Alignment with Knowledge Graph Embedding. IJCAI 2018.
    - GCN-Align: Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks. EMNLP 2018.
    - AttrE: Entity Alignment between Knowledge Graphs Using Attribute Embeddings. AAAI 2019.
    - IMUSE: Unsupervised Entity Alignment Using Attribute Triples and Relation Triples. DASFAA 2019.
    - SEA: Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference. WWW 2019.
    - RSN4EA: Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs. ICML 2019.
    - MultiKE: Multi-view Knowledge Graph Embedding for Entity Alignment. IJCAI 2019.
    - RDGCN: Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs. IJCAI 2019.
    - AliNet: Knowledge Graph Alignment Network with Gated Multi-hop Neighborhood Aggregation. AAAI 2020.
- OpenEA has also integrated the following relation embedding models and two attribute embedding models (AC2Vec and Label2vec) in the embedding module:
- TransH: Knowledge Graph Embedding by Translating on Hyperplanes. AAAI 2014.
- TransR: Learning Entity and Relation Embeddings for Knowledge Graph Completion. AAAI 2015.
- TransD: Knowledge Graph Embedding via Dynamic Mapping Matrix. ACL 2015.
- HolE: Holographic Embeddings of Knowledge Graphs. AAAI 2016.
- ProjE: [ProjE: Embedding Projection for Knowledge Graph Completion](https://www.aaai.org/ocs/ind
