KROWN 👑: A Benchmark for RDF Graph Materialization

KROWN 👑 is a benchmark for materialization systems to construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML.

Many benchmarks already exist for virtualization systems, e.g. GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries with a single declarative mapping. However, materialization systems are unaffected by complex queries since their input is the dataset and the mappings to generate a Knowledge Graph. Some specialized datasets exist to benchmark specific limitations of materialization systems, such as duplicated or empty values in datasets, e.g. GENOMICS, but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!

Benchmark pipeline

Data generator

KROWN 👑 provides a data generator to scale the different benchmark scenarios with multiple scaling parameters, configurable through a set of JSON configuration files. This way, any combination of scaling parameters can be used, and their values are stored so that the generation can easily be reproduced in the future.

KROWN 👑's data generator is available inside the data-generator folder, which consists of scenarios under data-generator/config and unit tests to verify the output of the generator (data-generator/tests). More information can be found in the README.

Installation

KROWN 👑's data generator requires NumPy and pandas, which are listed in the requirements.txt file of the data-generator directory:

cd data-generator
pip3 install --user -r requirements.txt

pandas: data manipulation functions for generating synthetic data
numpy: needed by pandas

Example usage

cd data-generator
./exgentool generate --scenario=/path/to/config.json

Samples

We provide samples of the scenarios generated by KROWN's data generator. They use a small data size to visualise the impact of changing the parameter in each scenario. You can find them in the samples folder.

Execution framework

KROWN 👑 also provides an execution framework to reproducibly execute benchmark scenarios as a pipeline of Docker containers while measuring metrics such as execution time, CPU time, memory consumption, and storage usage.
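To illustrate the kind of metrics mentioned above, the following is a minimal sketch that times a local child process using only Python's standard library. It is not KROWN's implementation: the function name `measure` and the placeholder workload are assumptions made for illustration, the `resource` module is only available on Unix-like systems, and measuring a Docker container (as the execution framework does) requires a different approach than measuring a direct child process.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run a command and return its exit code, wall time, CPU time, and peak memory."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    wall_time = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time (user + system) consumed by child processes during this call.
    cpu_time = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "exit_code": completed.returncode,
        "wall_time_s": wall_time,
        "cpu_time_s": cpu_time,
        # Peak resident set size over all waited-for children so far
        # (kilobytes on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Hypothetical placeholder workload standing in for one pipeline step.
    print(measure([sys.executable, "-c", "sum(range(10**6))"]))
```

Wall time and CPU time are reported separately because a step that mostly waits on I/O can have a long wall time but a short CPU time.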

KROWN 👑's execution framework is available inside the execution-framework folder, together with unit tests to verify the execution of Docker-based pipelines and the collection of metrics (execution-framework/tests). More information can be found in the README.

Installation

KROWN 👑's execution framework requires several dependencies which are listed in the requirements.txt file of the execution-framework directory:

cd execution-framework
sudo apt install zlib1g zlib1g-dev libpq-dev libjpeg-dev python3-pip docker.io
pip3 install --user -r requirements.txt

psycopg2-binary: PostgreSQL database access
pymysql: MySQL database access
jsonschema: validation of pipeline description in metadata.json
psutil: measurement of CPU and RAM usage
requests: checking if a resource is online or posting a SPARQL query to a triplestore
rdflib: performing operations on RDF files
timeout-decorator: enforcing the timeout on each resource

Example usage

cd execution-framework
./exectool --runs=5 --root=/path/to/scenarios run

The execution framework of KROWN 👑 has been used in the Knowledge Graph Construction Workshop Challenges at ESWC 2023 and 2024. It was also used in benchmarking incremental mappings with IncRML (under review).

Sustainability plan and limitations

A full list of Issues can be found here. Their status is followed up in a GitHub Project by the Knowledge Graph Construction community.

KROWN's data generator and execution framework were created to support newer editions of the Knowledge Graph Construction Workshop Challenge, because the community has to add more generators for each edition to expand it with new challenges. Currently, the community is adding the test cases of the new RML modules to the data generator of the Challenge. After this edition (2024) ends, the generator for the test cases will be added to KROWN's data generator. The community will keep developing KROWN's data generator and execution framework to ease the introduction of new scenarios for newer editions of the Knowledge Graph Construction Workshop Challenge.

See the READMEs of data-generator and execution-framework for more details.

Results

The results of our experiments are available on Zenodo. Below, we show the figures we created from these results and explain in detail how these experiments can be reproduced.

Figures

In our paper, we used 2 figures, which we include here as well with their corresponding descriptions. The first figure shows the results of scaling the number of named graphs via Graph Maps. The second figure shows how the systems are affected by the different join scenarios.

Figure 1: named graphs

Results for the Graph Maps subscenarios: scaling the number of POMs with 1 Named Graph (NG) (top) and scaling the number of NGs from 5 to 15 Statically (S) and Dynamically (D) in a Subject Map (bottom). RMLMapper always times out, and RMLStreamer does not support multiple Graph Maps (GMs). SDM-RDFizer fails on multiple GMs with an error. All systems fail or time out on the 15NG dynamic case.

Figure 2: joins

Results for join scenarios: number of join duplicates (left), number of join conditions (middle), and join relations N-M (right). RMLStreamer-CSV is excluded from the number of join conditions because it does not support multiple join conditions. RMLMapper times out (TO) for 5, 10, and 15 join conditions.

Reproducing results of ISWC 2024 Resource Track

In this section we discuss our evaluation setup, the materialization systems we evaluated, and the list of scenarios we generated and used to analyze the materialization systems using KROWN. In the Instructions subsection, we explain each step needed to reproduce the experiments.

Evaluation setup

We generated several scenarios using KROWN's data generator and executed them 5 times with KROWN's execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output of each materialization system was set to N-Triples.

Materialization systems

We selected the most popular maintained materialization systems for constructing RDF graphs to perform our experiments with KROWN:

RMLMapper
RMLStreamer
Morph-KGC
SDM-RDFizer
OntopM (Ontop in materialization mode)

Note: KROWN is flexible and allows adding any other materialization system, see KROWN’s execution framework documentation for more information.

Scenarios

We consider the following scenarios:

Raw data: number of rows, columns and cell size
Duplicates & empty values: percentage of the data containing duplicates or empty values
Mappings: Triples Maps (TM), Predicate Object Maps (POM), and Named Graph Maps (NG)
Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins

Note: KROWN is flexible and allows adding any other scenario, see KROWN's data generator documentation for more information.
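To make the "Duplicates & empty values" scenario more concrete, here is a minimal sketch of how a table with a given percentage of duplicate and empty rows could be produced. It is not KROWN's generator (which relies on pandas): the function `generate_rows`, its parameters, and the cell naming scheme are hypothetical, and it assumes that a "duplicate" is a row identical to a shared template row and that an "empty" row has only empty cells.

```python
import random


def generate_rows(n_rows, n_columns, duplicate_pct=0, empty_pct=0, seed=42):
    """Return n_rows rows in which the given shares are duplicate or empty rows."""
    if duplicate_pct + empty_pct > 100:
        raise ValueError("duplicate_pct + empty_pct must not exceed 100")

    rng = random.Random(seed)  # fixed seed so the same data can be regenerated

    # Start from rows whose cells are all unique.
    rows = [[f"r{r}c{c}" for c in range(n_columns)] for r in range(n_rows)]

    n_duplicates = n_rows * duplicate_pct // 100
    n_empty = n_rows * empty_pct // 100

    # Pick disjoint row positions for the duplicate and the empty rows.
    picked = rng.sample(range(n_rows), n_duplicates + n_empty)
    for index in picked[:n_duplicates]:
        rows[index] = [f"dup_c{c}" for c in range(n_columns)]
    for index in picked[n_duplicates:]:
        rows[index] = [""] * n_columns
    return rows


if __name__ == "__main__":
    for row in generate_rows(n_rows=8, n_columns=2, duplicate_pct=50, empty_pct=25):
        print(",".join(row))
```

Fixing the seed mirrors the idea of storing parameter values so that a generated dataset can be reproduced later.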

In the table below we list all parameter values we used to configure our scenarios:

| Scenario | Parameter values |
| ------------------------ | -------------------------- |
| Raw data: rows | 10K, 100K, 1M, 10M |
| Raw data: columns | 1, 10, 20, 30 |
| Raw data: cell size | 500, 1K, 5K, 10K |
| Duplicates: percentage | 0%, 25%, 50%, 75%, 100% |
| Empty values: percentage | 0%, 25%, 50%, 75%, 100% |
| Mappings: TMs + 5POMs | 1, 10, 20, 30 TMs |
| Mappings: 20TMs + POMs | 1, 3, 5, 10 POMs |
| Mappings: NG in SM | 1, 5, 10, 15 NGs |
| Mappings: NG in POM | 1, 5, 10, 15 NGs |
| Mappings: NG in SM/POM | 1/1, 5/5, 10/10, 15/15 NGs |
| Joins: 1-N relations | 1-1
