KROWN 👑: A Benchmark for RDF Graph Materialization



KROWN 👑 is a benchmark for materialization systems that construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML.
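To make "materialization" concrete, here is a minimal sketch in plain Python of what such a system does conceptually: it applies a mapping to every row of a data source and writes the resulting triples as N-Triples. The `MAPPING` dictionary, the `http://example.org/` IRIs, and the `materialize` function are illustrative assumptions. This is not RML syntax, and it is not how any of the benchmarked systems is implemented.

```python
import csv
import io

# Hypothetical mapping: a subject IRI template plus one predicate per column.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "predicates": {
        "name": "http://example.org/name",
        "age": "http://example.org/age",
    },
}


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw in a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def materialize(source: str, mapping: dict) -> list[str]:
    """Apply the mapping to each CSV row and return N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(source)):
        subject = mapping["subject"].format(**row)
        for column, predicate in mapping["predicates"].items():
            value = row.get(column, "")
            if value == "":
                continue  # empty cells produce no triple
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return triples


CSV_DATA = "id,name,age\n1,Alice,30\n2,Bob,\n"
for line in materialize(CSV_DATA, MAPPING):
    print(line)
```

The second row has an empty `age` cell, so only three triples are produced. Empty and duplicated values are among the input properties that the scenarios vary.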

Many benchmarks already exist for virtualization systems, e.g. GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries with a single declarative mapping. However, materialization systems are unaffected by complex queries, since their input is the dataset and the mappings used to generate a Knowledge Graph. Some specialized datasets exist to benchmark specific limitations of materialization systems, such as duplicated or empty values in datasets (e.g. GENOMICS), but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!

Benchmark pipeline

Data generator

KROWN 👑 provides a data generator to scale the different benchmark scenarios with multiple scaling parameters, configurable through a set of JSON configuration files. This way, any combination of scaling parameters can be used, and their values are stored so that the generation can easily be reproduced in the future.
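The idea of storing scaling parameters in a JSON file so that a dataset can be regenerated later can be sketched as follows. The key names (`seed`, `rows`, `columns`, `cell_size`) and the `generate_table` function are hypothetical and only illustrate the principle; the actual configuration schema is defined by the files under data-generator/config.

```python
import json
import random
import string

# Hypothetical scenario configuration; the real key names used by
# KROWN's data generator may differ.
CONFIG_JSON = """
{
  "name": "raw-data-small",
  "seed": 42,
  "rows": 5,
  "columns": 3,
  "cell_size": 8
}
"""


def generate_table(config: dict) -> list[list[str]]:
    """Build a header row plus `rows` data rows of random fixed-size cells."""
    rng = random.Random(config["seed"])  # seeded, so the output is repeatable
    header = [f"column_{i}" for i in range(config["columns"])]
    table = [header]
    for _ in range(config["rows"]):
        row = [
            "".join(rng.choices(string.ascii_lowercase, k=config["cell_size"]))
            for _ in range(config["columns"])
        ]
        table.append(row)
    return table


config = json.loads(CONFIG_JSON)
first = generate_table(config)
second = generate_table(config)
print(len(first) - 1, "rows,", len(first[0]), "columns")
print("reproducible:", first == second)
```

Because every parameter, including the random seed, lives in the configuration, generating twice from the same file yields identical tables.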

KROWN 👑's data generator is available in the data-generator folder, which contains the scenarios under data-generator/config and unit tests to verify the output of the generator (data-generator/tests). More information can be found in the README.

Installation

KROWN 👑's data generator requires NumPy and Pandas, which are listed in the requirements.txt file of the data-generator directory:

cd data-generator
pip3 install --user -r requirements.txt
  • pandas: data manipulation functions for generating synthetic data
  • numpy: needed by Pandas

Example usage

cd data-generator
./exgentool generate --scenario=/path/to/config.json

Samples

We provide samples of the scenarios generated by KROWN's data generator. They use a small data size to visualise the impact of changing the parameter in each scenario. You can find them in the samples folder.

Execution framework

KROWN 👑 also provides an execution framework to reproducibly execute benchmark scenarios as a pipeline of Docker containers while measuring metrics such as execution time, CPU time, memory consumption, and storage usage.
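As a rough illustration of this kind of measurement, the sketch below runs a single command and records its wall-clock time, CPU time, and peak memory using only the Python standard library. It is an assumption-laden stand-in: the actual framework runs Docker containers and lists psutil among its dependencies, whereas here a plain subprocess and the Unix-only `resource` module are used, and the `measure` function is hypothetical.

```python
import resource
import subprocess
import sys
import time


def measure(command: list[str]) -> dict:
    """Run a command and return wall-clock time, CPU time, and peak memory."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(command, check=False, capture_output=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": completed.returncode,
        "wall_time_s": wall,
        # CPU time spent by child processes during this call (user + system).
        "cpu_time_s": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        # Largest peak resident set size among all children waited for so far
        # (kilobytes on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


# Stand-in workload: a short Python computation instead of a real pipeline step.
metrics = measure([sys.executable, "-c", "sum(i * i for i in range(10**6))"])
print(metrics)
```

Wall-clock time and CPU time are reported separately because they diverge when a step waits on I/O or uses several cores.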

KROWN 👑's execution framework is available in the execution-framework folder, together with unit tests that verify the execution of Docker-based pipelines and the collection of metrics (execution-framework/tests). More information can be found in the README.

Installation

KROWN 👑's execution framework requires several dependencies, which are listed in the requirements.txt file of the execution-framework directory:

cd execution-framework
sudo apt install zlib1g zlib1g-dev libpq-dev libjpeg-dev python3-pip docker.io
pip3 install --user -r requirements.txt
  • psycopg2-binary: PostgreSQL database access
  • pymysql: MySQL database access
  • jsonschema: validation of pipeline description in metadata.json
  • psutil: measurement of CPU and RAM usage
  • requests: checking if a resource is online or posting a SPARQL query to a triplestore
  • rdflib: performing operations on RDF files
  • timeout-decorator: enforcing the timeout on each resource

Example usage

cd execution-framework
./exectool --runs=5 --root=/path/to/scenarios run

The execution framework of KROWN 👑 has been used in the Knowledge Graph Construction Workshop Challenges at ESWC 2023 and 2024. It was also used to benchmark incremental mappings with IncRML (under review).

Sustainability plan and limitations

A full list of Issues can be found here. Their status is followed up in a GitHub Project by the Knowledge Graph Construction community.

KROWN's data generator and execution framework were created to support newer editions of the Knowledge Graph Construction Workshop Challenge, because the community has to add more generators for each edition to cover its new challenges. Currently, the community is adding the test cases of the new RML modules to the data generator of the Challenge. After this edition (2024) ends, the generator for the test cases will be added to KROWN's data generator. The community will keep developing KROWN's data generator and execution framework to ease the introduction of new scenarios for newer editions of the Knowledge Graph Construction Workshop Challenge.

See the READMEs of data-generator and execution-framework for more details.

Results

The results of our experiments are available on Zenodo. Below, the figures we created from these results are shown and we explain in detail how these experiments can be reproduced.

Figures

In our paper, we used two figures, which we include here with their corresponding descriptions. The first figure shows the results of scaling the number of named graphs via Graph Maps. The second figure shows how the systems are affected by the different join scenarios.

Figure 1: named graphs

Results for the Graph Maps subscenarios: scaling the number of POMs with 1 Named Graph (NG) (top) and scaling the number of NGs from 5 to 15 Statically (S) and Dynamically (D) in a Subject Map (bottom). RMLMapper always times out, and RMLStreamer does not support multiple Graph Maps (GMs). SDM-RDFizer fails with an error on multiple GMs. All systems fail or time out on the 15NG dynamic case.

Figure 2: joins

Results for join scenarios: number of join duplicates (left), number of join conditions (middle), and join relations N-M (right). RMLStreamer-CSV is excluded from the number of join conditions because it does not support multiple join conditions. RMLMapper times out (TO) for 5, 10, and 15 join conditions.

Reproducing results of ISWC 2024 Resource Track

In this section we discuss our evaluation setup, the materialization systems we evaluated, and the list of scenarios we generated and used to analyze the materialization systems using KROWN. In the Instructions subsection, we explain each step needed to reproduce the experiments.

Evaluation setup

We generated several scenarios using KROWN's data generator and executed each of them 5 times with KROWN's execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output of each materialization system was set to N-Triples.
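When a scenario is executed several times, the per-run measurements are usually condensed into summary statistics. The sketch below shows one way to do this with Python's `statistics` module. How the runs are aggregated is not specified in this section, so the choice of statistics is an assumption, and the `run_times` values are invented placeholders rather than benchmark results.

```python
import statistics

# Placeholder wall-clock times in seconds for five runs of one scenario.
# These values are invented for illustration and are not benchmark results.
run_times = [10.0, 12.0, 11.0, 30.0, 11.0]

summary = {
    "runs": len(run_times),
    "median_s": statistics.median(run_times),
    "mean_s": statistics.mean(run_times),
    "stdev_s": statistics.stdev(run_times),
    "min_s": min(run_times),
    "max_s": max(run_times),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

With one slow outlier among the placeholder values, the median (11.0) stays close to the typical run while the mean (14.8) is pulled upward, which is why the median is often preferred for reporting repeated runs.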

Materialization systems

For our experiments with KROWN, we selected the most popular maintained materialization systems for constructing RDF graphs:

  • RMLMapper
  • RMLStreamer
  • Morph-KGC
  • SDM-RDFizer
  • OntopM (Ontop in materialization mode)

Note: KROWN is flexible and allows adding any other materialization system; see KROWN's execution framework documentation for more information.

Scenarios

We consider the following scenarios:

  • Raw data: number of rows, columns and cell size
  • Duplicates & empty values: percentage of the data containing duplicates or empty values
  • Mappings: Triples Maps (TM), Predicate Object Maps (POM), and Named Graph Maps (NG)
  • Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins

Note: KROWN is flexible and allows adding any other scenario; see KROWN's data generator documentation for more information.
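As an illustration of the "Duplicates & empty values" scenario, the following sketch builds one column in which a given percentage of cells is empty and a given percentage repeats the same value. The `generate_column` function, its parameters, and the way duplicates are defined here are assumptions made for this example. KROWN's own generator is built on Pandas and may define and combine these percentages differently.

```python
import random


def generate_column(rows: int, duplicate_pct: int, empty_pct: int, seed: int = 0) -> list[str]:
    """Return `rows` cell values with the requested share of duplicates and empties."""
    if duplicate_pct + empty_pct > 100:
        raise ValueError("duplicate_pct + empty_pct must not exceed 100")
    rng = random.Random(seed)
    n_empty = rows * empty_pct // 100
    n_duplicate = rows * duplicate_pct // 100
    n_unique = rows - n_empty - n_duplicate
    values = (
        [""] * n_empty                             # empty cells
        + ["duplicate"] * n_duplicate              # the same value repeated
        + [f"value_{i}" for i in range(n_unique)]  # distinct values
    )
    rng.shuffle(values)  # spread the three kinds of cells over the column
    return values


column = generate_column(rows=20, duplicate_pct=25, empty_pct=50)
print(column.count(""), "empty,", column.count("duplicate"), "duplicate,",
      len(set(column)) - 2, "unique")
```

Varying only one percentage at a time, with everything else fixed, isolates the effect of that parameter on a materialization system.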

In the table below we list all parameter values we used to configure our scenarios:

| Scenario | Parameter values |
| ------------------------ | -------------------------- |
| Raw data: rows | 10K, 100K, 1M, 10M |
| Raw data: columns | 1, 10, 20, 30 |
| Raw data: cell size | 500, 1K, 5K, 10K |
| Duplicates: percentage | 0%, 25%, 50%, 75%, 100% |
| Empty values: percentage | 0%, 25%, 50%, 75%, 100% |
| Mappings: TMs + 5POMs | 1, 10, 20, 30 TMs |
| Mappings: 20TMs + POMs | 1, 3, 5, 10 POMs |
| Mappings: NG in SM | 1, 5, 10, 15 NGs |
| Mappings: NG in POM | 1, 5, 10, 15 NGs |
| Mappings: NG in SM/POM | 1/1, 5/5, 10/10, 15/15 NGs |
| Joins: 1-N relations | 1-1
