KGist
Knowledge Graph summarization for anomaly/error detection & completion (WebConf '20)
KGist: Knowledge Graph Summarization for Anomaly Detection & Completion
Caleb Belth, Xinyi Zheng, Jilles Vreeken, and Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. ACM The Web Conference (WWW), April 2020. [Link to the paper]
If used, please cite:
@inproceedings{belth2020normal,
title={What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization},
author={Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai},
booktitle={Proceedings of The Web Conference 2020},
pages={1115--1126},
year={2020}
}
Presentation: https://youtu.be/Ql7VEfliPXo
Setup
git clone git@github.com:GemsLab/KGist.git
cd KGist/data/
unzip nell.zip
unzip dbpedia.zip
cd ../src/
cd test/
python tester.py
Requirements
Python 3
numpy
scipy
networkx
Data
NELL and DBpedia are zipped in the data/ directory. YAGO is too large to distribute via GitHub.
{KG_name}.txt format: space-separated, one (subject, predicate, object) triple per line.
s1 p1 o1
s2 p2 o2
...
{KG_name}_labels.txt format: space-separated, one entity per line, followed by a variable number of labels.
e1 l1 l2 ...
e2 l1 l2 l3 ...
...
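As a minimal sketch (not part of the KGist codebase), both input files can be read with plain Python. The function names `read_triples` and `read_labels` are hypothetical, and the sketch assumes entity, predicate, and label names contain no spaces, as the space-separated format implies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def read_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse {KG_name}.txt lines of the form 's p o', one triple per line."""
    triples: List[Triple] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
        s, p, o = parts
        triples.append((s, p, o))
    return triples


def read_labels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse {KG_name}_labels.txt lines of the form 'e l1 l2 ...'."""
    labels: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        entity, *entity_labels = parts
        labels[entity].update(entity_labels)
    return dict(labels)
```

Since open file objects iterate line by line, `with open('data/nell.txt') as f: triples = read_triples(f)` works directly.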
Example usage (run from the src/ directory)
Command Line
python main.py --graph nell
Interface
from graph import Graph
from searcher import Searcher
from model import Model
# load graph
graph = Graph('nell', idify=True)
# create a Searcher object to search for a model (set of rules)
searcher = Searcher(graph)
# build initial model
model = searcher.build_model()
model.print_stats()
# perform rule merging refinement
model = model.merge_rules()
model.print_stats()
# perform rule nesting refinement
model = model.nest_rules()
model.print_stats()
To compute anomaly scores for triples as in Section 4.3:
from anomaly_detector import AnomalyDetector
# construct an anomaly detector with the KGist model
anomaly_detector = AnomalyDetector(model)
# an edge/triple to score
edge = ('concept:company:limited_brands', 'concept:companyceo', 'concept:ceo:leslie_wexner')
anomaly_detector.score_edge(edge)  # returns 26.5164
Larger scores indicate more anomalous triples. Note that in our experiments in Section 5.2 we used KGist+m, i.e., the model obtained without running model.nest_rules().
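Because larger scores mean more anomalous, one way to use the detector is to score a batch of triples and sort them in descending order. The helper below is a hypothetical sketch, not part of KGist: it accepts any scoring callable, such as `anomaly_detector.score_edge`, so it does not depend on the repository's classes.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def rank_by_anomaly(
    edges: Iterable[Triple],
    score: Callable[[Triple], float],
    top_k: int = 10,
) -> List[Tuple[Triple, float]]:
    """Return the top_k (edge, score) pairs, most anomalous (largest score) first."""
    scored = [(edge, score(edge)) for edge in edges]
    # Sort descending by score so the most anomalous triples come first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With a trained model, this would be called as `rank_by_anomaly(edges, anomaly_detector.score_edge)`, where `edges` is an iterable of (subject, predicate, object) tuples like the `edge` above.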
Arguments
--graph {KG_name} Expects {KG_name}.txt and {KG_name}_labels.txt to be in the data/ directory, in the format described above for NELL and DBpedia.
--rule_merging / -Rm True/False (Optional; Default = False) Use rule merging refinement (Section 4.2.2)
--rule_nesting / -Rn True/False (Optional; Default = False) Use rule nesting refinement (Section 4.2.2)
--idify / -i True/False (Optional; Default = True) Convert entities and predicates to integer ids internally for faster processing
--verbosity / -v [0, infinity) (Optional; Default = 1,000,000) How frequently to log progress (use integers)
--output_path / -o (Optional; Default = 'output/') What directory to write the output to (log will still be printed to stdout)
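To run KGist+m (rule merging without rule nesting) from a script, the flags listed above can be assembled into an argument list. This is a hypothetical convenience wrapper, not part of the repository; it assumes `main.py` accepts the literal strings `True`/`False` for the boolean flags, as the argument list suggests.

```python
import subprocess
import sys
from typing import List


def kgist_command(graph: str, rule_merging: bool = False, rule_nesting: bool = False,
                  idify: bool = True, output_path: str = "output/") -> List[str]:
    """Build the argv for `python main.py` from the documented flags."""
    return [
        sys.executable, "main.py",
        "--graph", graph,
        "--rule_merging", str(rule_merging),  # str(True) == "True"
        "--rule_nesting", str(rule_nesting),
        "--idify", str(idify),
        "--output_path", output_path,
    ]


def run_kgist(**kwargs) -> None:
    """Run KGist; assumes the current working directory is src/."""
    subprocess.run(kgist_command(**kwargs), check=True)
```

For example, `run_kgist(graph="nell", rule_merging=True)` corresponds to the KGist+m setting.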
Output
output/{KG_name}_model.pickle saves a Model object.
output/{KG_name}_model.rules saves the rules, which are recursively defined, in parenthetical form.
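The pickled Model can be reloaded with Python's standard `pickle` module; `load_model` below is a hypothetical helper. Because pickle stores classes by module path, the KGist `model` module must be importable when loading (e.g., run from src/), and, as with any pickle, only load files from trusted sources.

```python
import pickle
from pathlib import Path
from typing import Any


def load_model(kg_name: str, output_path: str = "output/") -> Any:
    """Load {output_path}/{kg_name}_model.pickle written by a previous run."""
    path = Path(output_path) / f"{kg_name}_model.pickle"
    with path.open("rb") as f:
        return pickle.load(f)
```

For example, `model = load_model('nell')` followed by `model.print_stats()` would reprint the statistics of a saved NELL model.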
Frequently Asked Questions (FAQ)
I want to run KGist on my own dataset. How did you construct the labels file?
We constructed the labels file by moving the rdf:type triples from the KG into the labels file. For example, if the KG contains the triples (LaRose, rdf:type, book) and (LaRose, rdf:type, novel), then LaRose book novel would be a row in the labels file.
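The conversion described in this answer can be sketched as follows (a hypothetical helper, not the authors' script). It assumes the type predicate appears literally as `rdf:type` in the triples; real dumps may use the full IRI instead, so adjust `type_predicate` accordingly. Entities with no type triple get no row in this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def split_types(triples: Iterable[Triple], type_predicate: str = "rdf:type"
                ) -> Tuple[List[Triple], Dict[str, List[str]]]:
    """Move (entity, type_predicate, label) triples out of the KG into a label map."""
    kept: List[Triple] = []
    labels: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == type_predicate:
            if o not in labels[s]:
                labels[s].append(o)  # keep first-seen order, skip duplicates
        else:
            kept.append((s, p, o))  # non-type triples stay in {KG_name}.txt
    return kept, dict(labels)


def labels_lines(labels: Dict[str, List[str]]) -> List[str]:
    """Format rows for {KG_name}_labels.txt: 'entity label1 label2 ...'."""
    return [" ".join([entity, *entity_labels]) for entity, entity_labels in labels.items()]
```

Applied to the two LaRose triples from the example, `labels_lines(split_types(triples)[1])` yields the single row `LaRose book novel`.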
Comments or Questions
Contact Caleb Belth with comments or questions: cbelth@umich.edu
