GraphZip
Mining graph streams using dictionary-based compression
Install / Use
/learn @cpacker/GraphZipREADME
GraphZip: Mining Graph Streams using Dictionary-based Compression
GraphZip is a scalable method for mining interesting patterns in graph streams, based on the Lempel-Ziv class of compression algorithms.
This repository contains a reference implementation of the GraphZip algorithm, written in Python.
For more detailed information regarding the algorithm, see our paper.
Install
Python 3 and python-igraph are required.
Once you have a working version of Python 3, install python-igraph with:
$ pip install python-igraph
Note: python-igraph is not to be confused with the igraph package (renamed to jgraph).
Usage
Run GraphZip directly from the command line with:
$ python graphzip.py graph_file [-n NUM_FILES] [-d] [-a ALPHA] [-t THETA] [-o OUTFILE]
Examples
# Run GraphZip on `test.graph` with a batch size of `5` and a dictionary size of `10`:
python graphzip.py test.graph -a 5 -t 10
# Run GraphZip on files `1.graph` through `100.graph` located in directory `test_graphs/`, using a batch size of 5 and the default dictionary size:
python graphzip.py test_graphs -n 100 -a 5
Options
-n, --num_files
Using -n turns on multi-file mode, and GraphZip will treat graph_file as a directory holding NUM_FILES sequential graph stream files, labelled 1.graph to [NUM_FILES].graph.
-d, --directed
Using -d will make GraphZip read the graph file(s) as directed graphs.
-a, --alpha
-t, --theta
Batch size and dictionary size, respectively (hyperparameters of the GraphZip model).
-o, --outfile
By default, the pattern dictionary is dumped to stdout; use -o to save the dictionary to a specified file.
File format
The correct format for .graph files is:
% '%' to start comments
v [vertex id (int)] [vertex label (int)]
e [source vertex id (int)] [target vertex id (int)] [edge label (int)]
For example:
% example 3-clique with vertex labels "100", "999" and edge labels "1", "2", "3"
v 1 100
v 2 999
v 3 100
e 1 2 1
e 1 3 2
e 2 3 3
Vertices must be defined prior to being referenced by an edge (or the vertex label would be unknown).
In the case of processing a graph stream over sequential .graph files, having Compressor's label_history_per_file property set to False and add_implicit_vertices set to True (the default) allows edges to reference vertices declared in previous files. For further details see the inline comments.
Experiments
Several example experiments are located in the unit tests directory under tests/test_expr.py.
Run the algorithm on several example graphs (located in data/) by navigating to the root project directory then running:
$ python -m unittest
To run a specific test use the format:
$ python -m unittest test.[test_file].[TestSuite].[test_function]
For example, to run the 4-clique test with 20% coverage located in the TestGraphZipSubgen suite:
$ python -m unittest test.test_expr.TestGraphZipSubgen.test_4PATH_20
For details on specific examples, see the files under the test/ directory.
Citations
If you find GraphZip useful in your research, please consider citing:
C. Packer and L. Holder, GraphZip: Mining Graph Streams using Dictionary-based Compression. SIGKDD Workshop on Mining and Learning in Graphs (MLG), August 2017.
@inproceedings{PacHol:mlg17,
title = {{GraphZip}: Mining Graph Streams using Dictionary-based Compression},
author = {Charles Packer and Larry Holder},
booktitle = {Proceedings of the 13th International Workshop on Mining and Learning with Graphs (MLG)},
year = {2017}
}
License
MIT License - see LICENSE.txt for details.
Related Skills
node-connect
336.9kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
336.9kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.0kCommit, push, and open a PR
