FrequentSubgraphMining
This is a python implementation of the Gaston frequent subgraph mining algorithm.
Install / Use
/learn @ColinConduff/FrequentSubgraphMiningREADME
Gaston Graph Mining with Python This is a python implementation of the Gaston graph mining algorithm.
Author: Colin Conduff
"Gaston finds all frequent subgraphs by using a level-wise approach in which first simple paths are considered, then more complex trees and finally the most complex cyclic graphs. It appears that in practice most frequent graphs are not actually very complex structures; Gaston uses this quickstart observation to organize the search space efficiently. To determine the frequency of graphs, Gaston employs an occurrence list based approach in which all occurrences of a small set of graphs are stored in main memory."
- Siegfried Nijssen (Gaston author)
Software Requirements:
- python 3.2 or later
- networkx 1.11
- matplotlib
Installation:
cd gaston_py
pip install .
Command Line Interface:
For usage instructions, use the following command.
gaston -h
Examples:
gaston 0.95 test_files/medium_chemical.txt -o output_files/ -c -t
gaston 6 test_files/medium_chemical.txt
gaston 2 test_files/Chemical_340.txt
Notes:
- Support is defined as frequency(subgraph) / count(graphs). See reference [1] below for details.
- If an output directory is provided:
- Frequent subgraphs are drawn using matplotlib and saved under [output folder]/graphs/.
- A
line_graph.txtfile is generated containing the frequent subgraphs in Line Graph format.
- Test files are available in the
test_filesdirectory. The Chemical_340 dataset was obtained from Nijssen and Kok's website.
Run Tests:
python setup.py test
Performance:
- This Python implementation is significantly slower than the original implementation in C++ by Nijssen and Kok, as well as the Java implementation by the ParSeMis library.
- It is also much slower than gSpan impelementations in Python. This is likely due to the use of an occurrence list, which tracks all subgraphs that have been seen to ensure the correctness of the algorithm.
References: [1] Siegfried Nijssen and Joost Kok. A Quickstart in Frequent Structure Mining Can Make a Difference. Proceedings of the SIGKDD, 2004. http://liacs.leidenuniv.nl/~nijssensgr/gaston/index.html
Additional Resources: Java implementation of the Gaston algorithm https://www2.cs.fau.de/EN/research/zold/ParSeMiS/index.html https://github.com/timtadh/parsemis
Related Skills
node-connect
339.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.9kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
339.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.9kCommit, push, and open a PR
