Multiprefixspan
Implementation of Prefixspan for multiple items
Install / Use
/learn @RonaldYu/MultiprefixspanREADME
Prefixspan
MultiPrefixspan is to implement prefixspan algorithm for a sequence with multiple-item events.
Installation
Using pip:
pip install multiprefixspan
Example for single event
Given data as follows, each element is a sequence.
db = [
[0, 1, 2, 3, 4],
[1, 1, 1, 3, 4],
[2, 1, 2, 2, 0],
[1, 1, 1, 2, 2],
]
Execute prefixspan algorithm to find frequent patterns with the minimum support greater than or equal to 2 on db by:
from multiprefixspan import prefixspan_one_item_one_event
pre_object = prefixspan_one_item_one_event(db, 2)
pre_object.exect()
Show result:
pre_object.found_patterns
{'[0]': 2,
'[1]': 4,
'[1, 2]': 3,
'[1, 2, 2]': 2,
'[1, 3]': 2,
'[1, 3, 4]': 2,
'[1, 4]': 2,
'[1, 1]': 2,
'[1, 1, 1]': 2,
'[2]': 3,
'[2, 2]': 2,
'[3]': 2,
'[3, 4]': 2,
'[4]': 2}
Example for multiple events
Given data as follows, each element is a sequence, each element in a sequence may have more than one items.
db = [[{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],
[{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],
[{'e', 'f'}, {'a', 'b'}, {'d', 'f'}, {'c'}, {'b'}],
[{'e', 'g'}, {'a', 'f'}, {'c'}, {'b'}, {'c'}],
[{'a'}, {'g', 'h'}, {'f', 'e'}, {'b', 'c', 'e'}, {'a', 'b', 'c'}]
]
Execute prefixspan algorithm to find frequent patterns with the minimum support greater than or equal to 2 on db by:
from multiprefixspan import prefixspan_multiple_items_one_event
pre_object = prefixspan_multiple_items_one_event(db, 2)
pre_object.exect()
Show the first ten patterns (unordered):
pre_object.found_patterns[:10]
[([{'f'}], 4),
([{'f'}, {'a'}], 2),
([{'f'}, {'a', 'b'}], 2),
([{'e', 'f'}], 2),
([{'e', 'f'}, {'a'}], 2),
([{'e', 'f'}, {'a', 'b'}], 2),
([{'e', 'f'}, {'b'}], 2),
([{'e', 'f'}, {'b'}, {'b'}], 2)]
Please refer to Demonstrate.ipynb for real data example.
Reference
- Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., & Hsu, M. (2001). PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings 17th International Conference on Data Engineering, 215-224.
- Real data BMS1_spmf.txt from http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php; BMSWebView1 (Gazelle) ( KDD CUP 2000). This dataset contains 59,601 sequences of clickstream data from an e-commerce. It contains 497 distinct items. The average length of sequences is 2.42 items with a standard deviation of 3.22. In this dataset, there are some long sequences. For example, 318 sequences contains more than 20 items.
Related Skills
node-connect
339.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.9kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
339.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.9kCommit, push, and open a PR
