Sesame

[SIGMOD'23] Data Stream Clustering: An In-depth Empirical Study
[ICDM'24] MOStream: A Modular and Self-Optimizing Data Stream Clustering Algorithm

About

Sesame is a scalable stream mining library for modern hardware, written in C++.

Sesame currently includes several representative real-world stream clustering algorithms as well as a set of synthetic algorithms.

Quick Start

Installation

pip3 install pysame

Python Example

#!python3

from pysame import Benne, Birch, BenneObj

X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]

# run birch algorithm
brc = Birch(
    n_clusters=2,
    dim=2,
    distance_threshold=0.5,
)
print(brc.partial_fit(X).predict(X))

# run benne algorithm
bne = Benne(
    n_clusters=2,
    dim=2,
    distance_threshold=0.5,
    obj=BenneObj.accuracy,
)
print(bne.partial_fit(X).predict(X))
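
The example above passes all points in a single call. In a streaming setting, points usually arrive in batches, so the following is a minimal sketch of feeding batches to any model that exposes `partial_fit`. `iter_batches` and `CountingModel` are hypothetical helpers written for this illustration and are not part of pysame; whether pysame's estimators accept repeated `partial_fit` calls is an assumption that should be checked against the library.

```python
from typing import Iterator, List, Sequence

Point = Sequence[float]


def iter_batches(points: Sequence[Point], batch_size: int) -> Iterator[List[Point]]:
    """Yield consecutive, non-overlapping batches in arrival order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(points), batch_size):
        yield list(points[start:start + batch_size])


class CountingModel:
    """Hypothetical stand-in with a partial_fit method, so the sketch runs without pysame."""

    def __init__(self) -> None:
        self.seen = 0

    def partial_fit(self, batch: Sequence[Point]) -> "CountingModel":
        # A real model would update its summary structure here.
        self.seen += len(batch)
        return self


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]

# Assumption: a pysame estimator such as Birch(...) could be swapped in here
# if it supports being updated batch by batch.
model = CountingModel()
for batch in iter_batches(X, batch_size=4):
    model.partial_fit(batch)

print(model.seen)
```

Keeping the batching logic separate from the model makes it easy to change the batch size or the data source without touching the clustering code.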

Build Sesame

Prerequisites

Checkout Source Code

git clone https://github.com/intellistream/Sesame --recursive --depth=1
cd Sesame

Build

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Run Tests

Download the datasets from Zenodo and put them in the datasets directory:

cd Sesame/datasets
pip3 install zenodo_get
zenodo_get 8210331

Run the tests:

cd Sesame/build/test
./google_test

Real-world algorithms

| Algorithm  | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
| ---------- | ------------ | ----------------- | -------------------------- | ------------------ |
| BIRCH      | LandmarkWM   | OutlierD          | CFT                        | ❌                 |
| CluStream  | LandmarkWM   | OutlierD-T        | MCs                        | ✅                 |
| DenStream  | DampedWM     | OutlierD-BT       | MCs                        | ✅                 |
| DStream    | DampedWM     | OutlierD-T        | Grids                      | ❌                 |
| StreamKM++ | LandmarkWM   | NoOutlierD        | CoreT                      | ✅                 |
| DBStream   | DampedWM     | OutlierD-T        | MCs                        | ✅                 |
| EDMStream  | DampedWM     | OutlierD-BT       | DPT                        | ❌                 |
| SL-KMeans  | SlidingWM    | NoOutlierD        | AMS                        | ❌                 |
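
The Window Model column names three ways of weighting past data: landmark, sliding, and damped windows. The sketch below illustrates the usual textbook definitions of these models as per-point weights. It is not taken from Sesame's C++ code; the function names and the `landmark`, `window`, and `decay` parameters are assumptions chosen for illustration, and Sesame's implementations may differ in detail.

```python
def landmark_weight(t_point: float, t_now: float, landmark: float = 0.0) -> float:
    """Landmark window: every point since the landmark counts fully."""
    return 1.0 if landmark <= t_point <= t_now else 0.0


def sliding_weight(t_point: float, t_now: float, window: float) -> float:
    """Sliding window: only points from the most recent `window` time units count."""
    return 1.0 if t_now - window < t_point <= t_now else 0.0


def damped_weight(t_point: float, t_now: float, decay: float) -> float:
    """Damped window: weight decays exponentially with the age of the point."""
    return 2.0 ** (-decay * (t_now - t_point))


arrival_times = [0, 5, 9, 10]
t_now = 10

for t in arrival_times:
    print(
        t,
        landmark_weight(t, t_now),
        sliding_weight(t, t_now, window=3),
        damped_weight(t, t_now, decay=0.5),
    )
```

With these definitions, a landmark window never forgets, a sliding window forgets abruptly, and a damped window forgets gradually at a rate set by `decay`.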

Synthetic algorithms

| Algorithm | Window Model | Outlier Detection | Summarizing Data Structure | Offline Refinement |
| --------- | ------------ | ----------------- | -------------------------- | ------------------ |
| G1        | LandmarkWM   | OutlierD          | MCs                        | ✅                 |
| G2        | LandmarkWM   | OutlierD          | MCs                        | ✅                 |
| G3        | LandmarkWM   | OutlierD          | CFT                        | ❌                 |
| G4        | SlidingWM    | OutlierD          | MCs                        | ❌                 |
| G5        | DampedWM     | OutlierD-B        | MCs                        | ❌                 |
| G6        | LandmarkWM   | NoOutlierD        | MCs                        | ❌                 |
| G8        | LandmarkWM   | OutlierD          | MCs                        | ❌                 |
| G9        | LandmarkWM   | OutlierD          | Grids                      | ❌                 |
| G10       | LandmarkWM   | OutlierD          | DPT                        | ❌                 |
| G11       | LandmarkWM   | OutlierD-T        | MCs                        | ❌                 |
| G12       | LandmarkWM   | OutlierD-B        | MCs                        | ❌                 |
| G13       | LandmarkWM   | OutlierD-BT       | MCs                        | ❌                 |
| G14       | LandmarkWM   | OutlierD          | AMS                        | ❌                 |
| G15       | LandmarkWM   | OutlierD          | CoreT                      | ❌                 |
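
Several algorithms in both tables summarize the stream with a CF-tree (CFT) or with micro-clusters (MCs). As a rough illustration of what such a summary stores, the sketch below implements the classic BIRCH clustering feature: a point count, a linear sum, and a sum of squared norms, all of which can be updated incrementally and merged by addition. The `ClusteringFeature` class and its method names are hypothetical and are not Sesame's API or internal layout.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    n: int            # number of points absorbed
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared norms of the points

    @classmethod
    def empty(cls, dim: int) -> "ClusteringFeature":
        return cls(0, [0.0] * dim, 0.0)

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point without storing it."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        """Summaries are additive, so merging is component-wise addition."""
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        """Root mean squared distance of the absorbed points to the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


cf = ClusteringFeature.empty(dim=2)
for p in [[0, 1], [0.3, 1], [-0.3, 1]]:
    cf.insert(p)

print(cf.n, cf.centroid(), cf.radius())
```

Because the three statistics are additive, a summary of this kind uses constant memory per cluster regardless of how many points it has absorbed.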

Datasets

| Dataset   | Length                                | Dimension | Cluster Number         |
| --------- | ------------------------------------- | --------- | ---------------------- |
| CoverType | 581012                                | 54        | 7                      |
| KDD-99    | 4898431                               | 41        | 23                     |
| Insects   | 905145                                | 33        | 24                     |
| Sensor    | 2219803                               | 5         | 55                     |
| EDS       | 45690, 100270, 150645, 200060, 245270 | 2         | 75, 145, 218, 289, 363 |
| ODS       | 94720, 97360, 100000                  | 2         | 90, 90, 90             |

The datasets can be downloaded from Zenodo: https://zenodo.org/records/8210331

How to Cite Sesame

  • [SIGMOD 2023] Xin Wang, Zhengru Wang, Zhenyu Wu, Shuhao Zhang, Xuanhua Shi, and Li Lu. Data Stream Clustering: An In-depth Empirical Study. SIGMOD, 2023.
@inproceedings{wang2023sesame,
	title        = {Data Stream Clustering: An In-depth Empirical Study},
	author       = {Xin Wang and Zhengru Wang and Zhenyu Wu and Shuhao Zhang and Xuanhua Shi and Li Lu},
	year         = 2023,
	booktitle    = {Proceedings of the 2023 International Conference on Management of Data (SIGMOD)},
	location     = {Seattle, WA, USA},
	publisher    = {Association for Computing Machinery},
	address      = {New York, NY, USA},
	series       = {SIGMOD '23},
	abbr         = {SIGMOD},
	bibtex_show  = {true},
	selected     = {true},
	pdf          = {papers/Sesame.pdf},
	code         = {https://github.com/intellistream/Sesame},
	doi          = {10.1145/3589307},
	url          = {https://doi.org/10.1145/3589307}
}