
Clubmark

Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)


Clubmark (former PyCABeM) - Benchmarking Framework for the Clustering Algorithms Evaluation

\brief Benchmarking and real-time profiling of [overlapping] clustering algorithms working on graphs specified by pairwise relations (community detection algorithms). The quality evaluation is performed using extrinsic measures (all existing accuracy measures applicable to overlapping clustering algorithms on large datasets: the Mean F1 Score family, Normalized Mutual Information (NMI), and the Omega Index / fuzzy ARI family) and some intrinsic measures (Modularity (Q) and Conductance (f)), considering overlaps/covers (node membership shared by multiple clusters [on the same resolution level]) and multiple resolutions (the same node can be a full member of both a cluster and its super-clusters).
\authors (c) Artem Lutov artem@exascale.info
\license Apache 2.0; optional commercial support and relicensing are provided on request (contact me)

The evaluated clustering algorithms (located in algorithms/) have their own, possibly proprietary, licenses; they are a) not a part of and b) not covered by the terms of the Clubmark framework.

\organizations eXascale Infolab, Lumais, ScienceWise
\keywords overlapping clustering benchmarking, community detection benchmarking, algorithms benchmarking framework.

\cite Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures

@inproceedings{Clb18,
	author={Artem Lutov and Mourad Khayati and Philippe Cudr{\'e}-Mauroux},
	title={Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures},
	booktitle={2018 IEEE International Conference on Data Mining Workshops (ICDMW)},
	year={2018},
	pages = {1481--1486},
	doi = {10.1109/ICDMW.2018.00212},
	keywords={benchmarking framework, clustering evaluation, parallel benchmarking, algorithm profiling, community detection benchmarking, constraint-aware load-balancing}
}

Content

Fast Start Tutorial

On Linux Ubuntu 16.04 LTS x64:

  1. Clone this repository: $ git clone https://github.com/eXascaleInfolab/clubmark.git
  2. Install dependencies ($ install_reqs.sh) and optionally fetch the SNAP networks with ground-truth communities (described in install_reqs.sh).
  3. Prepare system environment for the load balancer: $ prepare_hostenv.sh
  4. Run the benchmark from the cloned repository: $ python3 ./benchmark.py -w=0.0.0.0:8080 -g=3%3 -i%3=realnets/*.nse -a="CggciRg Daoc DaocAR DaocX LouvainIg Pscan Scd CggcRg DaocA DaocR Ganxis Oslom2 Randcommuns Scp" -r -q="Xmeasures -fh -o -s" -s -t=8h --runtimeout=15d 1>> ./bench.log 2>> ./bench.err
    • -w=0.0.0.0:8080 - deploy profiling Web UI at 0.0.0.0:8080
    • -g=3%3 - generate 3 instances of each kind of synthetic network (a weighted undirected graph with overlapping clusters in the ground truth) and additionally shuffle (reorder links and nodes of) each instance 3 times
    • -i%3=realnets/*.nse - additionally process [real-world] networks from ./realnets (the networks should be put there in advance, see ./prepare_snapdata.sh)
    • -a=... - use specified clustering algorithms
    • -r - run (execute) the clustering algorithms (those specified with -a, otherwise all available); produces sets of clusters (hierarchies / multiple levels) and execution profiles in ./results/<algname>
    • -q=... - evaluate produced clusters of the specified algorithms (-a) with the specified quality measures, produces (raw) evaluation results in ./results/qmeasures/qmeasures_*.h5

      Use utils/hdf5ToTxt.py script to convert qmeasures/qmeasures_*.h5 files to the textual representation (some HDF5 viewers do not support the latest HDF5 storage drivers used here)

    • -s - aggregate and summarize the quality evaluation results, produces ./results/qmeasures/aggqms_*.h5 from ./results/qmeasures/qmeasures_*.h5
    • -t=8h - limit execution time for each algorithm / evaluation measure on each dataset to 8 hours
    • --runtimeout=15d - limit execution time for the whole benchmarking to 15 days
    • 1>> ... - append the execution log (stdout tracing) to ./bench.log
    • 2>> ... - append the error log (stderr tracing) to ./bench.err
  5. Look at the evaluation results in the ./results/ directory
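Several of the options above encode a pair of numbers joined by `%` (e.g. `-g=3%3`: instances and shuffles, or `-i%3`: shuffles for the input networks). A minimal sketch of parsing such a value (a hypothetical helper illustrating the option grammar, not Clubmark's actual parser):

```python
def parse_count_shuffles(val):
    """Parse an '<instances>%<shuffles>' option value such as '3%3'.

    Hypothetical helper mirroring the CLI grammar: a missing count
    defaults to 1, a missing shuffle number to 0.
    """
    inst, _, shuf = val.partition('%')
    return int(inst) if inst else 1, int(shuf) if shuf else 0

print(parse_count_shuffles('3%3'))  # -> (3, 3)
print(parse_count_shuffles('3'))    # -> (3, 0)
```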

To run the benchmark on other [POSIX-compatible] platforms / operating systems, either use the Docker container providing the default environment, or recompile all clustering algorithms and evaluation utilities for the target platform.

Overview

The benchmark executes the specified applications (clustering algorithms) on the specified or generated input datasets (networks, i.e. graphs specified by pairwise relations), measures execution statistics, and evaluates the quality (including accuracy) of the results using the specified measures. Clubmark is a general-purpose modular benchmarking framework specialized for the evaluation of clustering (community detection) algorithms. The generic functionality is based on the PyExPool multiprocess execution pool and load balancer. <!-- (the only other open-source load balancer I'm aware of with sufficient features to power Clubmark is [Slurm Workload Manager](https://slurm.schedmd.com/overview.html)). -->

Generic properties:

  • Data preprocessing (synthetic networks generation, shuffling, etc.);
  • Pre/post execution callbacks for each executable to perform the application specific initialization and finalization (results normalization, logs post-processing, etc.);
  • Execution of the specified applications on the specified input datasets;
  • Resource consumption management and tracing:
    • Specification of the global and per executable constraints for the maximal execution time and memory consumption;
    • Binding of any executable to a specified set of logical CPUs (affinity masking) to exploit NUMA hardware, so that apps executing in parallel do not disturb each other's CPU caches (dedicated L1/L2/L3 caches);
    • Tracing of the execution time, CPU timings (user and kernel) and peak RAM RSS memory consumption;
  • Load balancing of the executing apps to employ all available hardware resources respecting the specified constraints;
  • Resilience and fault tolerance (failover execution), where crashes of some running applications do not affect the execution of other applications or of other scheduled jobs associated with the crashed application.
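Clubmark delegates the constraint handling above to PyExPool. A minimal stdlib sketch of the same idea (wall-clock timeout, CPU affinity pinning, tracing of execution time, CPU timings and peak RSS), shown only to illustrate the mechanism rather than the framework's actual API:

```python
import os
import resource
import subprocess
import sys
import time

def run_limited(cmd, cpus=None, timeout=None):
    """Run cmd with an optional wall-clock timeout and CPU affinity,
    returning (elapsed_sec, user_cpu_sec, kernel_cpu_sec, peak_rss).
    A sketch of the idea only; Clubmark itself uses PyExPool for this.
    """
    def bind():
        os.sched_setaffinity(0, cpus)  # pin the child to the given logical CPUs (Linux)

    start = time.perf_counter()
    subprocess.run(cmd, preexec_fn=bind if cpus else None,
                   timeout=timeout, check=True)
    elapsed = time.perf_counter() - start
    # Accumulated usage of terminated children; ru_maxrss is in KiB on Linux
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return elapsed, ru.ru_utime, ru.ru_stime, ru.ru_maxrss

elapsed, utime, stime, peak_rss = run_limited(
    [sys.executable, "-c", "print('clustered')"], timeout=60)
```

If the child exceeds `timeout`, `subprocess.run` raises `TimeoutExpired`, which a scheduler can catch to fail over to the next job, mirroring the fault-tolerance property above.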

Properties specific for the clustering algorithms benchmarking:

  • Synthetic networks generation with the extended LFR Framework and subsequent shuffling (network nodes and links reordering);
  • Execution of the clustering algorithms on the specified networks;
  • Evaluation of the intrinsic (modularity, conductance) and extrinsic quality / accuracy (mean F1-Score family, NMIs and Omega Index / Fuzzy Adjusted Rand Index) measures for the generalized clustering algorithms (considering overlaps and multiple resolutions/scales if any);
  • Selection of the best result for the parameterized clustering algorithms among the results produced for the specified variations of the parameters;
  • Evaluation of both the average value and deviation of the quality measures. The deviation is evaluated if multiple instances and/or shuffles (nodes and links reordering) of the input networks are present.
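The last aggregation step can be illustrated with a tiny stdlib sketch (a hypothetical helper, not Clubmark's aggregation code): given the values of one quality measure over several instances/shuffles of a network, report the mean and the deviation:

```python
from statistics import mean, stdev

def aggregate(scores):
    """Mean and standard deviation of one quality measure over multiple
    instances/shuffles of a network (hypothetical helper); the deviation
    is defined only when more than one run is present."""
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)

# e.g. F1 scores of one algorithm over 3 shuffles of the same network
avg, dev = aggregate([0.71, 0.68, 0.74])
print(round(avg, 3), round(dev, 3))  # -> 0.71 0.03
```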

The executed algorithms and the evaluation measures, together with the accessory processing tools, are described on the respective pages.

Clubmark Demo Poster for ICDM18: Clubmark_Poster-w1024

Motivation

I have not found any open-source cross-platform framework for the [efficient] execution and evaluation of custom applications with significant variation in time/memory complexity and custom constraints, so I decided to write my own.

Particularly, I had to evaluate various clustering (community detection) algorithms on large networks using specific measures. The main challenges are the following:

  • the benchmarked applications (clustering algorithms) have very different (by orders of magnitude) time and memory complexity and are implemented in various languages (mainly C++, Java, C and Python);
  • the target datasets differ widely in structure and size (by orders of magnitude, from KBs to GBs);
  • the accessory evaluation applications (NMI, F1-Scores, Omega Index) have very different (by orders of magnitude) time and memory complexity and architecture (single-threaded and multi-threaded applications).

Ideally, the applications should be executed in parallel in a way to guarantee that they:

  • are not swapped out from RAM (which happens on unconstrained execution of multiple memory-demanding apps), so as not to skew the efficiency measurements;
  • co