SkillAgentSearch skills...

Xmeasures

Extremely fast evaluation of the extrinsic clustering measures: various (mean) F1 measures and Omega Index (Fuzzy Adjusted Rand Index) for the multi-resolution clustering with overlaps/covers, standard NMI, clusters labeling

Install / Use

/learn @eXascaleInfolab/Xmeasures

README

xmeasures - Extrinsic Clustering Measures

Extremely fast evaluation of accuracy (extrinsic quality) measures for the [overlapping/fuzzy] clusterings (collections of groups of items):
family of [mean] F1 measures (including Average F1-Score) and Omega Index (fuzzy version of the Adjusted Rand Index) for overlapping multi-resolution clusterings with unequal node base (and optional node base synchronization) using various matching policies (micro, macro and combined weighting),
and standard NMI for non-overlapping clustering on a single resolution. xmeasures also provides clusters labeling with the indices of the ground-truth clusters considering 1:n match and evaluating F1, precision and recall of the labeled clusters.

xmeasures evaluates F1 and NMI for collections of hundreds thousands [overlapping] clusters (covers, communities) withing a dozen seconds on an ordinary laptop using a single CPU core. The computational time is O(N) <!-- O(N \* 2 \* s), where *s* is the average sharing ratio (membership) of the nodes, typically -> 1. --> unlike O(N * C) <!-- O(N \* (C + C')) for the average F1-score --> of the existing state of the art implementations, where N is the number of nodes in the network and C is the number of clusters. Computational complexity for Omega Index is standard and equals O(N^2 * s/2), where s is the average sharing ratio (membership) of the nodes, typically s -> 1.
xmeasures is one of the utilities designed for the PyCaBeM clustering benchmark to evaluate clusterings of large networks.

The paper: Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets

@inproceedings{Xms19,
	author={Artem Lutov and Mourad Khayati and Philippe Cudr{\'e}-Mauroux},
	title={Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets},
	booktitle={6th IEEE International Conference on Big Data and Smart Computing (BigComp 2019)},
	year={2019},
	keywords={accuracy metrics, overlapping community evaluation, multi-resolution clustering evaluation, Generalized NMI, Omega Index, MF1, similarity of collections of sets}
}

Related papers about the implemented measures:

  • Omega Index (fuzzy version of the Adjusted Rand Index), which equal to ARI when applied for the non-overlapping clusterings;
  • Mean F1 measures: F1a (Average F1-Score), F1p is much more indicative and discriminative than the presented there F1a but the respective paper has not been published yet;
  • NMI measure.

    Standard NMI is implemented considering overlapping and multi-resolution clustering only to demonstrate non-applicability of the standard NMI for such cases, where it yields unfair results. See GenConvNMI for the fair generalized NMI evaluation.

The execution time and the total processing time (relative power consumption) of xmeasures on a single CPU core vs ParallelComMetric on multiple SMP cores evaluated on the SNAP DBLP dataset and shown in the log scale demonstrates that xmeasures evaluates F1 family measures multiple orders of magnitude faster than other state-of-the-art solutions: Clubmark_Poster-w1024

Author: (c) Artem Lutov artem@exascale.info

Content

Deployment

The target platform is NIX/Posix, the binary is compiled for Linux Ubuntu x64 and also should work on Windows 10+ x64 (see details in this article).

Requirements

There are no any requirements for the execution or compilation except the standard C++ library.

To run the prebuilt executable on Linux Ubuntu 16.04 x64, the standard library can be installed by: $ sudo apt-get install libstdc++6.

Compilation

Application Compilation

$ make release

The following build errors might occur on some platforms and should be resolved as outlined.

  • If your default compiler is g++/gcc < 5.x, then g++-5 or higher should be installed and Makefile might need to be edited replacing g++, gcc with g++-5, gcc-5.
  • -fstack-clash-protection compiler flag is added since xmeasures v4.0.5, which might not be supported by Clang/LLVM and GCC < 8.2. This flag just hardens the application against some stack overflow attacks and should be excluded from the Makefile if not supported on your platform.

To update/extend the input parameters, modify args.ggo and run GenerateArgparser.sh (calls gengetopt) before running make. To install gengetopt, execute: $ sudo apt-get install gengetopt.

Library Compilation

Some core functionality of xmeasures is available as a library with C API, making it possible to link the library from Python and other scripting languages.
The interface is defined in include/interface_c.h.
To build the library, execute:

$ make -f Makefile_lib release

Usage

Execution Options:

$ ../xmeasures -h
xmeasures 4.0.4

Extrinsic measures evaluation: Omega Index (a fuzzy version of the Adjusted
Rand Index, identical to the Fuzzy Rand Index) and [mean] F1-score (prob, harm
and avg) for the overlapping multi-resolution clusterings, and standard NMI for
the non-overlapping clustering on a single resolution. Unequal node base is
allowed in the evaluating clusterings and optionally can be synchronized
removing nodes from the clusters missed in one of the clusterings
(collections).

Usage: xmeasures [OPTIONS] clustering1 clustering2

  clustering  - input file, collection of the clusters to be evaluated.
  
Examples:
  $ ./xmeasures -fp -kc networks/5K25.cnl tests/5K25_l0.825/5K25_l0.825_796.cnl
  $ ./xmeasures -fh -kc -i tests/5K25.cll -ph -l networks/5K25.cnl
tests/5K25_l0.825/5K25_l0.825_796.cnl
  $ ./xmeasures -ox tests/clsevalsx/omega_c4.3-1.cnl
tests/clsevalsx/omega_c4.3-2.cnl


Extrinsic measures are evaluated, i.e. two input clusterings (collections of
clusters) are compared to each other. Optionally, a labeling of the evaluating
clusters with the specified ground-truth clusters is performed.
NOTE:
  - Multiple evaluating measures can be specified.
  - Each cluster should contain unique members, which is ensured only if the
'unique' option is specified.
  - All clusters should be unique to not affect Omega Index evaluation, which
can be ensured by the [resmerge](https://github.com/eXascaleInfolab/resmerge)
utility.
  - Non-corrected unequal node base in the clusterings is allowed, it penalizes
the match.Use [OvpNMI](https://github.com/eXascaleInfolab/OvpNMI) or
[GenConvNMI](https://github.com/eXascaleInfolab/GenConvNMI) for NMI evaluation
in the arbitrary collections (still each cluster should contain unique
members).

Evaluating measures are:
  - OI  - Omega Index (a fuzzy version of the Adjusted Rand Index, identical to
the Fuzzy Rand Index), which yields the same value as Adjusted Rand Index when
applied to the non-overlapping clusterings.
  - [M]F1  - various [mean] F1 measures of the Greatest (Max) Match including
the Average F1-Score (suggested by J. Leskovec) with the optional weighting.
NOTE: There are 3 matching policies available for each kind of F1. The most
representative evaluation is performed by the F1p with combined matching
policy (considers both micro and macro weighting).
  - NMI  - Normalized Mutual Information, normalized by either max or also
sqrt, avg and min information content denominators.
ATTENTION: This is a standard NMI, which should be used ONLY for the HARD
partitioning evaluation (non-overlapping clustering on a single resolution).
It penalizes overlapping and multi-resolution structures.


  -h, --help                    Print help and exit
  -V, --version                 Print version and exit
  -O, --ovp                     evaluate overlapping instead of the
                                  multi-resolution clusters, where max matching
                                  for any shared member between R overlapping
                                  clusters is 1/R (the member is shared)
                                  instead of 1 (the member fully belongs to
                                  each [hierarchical  sub]group) for the member
                                  belonging to R distinct clusters on R
                                  resolutions.
                                  NOTE: It has no effect for the Omega Index
                                  evaluation.  (default=off)
  -q, --unique                  ensure on loading that all cluster members are
                                  unique by removing all duplicates.
                                  (default=off)
  -s, --sync=filename           synchronize with the specified node base
                                  omitting the non-matching nodes.
                                  NOTE: The node base can be either a separate,
                                  or an evaluating CNL file, in the latter case
                                  this option should precede the evaluating
                                  filename not repeating it
  -m, --membership=FLOAT        average expected membership of the nodes in the
                                  clusters, > 0, typic

Related Skills

View on GitHub
GitHub Stars22
CategoryDevelopment
Updated9mo ago
Forks7

Languages

C++

Security Score

87/100

Audited on Jun 29, 2025

No findings