Xmeasures
Extremely fast evaluation of the extrinsic clustering measures: various (mean) F1 measures and Omega Index (Fuzzy Adjusted Rand Index) for the multi-resolution clustering with overlaps/covers, standard NMI, clusters labeling
Install / Use
/learn @eXascaleInfolab/XmeasuresREADME
xmeasures - Extrinsic Clustering Measures
Extremely fast evaluation of accuracy (extrinsic quality) measures for the [overlapping/fuzzy] clusterings (collections of groups of items):
family of [mean] F1 measures (including Average F1-Score) and Omega Index (fuzzy version of the Adjusted Rand Index) for overlapping multi-resolution clusterings with unequal node base (and optional node base synchronization) using various matching policies (micro, macro and combined weighting),
and standard NMI for non-overlapping clustering on a single resolution. xmeasures also provides clusters labeling with the indices of the ground-truth clusters considering 1:n match and evaluating F1, precision and recall of the labeled clusters.
xmeasures evaluates F1 and NMI for collections of hundreds thousands [overlapping] clusters (covers, communities) withing a dozen seconds on an ordinary laptop using a single CPU core. The computational time is O(N) <!-- O(N \* 2 \* s), where *s* is the average sharing ratio (membership) of the nodes, typically -> 1. -->
unlike O(N * C) <!-- O(N \* (C + C')) for the average F1-score -->
of the existing state of the art implementations, where N is the number of nodes in the network and C is the number of clusters. Computational complexity for Omega Index is standard and equals O(N^2 * s/2), where s is the average sharing ratio (membership) of the nodes, typically s -> 1.
xmeasures is one of the utilities designed for the PyCaBeM clustering benchmark to evaluate clusterings of large networks.
The paper: Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets
@inproceedings{Xms19,
author={Artem Lutov and Mourad Khayati and Philippe Cudr{\'e}-Mauroux},
title={Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets},
booktitle={6th IEEE International Conference on Big Data and Smart Computing (BigComp 2019)},
year={2019},
keywords={accuracy metrics, overlapping community evaluation, multi-resolution clustering evaluation, Generalized NMI, Omega Index, MF1, similarity of collections of sets}
}
Related papers about the implemented measures:
- Omega Index (fuzzy version of the Adjusted Rand Index), which equal to ARI when applied for the non-overlapping clusterings;
- Mean F1 measures: F1a (Average F1-Score), F1p is much more indicative and discriminative than the presented there F1a but the respective paper has not been published yet;
- NMI measure.
Standard NMI is implemented considering overlapping and multi-resolution clustering only to demonstrate non-applicability of the standard NMI for such cases, where it yields unfair results. See GenConvNMI for the fair generalized NMI evaluation.
The execution time and the total processing time (relative power consumption) of xmeasures on a single CPU core vs ParallelComMetric on multiple SMP cores evaluated on the SNAP DBLP dataset and shown in the log scale demonstrates that xmeasures evaluates F1 family measures multiple orders of magnitude faster than other state-of-the-art solutions:

Author: (c) Artem Lutov artem@exascale.info
Content
Deployment
The target platform is NIX/Posix, the binary is compiled for Linux Ubuntu x64 and also should work on Windows 10+ x64 (see details in this article).
Requirements
There are no any requirements for the execution or compilation except the standard C++ library.
To run the prebuilt executable on Linux Ubuntu 16.04 x64, the standard library can be installed by: $ sudo apt-get install libstdc++6.
Compilation
Application Compilation
$ make release
The following build errors might occur on some platforms and should be resolved as outlined.
- If your default compiler is g++/gcc < 5.x, then
g++-5or higher should be installed andMakefilemight need to be edited replacingg++,gccwithg++-5,gcc-5. -fstack-clash-protectioncompiler flag is added sincexmeasures v4.0.5, which might not be supported by Clang/LLVM and GCC < 8.2. This flag just hardens the application against some stack overflow attacks and should be excluded from theMakefileif not supported on your platform.
To update/extend the input parameters, modify args.ggo and run GenerateArgparser.sh (calls gengetopt) before running make. To install gengetopt, execute: $ sudo apt-get install gengetopt.
Library Compilation
Some core functionality of xmeasures is available as a library with C API, making it possible to link the library from Python and other scripting languages.
The interface is defined in include/interface_c.h.
To build the library, execute:
$ make -f Makefile_lib release
Usage
Execution Options:
$ ../xmeasures -h
xmeasures 4.0.4
Extrinsic measures evaluation: Omega Index (a fuzzy version of the Adjusted
Rand Index, identical to the Fuzzy Rand Index) and [mean] F1-score (prob, harm
and avg) for the overlapping multi-resolution clusterings, and standard NMI for
the non-overlapping clustering on a single resolution. Unequal node base is
allowed in the evaluating clusterings and optionally can be synchronized
removing nodes from the clusters missed in one of the clusterings
(collections).
Usage: xmeasures [OPTIONS] clustering1 clustering2
clustering - input file, collection of the clusters to be evaluated.
Examples:
$ ./xmeasures -fp -kc networks/5K25.cnl tests/5K25_l0.825/5K25_l0.825_796.cnl
$ ./xmeasures -fh -kc -i tests/5K25.cll -ph -l networks/5K25.cnl
tests/5K25_l0.825/5K25_l0.825_796.cnl
$ ./xmeasures -ox tests/clsevalsx/omega_c4.3-1.cnl
tests/clsevalsx/omega_c4.3-2.cnl
Extrinsic measures are evaluated, i.e. two input clusterings (collections of
clusters) are compared to each other. Optionally, a labeling of the evaluating
clusters with the specified ground-truth clusters is performed.
NOTE:
- Multiple evaluating measures can be specified.
- Each cluster should contain unique members, which is ensured only if the
'unique' option is specified.
- All clusters should be unique to not affect Omega Index evaluation, which
can be ensured by the [resmerge](https://github.com/eXascaleInfolab/resmerge)
utility.
- Non-corrected unequal node base in the clusterings is allowed, it penalizes
the match.Use [OvpNMI](https://github.com/eXascaleInfolab/OvpNMI) or
[GenConvNMI](https://github.com/eXascaleInfolab/GenConvNMI) for NMI evaluation
in the arbitrary collections (still each cluster should contain unique
members).
Evaluating measures are:
- OI - Omega Index (a fuzzy version of the Adjusted Rand Index, identical to
the Fuzzy Rand Index), which yields the same value as Adjusted Rand Index when
applied to the non-overlapping clusterings.
- [M]F1 - various [mean] F1 measures of the Greatest (Max) Match including
the Average F1-Score (suggested by J. Leskovec) with the optional weighting.
NOTE: There are 3 matching policies available for each kind of F1. The most
representative evaluation is performed by the F1p with combined matching
policy (considers both micro and macro weighting).
- NMI - Normalized Mutual Information, normalized by either max or also
sqrt, avg and min information content denominators.
ATTENTION: This is a standard NMI, which should be used ONLY for the HARD
partitioning evaluation (non-overlapping clustering on a single resolution).
It penalizes overlapping and multi-resolution structures.
-h, --help Print help and exit
-V, --version Print version and exit
-O, --ovp evaluate overlapping instead of the
multi-resolution clusters, where max matching
for any shared member between R overlapping
clusters is 1/R (the member is shared)
instead of 1 (the member fully belongs to
each [hierarchical sub]group) for the member
belonging to R distinct clusters on R
resolutions.
NOTE: It has no effect for the Omega Index
evaluation. (default=off)
-q, --unique ensure on loading that all cluster members are
unique by removing all duplicates.
(default=off)
-s, --sync=filename synchronize with the specified node base
omitting the non-matching nodes.
NOTE: The node base can be either a separate,
or an evaluating CNL file, in the latter case
this option should precede the evaluating
filename not repeating it
-m, --membership=FLOAT average expected membership of the nodes in the
clusters, > 0, typic
Related Skills
node-connect
345.9kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
106.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
345.9kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
345.9kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
