DPA
The DPA package is the scikit-learn compatible implementation of the Density Peaks Advanced clustering algorithm. The algorithm provides robust and visual information about the clusters, their statistical reliability and their hierarchical organization.
Install / Use
/learn @mariaderrico/DPAREADME
Density Peaks Advanced clustering
Status of the scikit-learn_ compatibility test:
.. image:: https://github.com/mariaderrico/DPA/actions/workflows/runpytest.yml/badge.svg?branch=master :alt: scikit-learn compatibility test status on GitHub Actions :target: https://github.com/mariaderrico/DPA/actions/workflows/runpytest.yml
The DPA package implements the Density Peaks Advanced (DPA) clustering algorithm as introduced in the paper "Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering", published on M. d'Errico, E. Facco, A. Laio, A. Rodriguez, Information Sciences, Volume 560, June 2021, 476-492_ (also available on arXiv_).
The package offers the following features:
- Intrinsic dimensionality estimation by means of the
TWO-NNalgorithm, published in theEstimating the intrinsic dimension of datasets by a minimal neighborhood information_ paper. - Adaptive k-NN Density estimation by means of the
PAkalgorithm, published in theComputing the free energy without collective variables_ paper. - Advanced version of the
DPclustering algorithm, published in theClustering by fast search and find of density peaks_ paper, which includes an automatic search of cluster centers and assessment of statistical significance of the clusters
.. contents::
Top-level directory layout
::
cd DPA
ls -l
::
.
|-- DP/ # Auxiliary package with the DP clustering implementation.
|-- docs/ # Documentation files.
|-- Examples/ # Auxiliary scripts for the examples generations.
|-- DPA_analysis.ipynb # Use-case example for DPA.
|-- DPA_comparison-all.ipynb # Performance comparison with other clustering methods.
|-- README.rst
|-- compile.sh
|-- setup.py
|-- src/ # Source files for DPA, PAk and twoNN algorithms.
Source files
The source Python codes are stored inside the src folder.
::
.
|-- ...
|-- src/
| |-- Pipeline/
| |-- __init__.py
| |-- DPA.py # Python module implementing the DPA
| | # clustering algorithm.
| |
| |-- _DPA.pyx # Cython extension of the DPA module.
| |
| |-- PAk.py # Python module implementing the PAk
| | # density estimator.
| |
| |-- _PAk.pyx # Cython extension of the PAk module.
| |
| |-- twoNN.py # Python module implementing the TWO-NN
| # algorithm for the ID calculation.
|
|-- ...
Documentation files
Full documentation about the Python codes developed and the how-to instructions is created in the docs folder using Sphinx.
Complete documentation for DPA is available on the Read The Docs <https://dpaclustering.readthedocs.org>_ website.
Jupyter notebooks
Examples of how-to run the DPA, PAk and twoNN modules are provided as Jupyter notebook in DPA_analysis.ipynb. Additional useful use-cases are available in DPA_comparison-all.ipynb, which include a performance comparison with the following clustering methods: Bayesian Gaussian Mixture, HDBSCAN, Spectral Clustering and Density Peaks.
Both jupyter notebooks are also available as Python script (saved using jupytext_) in the jupytext folder.
::
.
|-- ...
|-- DPA_analysis.ipynb # Use-case example for DPA.
|-- DPA_comparison-all.ipynb # Performance comparison with
| # other clustering methods.
|
|-- ...
|-- jupytext/
| |-- DPA_analysis.py # DPA_analysis.ipynb saved as
| | # Python script.
| |-- DPA_comparison-all.py # DPA_comparison-all.ipynb
| # saved as Python script.
Getting started
The source code of DPA is on github DPA repository_.
You need the git command in order to be able to clone it, and we
suggest you to use Python virtual environment in order to create a
controlled environment in which you can install DPA as
normal user avoiding conflicts with system files or Python libraries.
The following section documents the steps required to install DPA on a Linux or Windows/Mac computer.
Debian/Ubuntu ^^^^^^^^^^^^^
Run the following commands to create and activate a Python virtual environment with python virtualenv::
apt-get install git python-dev virtualenv*
virtualenv -p python3 venvdpa
. venvdpa/bin/activate
Windows ^^^^^^^
A possible setup makes use of Anaconda_.
It has preinstalled and configured packages for data analysis and it is available on all major platforms. It uses conda as package manager, in addition to the standard pip.
A versioning control can be installed by downloading git_.
Run the following commands to activate the conda virtual environment::
conda create -n venvdpa
conda activate venvdpa
to list the available environments you can type conda info --envs, and to deactivate an active environment use source deactivate.
Installation
Install required dependencies ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The DPA package depends on easycython, that can be installed using conda or pip.
Note that it is possible to check which packages are installed with the pip freeze command.
Installing released code from GitHub ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Install the latest version from the GitHub repository via::
pip install git+https://github.com/mariaderrico/DPA
Installing development code from GitHub ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Run the following commands to download the DPA source code::
git clone https://github.com/mariaderrico/DPA.git
Install DPA with the following commands::
cd DPA
. compile.sh
Citing
If you have used this codebase in a scientific publication and wish to cite the algorithm, please cite our paper in Information Sciences.
`M. d'Errico, E. Facco, A. Laio, A. Rodriguez, Information Sciences, Volume 560, June 2021, 476-492`_
.. code:: bibtex
@article{DERRICO2021476,
title = {Automatic topography of high-dimensional data sets by non-parametric density peak clustering},
journal = {Information Sciences},
volume = {560},
pages = {476-492},
year = {2021},
issn = {0020-0255},
doi = {https://doi.org/10.1016/j.ins.2021.01.010},
url = {https://www.sciencedirect.com/science/article/pii/S0020025521000116},
author = {Maria d’Errico and Elena Facco and Alessandro Laio and Alex Rodriguez},
}
.. References
.. _scikit-learn: https://scikit-learn.org/stable/
.. _M. d'Errico, E. Facco, A. Laio, A. Rodriguez, Information Sciences, Volume 560, June 2021, 476-492: https://www.sciencedirect.com/science/article/pii/S0020025521000116?dgcid=author
.. _arXiv: https://arxiv.org/abs/1802.10549v2
.. _Computing the free energy without collective variables: https://pubs.acs.org/doi/full/10.1021/acs.jctc.7b00916
.. _Estimating the intrinsic dimension of datasets by a minimal neighborhood information: https://export.arxiv.org/pdf/1803.06992
.. _Clustering by fast search and find of density peaks: http://science.sciencemag.org/content/344/6191/1492.full.pdf
.. _github DPA repository: https://github.com/mariaderrico/DPA.git
.. _Anaconda: https://www.anaconda.com/download/#windows
.. _git: https://git-scm.com
.. _jupytext: https://pypi.org/project/jupytext/
Related Skills
node-connect
349.7kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
claude-opus-4-5-migration
109.7kMigrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5
frontend-design
109.7kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
model-usage
349.7kUse CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.
