DBCV

Python implementation of Density-Based Clustering Validation

Source

Moulavi, Davoud, et al. "Density-based clustering validation." Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2014.

What is DBCV

How do you validate clustering assignments from unsupervised learning algorithms? A common method is the silhouette method, which scores the quality of a clustering between -1 and 1. The silhouette value measures how well an object fits its own cluster compared with neighboring clusters. The silhouette (and most other popular methods) works very well on globular clusters, but can fail on non-globular clusters such as:

(Figure: example of non-globular clusters)
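
To make the failure mode concrete, here is a minimal sketch that scores two labelings of a two-moons dataset with scikit-learn's `silhouette_score`. The seed, the sample size, and the choice of K-means as the "wrong" labeling are assumptions made for illustration. On data like this, the silhouette typically rates the K-means split above the true moon labels, even though the true labels are the better clustering.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

# Two interleaving half-moons; the seed is an arbitrary choice for illustration.
X, true_labels = make_moons(n_samples=150, noise=0.05, random_state=0)

# K-means cuts the data with a straight boundary, splitting both moons.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

sil_true = silhouette_score(X, true_labels)
sil_kmeans = silhouette_score(X, kmeans_labels)

print(f"silhouette, true moon labels: {sil_true:.2f}")
print(f"silhouette, K-means labels:   {sil_kmeans:.2f}")
```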

Here, we implement DBCV which can validate clustering assignments on non-globular, arbitrarily shaped clusters (such as the example above). In essence, DBCV computes two values:

The density within a cluster
The density between clusters

High density within a cluster and low density between clusters indicate good clustering assignments.
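
The two quantities can be sketched in code. The following is a simplified illustration that loosely follows the definitions in the paper cited above; it is not the implementation in this repository. Assumptions: Euclidean distance, at least two clusters, every cluster contains at least two distinct points, the label -1 marks noise and is skipped, and the paper's restriction to "internal" spanning-tree nodes is left out for brevity. The names `core_distances` and `simplified_dbcv` are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def core_distances(points):
    """All-points core distance of every point, relative to its own cluster."""
    n, dim = points.shape
    dists = cdist(points, points)
    np.fill_diagonal(dists, np.inf)  # 1/inf == 0, so a point ignores itself
    return (np.sum((1.0 / dists) ** dim, axis=1) / (n - 1)) ** (-1.0 / dim)


def simplified_dbcv(X, labels):
    """Simplified DBCV-style score in [-1, 1]; higher is better."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Assumption: label -1 marks noise (HDBSCAN's convention) and is skipped.
    clusters = [X[labels == k] for k in np.unique(labels) if k != -1]
    cores = [core_distances(c) for c in clusters]

    def mutual_reachability(i, j):
        # max(core(a), core(b), d(a, b)) for every a in cluster i, b in cluster j
        return np.maximum(cdist(clusters[i], clusters[j]),
                          np.maximum.outer(cores[i], cores[j]))

    score = 0.0
    for i, cluster in enumerate(clusters):
        within = mutual_reachability(i, i)
        np.fill_diagonal(within, 0.0)  # no self-edges in the graph
        # Sparseness (low density within): longest edge of the cluster's
        # minimum spanning tree.
        sparseness = minimum_spanning_tree(within).data.max()
        # Separation (low density between): closest approach to another cluster.
        separation = min(mutual_reachability(i, j).min()
                         for j in range(len(clusters)) if j != i)
        validity = (separation - sparseness) / max(separation, sparseness)
        score += len(cluster) / len(X) * validity  # weight by cluster size
    return score


rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(10, 0.3, (40, 2))])
good = np.repeat([0, 1], 40)  # one label per blob
bad = np.arange(80) % 2       # labels that ignore the blobs
print(simplified_dbcv(blobs, good), simplified_dbcv(blobs, bad))
```

A cluster scores well when its longest spanning-tree edge (sparse interior) is short relative to its distance from the nearest other cluster, which is why the labeling that follows the blobs scores positive and the alternating labeling scores negative.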

Example

Here, I deliberately picked an example on which density-based clustering works well.

from sklearn import datasets
import matplotlib.pyplot as plt
import seaborn as sns

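n_samples = 150
# Two interleaving half-moons: a standard non-globular test case
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=0.05)
X = noisy_moons[0]  # keep the coordinates, drop the true labels
plt.scatter(X[:, 0], X[:, 1])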
plt.show()

(Figure: scatter plot of the two-moons dataset)

What happens when we try K-means clustering on these non-globular clusters?

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2)
kmeans_labels = kmeans.fit_predict(X)
plt.scatter(X[:,0], X[:,1], c=kmeans_labels)
plt.show()

(Figure: K-means cluster assignments on the two moons)

...Not so great. What about HDBSCAN, a density-based clustering method?

import hdbscan

hdbscanner = hdbscan.HDBSCAN()
hdbscan_labels = hdbscanner.fit_predict(X)
plt.scatter(X[:,0], X[:,1], c=hdbscan_labels)
plt.show()

(Figure: HDBSCAN cluster assignments on the two moons)

That's pretty good. To assess the quality of the clustering with Density-Based Clustering Validation, we call DBCV:

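from scipy.spatial.distance import euclidean
# Assumption: the DBCV function from this repository is importable as below;
# adjust the import to match how the package is installed.
from DBCV import DBCV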

kmeans_score = DBCV(X, kmeans_labels, dist_function=euclidean)
hdbscan_score = DBCV(X, hdbscan_labels, dist_function=euclidean)
print(kmeans_score, hdbscan_score)

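K-means receives a DBCV score of -0.71, and HDBSCAN a score of 0.60. Higher scores indicate better clustering, so DBCV prefers the HDBSCAN assignment. Exact values vary from run to run because make_moons adds random noise.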
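
If you want the same data and K-means labels on every run, one option is to fix the random seeds. This is a minimal sketch: the seed value, the `n_init` setting, and the helper name `moons_and_kmeans` are assumptions for illustration, and a fixed seed will not reproduce the exact scores quoted above.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

SEED = 0  # arbitrary value chosen for illustration


def moons_and_kmeans(seed=SEED):
    """Generate the two-moons data and K-means labels deterministically."""
    X, _ = datasets.make_moons(n_samples=150, noise=0.05, random_state=seed)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    return X, labels


# Two independent calls with the same seed give identical data and labels.
X1, labels1 = moons_and_kmeans()
X2, labels2 = moons_and_kmeans()
print((X1 == X2).all(), (labels1 == labels2).all())
```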
