Kmpp
k-means clustering algorithm with k-means++ initialization.
When dealing with lots of data points, clustering algorithms may be used to group them. The k-means algorithm partitions n data points into k clusters and finds the centroids of these clusters incrementally.
The algorithm assigns each data point to the closest cluster and then recalculates the centroid of each cluster. These steps are repeated until the centroids no longer change.
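The two alternating steps described above can be sketched as follows. This is a minimal illustration, not the code used by kmpp: the function names (squaredDistance, assignPoints, updateCentroids, lloyd) are hypothetical, squared Euclidean distance is assumed, and convergence is detected when the assignments stop changing, which implies that the centroids stop changing as well.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Assignment step: index of the closest centroid for every point.
function assignPoints(points, centroids) {
  return points.map(function (p) {
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
        best = c;
      }
    }
    return best;
  });
}

// Update step: each centroid becomes the mean of its assigned points.
// A centroid without any points keeps its previous position (an assumption).
function updateCentroids(points, assignments, centroids) {
  return centroids.map(function (old, c) {
    const members = points.filter(function (_, i) { return assignments[i] === c; });
    if (members.length === 0) return old.slice();
    return old.map(function (_, dim) {
      const total = members.reduce(function (s, p) { return s + p[dim]; }, 0);
      return total / members.length;
    });
  });
}

// Repeat both steps until the assignments stop changing or the cap is reached.
function lloyd(points, initialCentroids, maxIterations) {
  let centroids = initialCentroids.map(function (c) { return c.slice(); });
  let assignments = [];
  for (let iter = 1; iter <= maxIterations; iter++) {
    const next = assignPoints(points, centroids);
    const unchanged = next.length === assignments.length &&
      next.every(function (a, i) { return a === assignments[i]; });
    assignments = next;
    if (unchanged) {
      return { converged: true, centroids: centroids, assignments: assignments, iterations: iter };
    }
    centroids = updateCentroids(points, assignments, centroids);
  }
  return { converged: false, centroids: centroids, assignments: assignments, iterations: maxIterations };
}
```

For example, lloyd([[0, 0], [0, 2], [10, 0], [10, 2]], [[0, 0], [10, 0]], 100) settles on two clusters, one on the left and one on the right.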
The basic k-means algorithm is initialized with k centroids at random positions. This implementation addresses some disadvantages of the arbitrary initialization method with the k-means++ algorithm (see "Further reading" at the end).
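As a rough illustration of the k-means++ idea, the sketch below picks the first centroid uniformly at random from the points and every further centroid with probability proportional to its squared distance from the nearest centroid chosen so far. It is not taken from the kmpp source: the name seedCentroids and the injectable random parameter are assumptions made for this example, and the module's actual seeding code may differ in detail.

```javascript
// Sketch of k-means++ seeding. `random` is a hypothetical injectable source
// of numbers in [0, 1); it defaults to Math.random.
function seedCentroids(points, k, random) {
  random = random || Math.random;

  // Squared Euclidean distance between two points of equal dimension.
  function squaredDistance(a, b) {
    let sum = 0;
    for (let i = 0; i < a.length; i++) {
      const d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // First centroid: one of the points, chosen uniformly at random.
  const centroids = [points[Math.floor(random() * points.length)].slice()];

  // minDist[i] is the squared distance from point i to its nearest centroid so far.
  const minDist = points.map(function (p) { return squaredDistance(p, centroids[0]); });

  while (centroids.length < k) {
    const total = minDist.reduce(function (s, d) { return s + d; }, 0);

    // Sample an index with probability proportional to minDist[i].
    let threshold = random() * total;
    let chosen = points.length - 1; // fallback if all distances are zero
    for (let i = 0; i < points.length; i++) {
      threshold -= minDist[i];
      if (threshold < 0) { chosen = i; break; }
    }

    const added = points[chosen].slice();
    centroids.push(added);

    // Refresh the nearest-centroid distances with the newly added centroid.
    for (let i = 0; i < points.length; i++) {
      minDist[i] = Math.min(minDist[i], squaredDistance(points[i], added));
    }
  }
  return centroids;
}
```

Because far-away points carry most of the sampling weight, the initial centroids tend to be spread across the data instead of landing in the same dense region.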
Installation
Installing via npm
Install kmpp as a Node.js module via npm:
$ npm install kmpp
Example
var kmpp = require('kmpp');
kmpp([
[x1, y1, ...],
[x2, y2, ...],
[x3, y3, ...],
...
], {
k: 3
});
// =>
// { converged: true,
// centroids: [[xm1, ym1, ...], [xm2, ym2, ...], [xm3, ym3, ...]],
// counts: [ 7, 6, 7 ],
// assignments: [ 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]
// }
API
kmpp(points[, opts])
Executes the k-means++ algorithm on points.
Arguments:
- points (Array): An array-of-arrays containing the points in format [[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]
- opts: object containing configuration parameters. Parameters are:
  - distance (function): Optional function that takes two points and returns the distance between them.
  - initialize (Boolean): Perform initialization. If false, uses the initial state provided in centroids and assignments. Otherwise discards any initial state and performs initialization.
  - k (Number): number of centroids. If not provided, sqrt(n / 2) is used, where n is the number of points.
  - kmpp (Boolean, default: true): If true, uses k-means++ initialization. Otherwise uses naive random assignment.
  - maxIterations (Number, default: 100): Maximum allowed number of iterations.
  - norm (Number, default: 2): L-norm used for distance computation. 1 is Manhattan norm, 2 is Euclidean norm. Ignored if distance function is provided.
  - centroids (Array): An array of centroids. If initialize is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.
  - assignments (Array): An array of assignments. Used for initialization, otherwise overwritten.
  - counts (Array): An output array used to avoid extra allocation. Values are discarded and overwritten.
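To make the norm, distance and k options more concrete, here is a small sketch of an L-norm distance function and of the sqrt(n / 2) default. Both helpers (makeDistance, defaultK) are hypothetical names for this example. Whether kmpp takes the final root of the sum, and how it rounds sqrt(n / 2), is not stated above, so the root and the rounding down to an integer of at least 1 are assumptions.

```javascript
// Returns a function computing the L-norm distance between two points
// (norm 1 = Manhattan, norm 2 = Euclidean).
function makeDistance(norm) {
  return function (a, b) {
    let sum = 0;
    for (let i = 0; i < a.length; i++) {
      sum += Math.pow(Math.abs(a[i] - b[i]), norm);
    }
    // Taking the final root is an assumption; it does not change
    // which centroid is closest.
    return Math.pow(sum, 1 / norm);
  };
}

// Default number of centroids for n points: sqrt(n / 2).
// Rounding down and the lower bound of 1 are assumptions.
function defaultK(n) {
  return Math.max(1, Math.floor(Math.sqrt(n / 2)));
}

const manhattan = makeDistance(1);
const euclidean = makeDistance(2);
console.log(manhattan([0, 0], [3, 4])); // 7
console.log(defaultK(20));
```

A function with this shape (two points in, one number out) is what the distance option expects, for example { distance: makeDistance(1) }.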
Returns an object containing information about the centroids and point assignments. Values are:
- converged: true if the algorithm converged successfully
- centroids: a list of centroids
- counts: the number of points assigned to each respective centroid
- assignments: a list of integer assignments of each point to the respective centroid
- iterations: number of iterations used
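The relationship between assignments and counts can be sketched with two small helpers. They are hypothetical and not part of kmpp. They only assume the result shape described above, where assignments[i] is the index of the centroid that point i belongs to.

```javascript
// Recompute the per-centroid counts from an assignments array.
function countAssignments(assignments, k) {
  const counts = new Array(k).fill(0);
  assignments.forEach(function (a) { counts[a] += 1; });
  return counts;
}

// Group the original points by the index of their assigned centroid.
function groupByCluster(points, assignments, k) {
  const groups = [];
  for (let c = 0; c < k; c++) groups.push([]);
  points.forEach(function (p, i) { groups[assignments[i]].push(p); });
  return groups;
}

// Assignments copied from the example output earlier in this README.
const assignments = [2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1];
console.log(countAssignments(assignments, 3)); // [ 7, 6, 7 ]
```

The recomputed counts match the counts field of the example output, and groupByCluster(points, result.assignments, result.centroids.length) would return the points of each cluster as separate arrays.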
Credits
- Jared Harkins improved the performance by reducing the number of function calls and reverting to Manhattan distance for measurements, and improved the random initialization by choosing from points
- Ricky Reusser refactored the API
Further reading
- Wikipedia: k-means clustering
- Wikipedia: Determining the number of clusters in a data set
- k-means++: The advantages of careful seeding, Arthur and Vassilvitskii
- k-means++: The advantages of careful seeding, presentation by Arthur and Vassilvitskii
License
© 2017-2019. MIT License.
