Fgsdmm
A state-of-the-art Dirichlet-multinomial mixture model for short text topic modelling/clustering.
FGSDMM
Fast Gibbs Sampling for Dirichlet Multinomial Mixtures
This is an implementation of the collapsed Gibbs sampling algorithm introduced in "A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering" (Yin and Wang, 2014), using the optimizations discussed in "A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization" (Yin and Wang, 2016).
This is a hierarchical Bayesian model suitable for topic modelling over short texts. The number of topics is bounded above by a hyperparameter. However, an optimization keeps the time and space complexity approximately linear in the number of non-empty clusters rather than in that upper bound. The results in the papers above show that the model is effective at finding the "true" number of clusters in a corpus, as long as the maximum number of clusters is chosen to be greater than the true number.
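To make the sampling step concrete, here is a minimal Python sketch of collapsed Gibbs sampling for a Dirichlet multinomial mixture that stores counts only for non-empty clusters. It is an illustration under stated assumptions, not this project's code or API: the names `gsdmm`, `k_max`, `alpha`, `beta` and `n_iters` are hypothetical, documents are assumed to be lists of tokens, and the initial assignment is uniformly random. All empty clusters have the same conditional probability, so the sketch lumps them into a single candidate weighted by how many of them remain.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k_max=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Illustrative collapsed Gibbs sampler for a DMM (hypothetical API).

    docs is a list of token lists. Returns one cluster id per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]

    m = {}    # cluster id -> number of documents in the cluster
    n = {}    # cluster id -> total number of words in the cluster
    n_w = {}  # cluster id -> Counter of word occurrences in the cluster
    free_ids = list(range(k_max))  # ids of clusters that are currently empty
    z = [None] * len(docs)

    def add(d, k):
        if k not in m:
            m[k], n[k], n_w[k] = 0, 0, Counter()
        m[k] += 1
        n[k] += len(docs[d])
        n_w[k].update(doc_counts[d])
        z[d] = k

    def remove(d):
        k = z[d]
        m[k] -= 1
        n[k] -= len(docs[d])
        n_w[k].subtract(doc_counts[d])
        if m[k] == 0:  # drop the cluster entirely once it is empty
            del m[k], n[k], n_w[k]
            free_ids.append(k)

    def log_word_term(d, cluster_words, cluster_total):
        # log of prod_w prod_j (n_z^w + beta + j - 1)
        #      / prod_i (n_z + V * beta + i - 1), counts excluding document d
        lw = 0.0
        for w, c in doc_counts[d].items():
            base = cluster_words.get(w, 0) + beta
            for j in range(c):
                lw += math.log(base + j)
        for i in range(len(docs[d])):
            lw -= math.log(cluster_total + vocab_size * beta + i)
        return lw

    # Uniformly random initial assignment (an assumption of this sketch).
    for d in range(len(docs)):
        k = rng.randrange(k_max)
        if k not in m:
            free_ids.remove(k)
        add(d, k)

    for _ in range(n_iters):
        for d in range(len(docs)):
            remove(d)
            ids = list(m)
            log_ws = [math.log(m[k] + alpha) + log_word_term(d, n_w[k], n[k])
                      for k in ids]
            if free_ids:
                # One candidate stands in for every empty cluster.
                ids.append(None)
                log_ws.append(math.log(alpha * len(free_ids))
                              + log_word_term(d, {}, 0))
            top = max(log_ws)  # subtract the max before exponentiating
            weights = [math.exp(lw - top) for lw in log_ws]
            k = rng.choices(ids, weights=weights)[0]
            if k is None:
                k = free_ids.pop()
            add(d, k)
    return z


docs = [["cat", "dog", "pet"], ["dog", "pet", "vet"],
        ["stock", "market", "bond"], ["bond", "market", "yield"]]
labels = gsdmm(docs, k_max=4, n_iters=10, seed=1)
print(labels)
```

Each sweep costs time proportional to the number of non-empty clusters plus one, because the per-cluster loop never visits empty clusters. Working in log space avoids underflow when a document has many tokens.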
Warning
This is a work in progress and there will be breaking changes to the API.
The current implementation is correct and tracks only the non-empty clusters, so it is efficient in that regard. It does not yet implement the "FGSDMM+" optimization, which uses the DMM itself to sample the initial cluster assignments in an informed way.
