GSDMM
My own implementation of Gibbs Sampling for the Dirichlet Multinomial Mixture (DMM) model.
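For orientation, the collapsed Gibbs sampling step described by Yin and Wang (2014) can be sketched roughly as below. This is a minimal illustration written for this README, not the code in this repository: the function name `gsdmm`, its parameters (`K`, `alpha`, `beta`, `n_iters`, `seed`), and the default values are assumptions chosen for the example. Each document is removed from its cluster, a new cluster is drawn with probability proportional to the cluster's size and its word overlap with the document, and the counts are updated.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    m = [0] * K                         # m[z]: number of documents in cluster z
    n = [0] * K                         # n[z]: number of tokens in cluster z
    nw = [Counter() for _ in range(K)]  # nw[z][w]: occurrences of word w in cluster z
    counts = [Counter(doc) for doc in docs]

    def move(d, z, sign):
        """Add (sign=+1) or remove (sign=-1) document d from cluster z."""
        m[z] += sign
        n[z] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[z][w] += sign * c

    # Random initial assignment of every document to a cluster.
    labels = [rng.randrange(K) for _ in docs]
    for d, z in enumerate(labels):
        move(d, z, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, labels[d], -1)  # exclude d from the counts before sampling
            log_p = []
            for z in range(K):
                # The factor 1 / (D - 1 + K * alpha) is the same for all z, so it is dropped.
                lp = math.log(m[z] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[z] + V * beta + i)
                log_p.append(lp)
            # Normalize in log space to avoid underflow, then sample a cluster.
            mx = max(log_p)
            weights = [math.exp(lp - mx) for lp in log_p]
            labels[d] = rng.choices(range(K), weights=weights)[0]
            move(d, labels[d], +1)
    return labels
```

Calling `gsdmm([["a", "b"], ["x", "y"]], K=3)` returns a list with one cluster index per document. `K` is only an upper bound on the number of clusters, since clusters may end up empty after sampling.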
Motivation
While working on the third homework of my data mining course, clustering short texts, I found this paper (see the Reference section), which turned out to be the one recommended by Mr. Zhang in class. So I tried to implement the proposed GSDMM algorithm myself, with the help of online resources, of course.
NOTICE
This implementation is still in progress.
Data Format
- vacabulary.json: one word and its corresponding id per line.
- train_tokens.json: one doc-id and its token list per line.
- train_topics.json: used for validation.
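A loader for these three files might look like the sketch below. It rests on assumptions the description above does not confirm: that each file is a single JSON object (word to id, doc-id to token list, doc-id to topic label), and that the helper names `load_json` and `load_dataset` are hypothetical. If the files are actually stored as one JSON value per line, the parsing step would need to change.

```python
import json


def load_json(path):
    """Read one JSON document from `path` (UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_dataset(vocab_path, tokens_path, topics_path):
    """Load vocabulary, documents, and validation topics.

    ASSUMPTION: every file holds a single JSON object keyed as described
    in the lead-in; this layout is a guess, not taken from the repository.
    """
    word2id = load_json(vocab_path)      # word -> id
    doc_tokens = load_json(tokens_path)  # doc-id -> list of tokens
    doc_topics = load_json(topics_path)  # doc-id -> topic label
    doc_ids = sorted(doc_tokens)         # fixed order so docs and gold align
    docs = [doc_tokens[d] for d in doc_ids]
    gold = [doc_topics.get(d) for d in doc_ids]  # None if a label is missing
    return word2id, doc_ids, docs, gold
```

The returned `docs` list can be passed to a clustering routine, and `gold` can then be compared with the predicted clusters for validation.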
Reference
- Paper
  - Yin, J., & Wang, J. (2014). A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 233-242). ACM.
  - Nguyen, D. Q., Billingsley, R., Du, L., & Johnson, M. (2015). Improving topic models with latent feature word representations. Transactions of the Association for Computational Linguistics, 3, 299-313.
- Code
  - datquocnguyen/jLDADMM: Java version
  - atefm/pDMM: Python version