ECRTM
Code for Effective Neural Topic Modeling with Embedding Clustering Regularization (ICML2023)
Check out our latest topic modeling toolkit, TopMost!
<div align="center"> <img src="./img/annotation.png" width="70%" align="center"> </div>
Usage
1. Prepare environment
torch==1.7.1
scipy==1.7.3
scikit-learn==0.23.2
gensim==4.0.1
pyyaml==6.0
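To confirm that an existing environment matches the pins above, the installed versions can be compared against them. The sketch below does this with the standard-library `importlib.metadata` module. The helper name `check_pins` and the choice to ignore local build suffixes such as `+cu110` are assumptions made for illustration and are not part of the repository.

```python
from importlib import metadata

# Version pins copied from the list above.
PINNED = {
    "torch": "1.7.1",
    "scipy": "1.7.3",
    "scikit-learn": "0.23.2",
    "gensim": "4.0.1",
    "pyyaml": "6.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cu110" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Report every package whose installed version differs from its pin.
for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the loop prints nothing, every pinned package is installed at the listed version.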
Prepare coherence evaluation:
- Install Java:
  sudo apt install openjdk-11-jdk
- Download the $C_V$ Java jar to ./ECRTM/palmetto. It is developed by Palmetto.
- Download and extract the preprocessed Wikipedia articles to ./ECRTM/palmetto/wikipedia as the reference corpus.
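The preparation steps above leave a specific layout on disk, which can be checked before launching a long run. The following is a minimal sketch, assuming the jar sits directly inside ./ECRTM/palmetto with a `.jar` extension and that the extracted articles are files or folders inside ./ECRTM/palmetto/wikipedia. The function name `check_coherence_setup` is hypothetical.

```python
import shutil
from pathlib import Path


def check_coherence_setup(ecrtm_dir="./ECRTM"):
    """Return a list of problems; an empty list means the layout looks complete."""
    problems = []
    palmetto = Path(ecrtm_dir) / "palmetto"
    wikipedia = palmetto / "wikipedia"

    # Step 1: a Java runtime must be reachable on PATH.
    if shutil.which("java") is None:
        problems.append("java not found on PATH")

    # Step 2: at least one jar file in the palmetto directory (assumed layout).
    jars = list(palmetto.glob("*.jar")) if palmetto.is_dir() else []
    if not jars:
        problems.append(f"no .jar file in {palmetto}")

    # Step 3: the reference corpus directory must exist and be non-empty.
    if not wikipedia.is_dir() or not any(wikipedia.iterdir()):
        problems.append(f"{wikipedia} is missing or empty")

    return problems


for problem in check_coherence_setup():
    print("setup problem:", problem)
```

The check only looks at file presence; it does not verify that the jar or the corpus is the correct one.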
2. Train and evaluate the model
We provide a shell script, ./ECRTM/scripts/run.sh, to train and evaluate our model.
Change to the ./ECRTM directory and run commands such as:
./scripts/run.sh ECRTM 20NG 50
./scripts/run.sh ECRTM IMDB 50
./scripts/run.sh ECRTM YahooAnswer 50
./scripts/run.sh ECRTM AGNews 50
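To run all four datasets in one go, the commands above can be generated in a loop. This sketch assumes it is started from the ./ECRTM directory and that the third argument (50 in the examples) is passed through unchanged. The text above does not define that argument, so it is named `third_arg` here instead of being given a meaning. The helper names are hypothetical, and by default the script only prints the commands.

```python
import subprocess

# Dataset names taken from the example commands above.
DATASETS = ["20NG", "IMDB", "YahooAnswer", "AGNews"]


def build_commands(model="ECRTM", datasets=DATASETS, third_arg=50):
    """Build one run.sh argument list per dataset, mirroring the commands above."""
    return [["./scripts/run.sh", model, name, str(third_arg)] for name in datasets]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)


# Dry run: prints the four commands without executing them.
run_all(build_commands())
```

Calling `run_all(build_commands(), dry_run=False)` would execute the runs sequentially.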
Preprocess datasets (Optional)
The datasets in ./data are already preprocessed.
The following shell script shows how we preprocessed them:
./scripts/preprocess.sh
This can be used to preprocess other datasets.
Citation
If you use our code, please cite:
@inproceedings{wu2023effective,
  title={Effective neural topic modeling with embedding clustering regularization},
  author={Wu, Xiaobao and Dong, Xinshuai and Nguyen, Thong and Luu, Anh Tuan},
  booktitle={International Conference on Machine Learning},
  year={2023},
  organization={PMLR}
}
