SkillAgentSearch skills...

ECRTM

Code for Effective Neural Topic Modeling with Embedding Clustering Regularization (ICML2023)

Install / Use

/learn @bobxwu/ECRTM
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Code for Effective Neural Topic Modeling with Embedding Clustering Regularization (ICML2023)

Check our latest topic modeling toolkit TopMost !

ICML 2023

<div align="center"> <img src="./img/annotation.png" width="70%" align="center"> </div>

Usage

1. Prepare environment

torch==1.7.1
scipy=1.7.3
scikit-learn==0.23.2
gensim==4.0.1
pyyaml==6.0

Prepare coherence evaluation:

  1. Install java.

     sudo apt install openjdk-11-jdk
    
  2. Download $C_V$ java jar to ./ECRTM/palmetto. It is developed by palmetto.

  3. Download and extract preprocessed Wikipedia articles to ./ECRTM/palmetto/wikipedia as the reference corpus.

2. Train and evaluate the model

We provide a shell script ./ECRTM/scripts/run.sh to train and evaluate our model.

Change to directory ./ECRTM, and run commands as

./scripts/run.sh ECRTM 20NG 50
./scripts/run.sh ECRTM IMDB 50
./scripts/run.sh ECRTM YahooAnswer 50
./scripts/run.sh ECRTM AGNews 50

Preprocess datasets (Optional)

Datasets in ./data have been preprocessed before. Here we provide a shell script to show how we preprocess these datasets:

./scripts/preprocess.sh

This can be used to preprocess other datasets.

Citation

If you want to use our code, please cite as

@inproceedings{wu2023effective,
    title={Effective neural topic modeling with embedding clustering regularization},
    author={Wu, Xiaobao and Dong, Xinshuai and Nguyen, Thong and Luu, Anh Tuan},
    booktitle={International Conference on Machine Learning},
    year={2023},
    organization={PMLR}
}

Related Skills

View on GitHub
GitHub Stars46
CategoryDevelopment
Updated20d ago
Forks5

Languages

Python

Security Score

80/100

Audited on Mar 12, 2026

No findings