DecTM
Code for Discovering Topics in Long-tailed Corpora with Causal Intervention (Findings of ACL 2021)
Check out our latest topic modeling toolkit, TopMost!
Usage
0. Prepare environment
Requirements:
python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2
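Before running anything, you can check that the installed packages match the pins above. The following is a minimal sketch that compares each package's __version__ with the listed pin. It assumes the usual import names (tensorflow for tensorflow-gpu, sklearn for scikit-learn). The helper names are hypothetical and not part of the repository.

```python
"""Compare installed package versions against the pinned requirements.

A sketch: the pins are copied from the Requirements list; the mapping from
pip names to import names (IMPORT_NAMES) is an assumption of this sketch.
"""
import importlib
import sys

# Pinned versions from the README (pip name -> version).
PINS = {
    "tensorflow-gpu": "1.13.1",
    "scipy": "1.5.2",
    "scikit-learn": "0.23.2",
}

# pip distribution name -> importable module name (assumed mapping).
IMPORT_NAMES = {
    "tensorflow-gpu": "tensorflow",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
}


def find_mismatches(pins, installed):
    """Return (name, wanted, found) for every pin that is missing or differs."""
    problems = []
    for name, wanted in sorted(pins.items()):
        found = installed.get(name)
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


def installed_versions():
    """Import each module and read __version__; None if it cannot be imported."""
    versions = {}
    for pip_name, module_name in IMPORT_NAMES.items():
        try:
            versions[pip_name] = importlib.import_module(module_name).__version__
        except ImportError:
            versions[pip_name] = None
    return versions


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 6):
        print("warning: expected Python 3.6, found %d.%d" % sys.version_info[:2])
    for name, wanted, found in find_mismatches(PINS, installed_versions()):
        print("%s: expected %s, found %s" % (name, wanted, found))
```

Any line printed by the script points to a package that should be reinstalled at the pinned version.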
1. Prepare data
Download the preprocessed datasets from Google Drive and extract the files into ./data.
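The extraction step can also be scripted. This is a sketch built on the standard library's shutil.unpack_archive, which handles zip and tar archives. The archive name you pass in depends on what you downloaded. The script name extract_data.py and the helper extract_to_data are hypothetical.

```python
"""Unpack a downloaded dataset archive into ./data.

A sketch: the archive path is whatever file was downloaded from Google
Drive; shutil.unpack_archive picks the format from the file extension.
"""
import os
import shutil
import sys


def extract_to_data(archive_path, data_dir="./data"):
    """Extract archive_path into data_dir and return the entries now present."""
    os.makedirs(data_dir, exist_ok=True)
    shutil.unpack_archive(archive_path, data_dir)
    return sorted(os.listdir(data_dir))


if __name__ == "__main__":
    # Usage (hypothetical script name): python extract_data.py <downloaded archive>
    if len(sys.argv) > 1:
        print(extract_to_data(sys.argv[1]))
    else:
        print("usage: python extract_data.py <archive>")
```

After extraction, each dataset should sit in its own folder so that it can be passed as ./data/{dataset} in the next step.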
2. Run the model
python main.py --data_dir ./data/{dataset} --output_dir ./output
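To train on every dataset in ./data, you can loop over the command above. This sketch uses only the two flags shown (--data_dir and --output_dir). Writing each dataset's results to its own subdirectory of ./output is an assumption of the sketch, not a requirement of main.py. The helper build_commands is hypothetical.

```python
"""Run main.py once per dataset directory under ./data.

A sketch using only the --data_dir and --output_dir flags from the README;
the per-dataset output subdirectory is an assumption of this sketch.
"""
import os
import subprocess
import sys


def build_commands(data_root="./data", output_root="./output"):
    """Return one main.py command (as an argv list) per subdirectory of data_root."""
    commands = []
    for name in sorted(os.listdir(data_root)):
        data_dir = os.path.join(data_root, name)
        if os.path.isdir(data_dir):
            commands.append([
                sys.executable, "main.py",
                "--data_dir", data_dir,
                "--output_dir", os.path.join(output_root, name),
            ])
    return commands


if __name__ == "__main__":
    if not os.path.isdir("./data"):
        print("no ./data directory found; prepare the data first")
    else:
        for cmd in build_commands():
            print(" ".join(cmd))
            subprocess.run(cmd, check=True)  # stop on the first failing run
```

Running the datasets sequentially keeps GPU memory use predictable. With check=True, the loop stops at the first failing run.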
3. Evaluation
topic coherence: measured with a coherence score.
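The README does not say which coherence measure is used. As an illustration only, here is a sketch of one common choice: NPMI averaged over all word pairs in each topic, with probabilities estimated from document-level co-occurrence in a reference corpus. The paper's coherence score may use a different measure or tool. The function names are hypothetical.

```python
"""Average NPMI coherence of topics, using document co-occurrence.

A sketch of one common coherence measure; the coherence score used for
DecTM may be computed differently.
"""
import math
from itertools import combinations


def npmi_pair(w1, w2, doc_sets):
    """NPMI of two words over document sets; -1 when they never co-occur."""
    n = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n
    p2 = sum(w2 in d for d in doc_sets) / n
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n
    if p12 == 0:
        return -1.0
    if p12 == 1:  # both words occur in every document: -log(p12) would be 0
        return 1.0
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_coherence(topics, documents):
    """Mean pairwise NPMI within each topic, averaged over topics."""
    doc_sets = [set(doc.split()) for doc in documents]
    scores = []
    for words in topics:
        pairs = list(combinations(words, 2))
        scores.append(sum(npmi_pair(a, b, doc_sets) for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores)
```

NPMI lies in [-1, 1]. Higher values mean a topic's top words tend to appear in the same documents.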
topic diversity:
python utils/TU.py --data_path {path of topic word file}
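Judging by its name, utils/TU.py computes topic uniqueness (TU). In the standard definition, each top word is weighted by 1 / (number of topics whose top words include it), and these weights are averaged over each topic's words and then over topics. The result ranges from 1/K (all K topics identical) to 1 (no shared words). The sketch below assumes the topic word file has one topic per line, with its top words separated by whitespace. utils/TU.py may parse the file or compute the score differently.

```python
"""Topic uniqueness (TU) from a topic word file.

A sketch assuming one topic per line with whitespace-separated top words;
utils/TU.py may differ in file format or details.
"""
import sys
from collections import Counter


def topic_uniqueness(topics):
    """Mean over topics of the mean over words of 1 / (#topics containing the word)."""
    counts = Counter(w for words in topics for w in set(words))
    per_topic = [sum(1.0 / counts[w] for w in words) / len(words) for words in topics]
    return sum(per_topic) / len(per_topic)


def read_topics(path):
    """Read one topic per non-empty line, splitting on whitespace."""
    with open(path) as f:
        return [line.split() for line in f if line.strip()]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print("TU: %.4f" % topic_uniqueness(read_topics(sys.argv[1])))
    else:
        print("usage: python tu_sketch.py <topic word file>")  # hypothetical name
```

For example, two topics "a b" and "a c" share one word. Each topic then scores (1/2 + 1) / 2 = 0.75, so TU is 0.75.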
Citation
If you find our work useful, please cite it as:
@inproceedings{wu2021discovering,
title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
author = "Wu, Xiaobao and
Li, Chunping and
Miao, Yishu",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.15",
doi = "10.18653/v1/2021.findings-acl.15",
pages = "175--185",
}
Other related works
NLPCC 2020: Learning Multilingual Topics with Neural Variational Inference
