SkillAgentSearch skills...

Semco

Calculate latent topic semantic coherence (Mimno et al, EMNLP 2011)

Install / Use

/learn @davidandrzej/Semco
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

semco - calculate latent topic semantic coherence scores as described in Mimno et al (EMNLP 2011) [1]

David Andrzejewski (andrzejewski1@llnl.gov) Center for Applied Scientific Computing Lawrence Livermore National Laboratory

OVERVIEW

While the Latent Dirichlet Allocation (LDA) [2] model can be used to automatically learn latent topics from text, the interpretability or semantic coherence of these topics can vary considerably. Mimno et al [1] propose a topic semantic coherence score computed from in-corpus word co-occurrence statistics, and show strong agreement with human evaluations. Given topics and text documents, this code calculates semantic coherence scores for each topic.

EXAMPLE USAGE

The semantic-coherence takes as input a sequence of "Top N" string sequences (the topics) and a sequence of document string sequences (the corpus).

The return value is a sequence of {:topic :words :semco} maps, where :topic is the numerical index (ie, where in the original input sequence was this topic?), :words are the Top N topic words, and :semco is the semantic coherence score calculated over the corpus.

(ns my-example (:use semco))

(let [topics [["a" "b" "c"] ["x" "y" "z"]] docs [["a" "a" "a"] ["b" "c" "x" "z"] ["x" "x" "y" "z" "a"]]] (println "\n") (doseq [result (semantic-coherence topics docs)] (println (format "Topic %d = %s" (:topic result) (:words result))) (println (format "\tsemantic coherence = %f" (:semco result)))) (println "\n"))

REFERENCES

[1] Optimizing Semantic Coherence in Topic Models. David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, Andrew McCallum. EMNLP, 2011, Edinburgh, Scotland.

[2] Latent Dirichlet Allocation. David Blei, Andrew Ng, Michael Jordan. Journal of Machine Learning Research (JMLR) 3 (Mar. 2003), 993-1022.

LICENSE

Copyright (c) 2011, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory. Written by David Andrzejewski, andrzejewski1@llnl.gov OCEC-10-073 All rights reserved.

This file is part of the C-Cat package and is covered under the terms and conditions therein. See https://github.com/fozziethebeat/C-Cat for details.

This code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation and distributed hereunder to you.

THIS SOFTWARE IS PROVIDED "AS IS" AND NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED ARE MADE. BY WAY OF EXAMPLE, BUT NOT LIMITATION, WE MAKE NO REPRESENTATIONS OR WARRANTIES OF MERCHANT- ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

View on GitHub
GitHub Stars9
CategoryDevelopment
Updated10mo ago
Forks0

Languages

Clojure

Security Score

77/100

Audited on May 23, 2025

No findings