Pke
Python Keyphrase Extraction module
pke - python keyphrase extraction
pke is an open source python-based keyphrase extraction toolkit. It
provides an end-to-end keyphrase extraction pipeline in which each component can
be easily modified or extended to develop new models. pke also allows for
easy benchmarking of state-of-the-art keyphrase extraction models, and
ships with supervised models trained on the
SemEval-2010 dataset.
Installation
To install pke from GitHub using pip:
pip install git+https://github.com/boudinfl/pke.git
pke relies on spacy (>= 3.2.3) for text processing and requires models to be installed:
# download the english model
python -m spacy download en_core_web_sm
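If you are unsure whether the model is already present, a quick check can be sketched as follows. It assumes the downloaded model is installed as an importable Python package named `en_core_web_sm`; the helper name `spacy_model_installed` is hypothetical and not part of pke or spaCy.

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_sm") -> bool:
    """Return True if a package with this name can be imported here."""
    # find_spec returns None when no top-level package of that name exists
    return importlib.util.find_spec(model_name) is not None


if not spacy_model_installed():
    print("Model missing, run: python -m spacy download en_core_web_sm")
```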
Minimal example
pke provides a standardized API for extracting keyphrases from a document.
Start by typing the 5 lines below. To use another model, simply replace
pke.unsupervised.TopicRank with another model (see the list of implemented models).
import pke
# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()
# load the content of the document, here the document is expected to be a
# plain string and preprocessing is carried out using spacy
extractor.load_document(input='text', language='en')
# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()
# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()
# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)
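Because every model goes through the same four calls, the steps above can be wrapped in a small helper. This is a minimal sketch: the function name `extract_keyphrases` is hypothetical and not part of pke, and it assumes the chosen model works with default arguments at each step.

```python
from typing import Any, List, Tuple


def extract_keyphrases(extractor: Any, text: str, language: str = "en",
                       n: int = 10) -> List[Tuple[str, float]]:
    """Run the four pipeline steps shown above on a given extractor."""
    extractor.load_document(input=text, language=language)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    # returns the n highest scored candidates as (keyphrase, score) tuples
    return extractor.get_n_best(n=n)


# Usage, assuming pke and the spacy model are installed:
#   import pke
#   keyphrases = extract_keyphrases(pke.unsupervised.TopicRank(), "some text")
```

Passing the extractor in as an argument keeps the helper independent of any particular model, so switching models only changes the call site.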
A detailed example is provided in the examples/ directory.
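To make the candidate selection step more concrete, here is a toy sketch of picking the longest sequences of nouns and adjectives from part-of-speech tagged tokens. It is not pke's implementation: the function `select_candidates`, the hand-written tags and the exact tag set (`NOUN`, `PROPN`, `ADJ`) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_candidates(tagged: List[Tuple[str, str]],
                      valid_pos: Sequence[str] = ("NOUN", "PROPN", "ADJ")
                      ) -> List[str]:
    """Collect the longest runs of consecutive tokens with a valid POS tag."""
    candidates, current = [], []
    for word, pos in tagged:
        if pos in valid_pos:
            current.append(word)
        elif current:
            # a non-matching token closes the current run
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


# hand-tagged toy sentence (tags are illustrative, not spacy output)
tagged = [("automatic", "ADJ"), ("keyphrase", "NOUN"), ("extraction", "NOUN"),
          ("is", "AUX"), ("a", "DET"), ("hard", "ADJ"), ("task", "NOUN"),
          (".", "PUNCT")]
print(select_candidates(tagged))
# → ['automatic keyphrase extraction', 'hard task']
```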
Getting started
To get your hands dirty with pke, we invite you to try our tutorials out.
| Name | Link |
| ---------------------------------------------- | ---------- |
| Getting started with pke and keyphrase extraction | |
| Model parameterization | |
| Benchmarking models | |
Implemented models
pke currently implements the following keyphrase extraction models:
- Unsupervised models
  - Statistical models
    - FirstPhrases
    - TfIdf
    - KPMiner (El-Beltagy and Rafea, 2010)
    - YAKE (Campos et al., 2020)
  - Graph-based models
    - TextRank (Mihalcea and Tarau, 2004)
    - SingleRank (Wan and Xiao, 2008)
    - TopicRank (Bougouin et al., 2013)
    - TopicalPageRank (Sterckx et al., 2015)
    - PositionRank (Florescu and Caragea, 2017)
    - MultipartiteRank (Boudin, 2018)
- Supervised models
  - Feature-based models
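Graph-based models such as TextRank rank words by running a random walk over a word co-occurrence graph. The sketch below illustrates that general idea with a PageRank-style power iteration in plain Python. It is a simplified illustration, not pke's code: the function names, the window size, the damping factor and the iteration count are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cooccurrence_graph(words: List[str], window: int = 2) -> Dict[str, Set[str]]:
    """Link each word to the words that follow it within `window` positions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def random_walk_scores(graph: Dict[str, Set[str]], damping: float = 0.85,
                       iterations: int = 50) -> Dict[str, float]:
    """Score nodes with a PageRank-style power iteration (undirected graph)."""
    nodes = list(graph)
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # each node receives an equal share of every neighbour's score
        scores = {
            node: (1 - damping) / len(nodes)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in nodes
        }
    return scores


words = "keyphrase extraction models rank keyphrase candidates".split()
scores = random_walk_scores(cooccurrence_graph(words))
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 3))
```

Words that co-occur with many other words collect more score, which is why well-connected terms end up at the top of the ranking.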
Model performances
For comparison purposes, overall results of the implemented models on commonly used benchmark datasets are available in results.
Code for reproducing these experiments is in the benchmarking notebook.
Citing pke
If you use pke, please cite the following paper:
@InProceedings{boudin:2016:COLINGDEMO,
  author = {Boudin, Florian},
  title = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month = {December},
  year = {2016},
  address = {Osaka, Japan},
  pages = {69--73},
  url = {http://aclweb.org/anthology/C16-2015}
}
