Unine
Unine light stemmer for French, German, Italian, Spanish, Portuguese, Finnish, Swedish
Implementation of "light" stemmers for French, German, Italian, Spanish, Portuguese, Finnish, Swedish.
They are based on the same work as the "light" stemmers found in SolR or ElasticSearch.
A "light" stemmer removes inflections only from nouns and adjectives.
Indexing verbs for these languages is not of primary importance compared to nouns and adjectives.
The procedures used in this stemmer are described below:
- the stemming procedure for French is described in (Savoy, 1999).
- in Italian, the main inflectional rule is to change the final character (e.g., «-o», «-a» or «-e») into another (e.g., «-i», «-e»). As a second rule, Italian morphology may also alter the final two letters (e.g., «-io» into «-o», «-co» into «-chi», «-ga» into «-ghe»).
- in German, a few rules may be applied to obtain the plural form of words (e.g., "Frau" into "Frauen" (woman), "Bild" into "Bilder" (picture), "Sohn" into "Söhne" (son), "Apfel" into "Äpfel" (apple)), but the suggested algorithms do not account for person and tense variations, or for the morphological variations used by verbs.
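To make the Italian rules above concrete, here is a minimal base R sketch. It is not the algorithm implemented in unine: the function name `toy_italian_light_stem` and its two regular expressions are assumptions made for illustration, and they cover only the «-co»/«-chi» and «-ga»/«-ghe» alternations plus the final-vowel change.

```r
# Hypothetical toy function, NOT the unine implementation.
# It only illustrates how singular and plural forms can be mapped to one stem.
toy_italian_light_stem <- function(words) {
  long <- nchar(words) > 3  # leave very short words untouched
  # Second rule: drop the "h" inserted before a final "e"/"i" after "c"/"g"
  # (e.g. "banchi" -> "banci", "righe" -> "rige").
  words[long] <- sub("([cg])h([ei])$", "\\1\\2", words[long])
  # Main rule: drop the final inflectional vowel.
  words[long] <- sub("[aeio]$", "", words[long])
  words
}

toy_italian_light_stem(c("libro", "libri", "banco", "banchi", "riga", "righe"))
# [1] "libr" "libr" "banc" "banc" "rig"  "rig"
```

Each singular/plural pair ends up with the same stem, which is all an index needs. A real light stemmer handles many more endings and exceptions.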
Online tests are available on this website.
Installation
You can install the released version of unine from CRAN with:
install.packages("unine")
... or the latest version from GitHub:
devtools::install_github("pommedeterresautee/unine")
Example
Below are some examples for French and a comparison with the Porter French stemmer.
french_stemmer(words = c("complète", "caissière"))
# [1] "complet" "caisier"
# Note that double letters are deduplicated: caissière -> caisier
french_stemmer(words = c("tester", "testament", "chevaux", "aromatique", "personnel", "folle"))
# [1] "test" "testament" "cheval" "aromat" "personel" "fou"
# Note that double letters are deduplicated: personnel -> personel
# Look at how "testament" and "tester" have been stemmed above.
# Now with the Porter stemmer:
SnowballC::wordStem(c("testament", "tester"), language = "french")
# [1] "test" "test"
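The difference between the two stemmers can be checked by grouping words by stem. The sketch below uses base R only; the stems are copied from the outputs shown above, and the helper name `conflated` is hypothetical, not part of unine or SnowballC.

```r
# Words and stems copied from the outputs shown above (no package call needed).
words  <- c("testament", "tester")
light  <- c("testament", "test")  # french_stemmer() output
porter <- c("test", "test")       # SnowballC::wordStem() output

# Hypothetical helper: keep only the stems shared by more than one word.
conflated <- function(words, stems) {
  groups <- split(words, stems)
  groups[lengths(groups) > 1]
}

conflated(words, light)   # empty list: the two words stay distinct
conflated(words, porter)  # one group named "test" holding both words
```

Here the Porter stemmer merges two unrelated words, while the light stemmer keeps them apart.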
References
Please cite [1] if using this R package.
[1] J. Savoy, A stemming procedure and stopword list for general French corpora
@article{savoy1999stemming,
title={A stemming procedure and stopword list for general French corpora},
author={Savoy, Jacques},
journal={Journal of the American Society for Information Science},
volume={50},
number={10},
pages={944--952},
year={1999}
}

