HornMorpho
Morphological processing for languages of the Horn of Africa
HornMorpho, version 5.3.4
===========================
Sept. 14, 2025
Introduction
HornMorpho (HM) is a Python program that performs morphological analysis and generation for various languages of the Horn of Africa. The languages supported in Version 5.3 are Amharic (አማርኛ), Oromo (Afaan Oromoo, Oromiffa), Tigrinya (Tigrigna, ትግርኛ), and Tigre (ትግሬ, ትግራይት). Most examples within this document are Amharic; future versions will include more examples from the other languages.
If your application can benefit from explicit linguistic information about the structure and grammatical properties of words in these languages, then you may want to use HM. HM can tell you, for example, that the verb የማይደረገው is negative, that the noun አባቴን is the object of some verb, that the stem (the word without prefixes and suffixes) of the verb የምንፈልጋቸው is -ፈልግ-, that the lemma (basic form) of the verb እንደሚመኟቸው is ተመኘ, that is, that this verb has something to do with ‘longing’. HM can also tell you that the word እንደሚመኟቸው consists of five segments (morphemes): እንደም+ይ+መኝ+ኡ+ኣቸው.
HM is a rule-based program; that is, the knowledge in the program is based on explicit linguistic rules and a lexicon, a dictionary of basic word forms (stems and roots), rather than on machine learning of the knowledge from a corpus.
- For Amharic, the lexicon is extracted mainly from Amsalu Aklilu’s Amharic-English Dictionary (Addis Ababa, Kuraz, 2004). The rules come from many grammars of the language.
- For Tigrinya, the lexicon is from Thomas Leiper Kane’s Tigrinya-English Dictionary (Kensington, MD, USA, Dunwoody Press, 2000). The rules come mainly from Wolf Leslau’s Documents Tigrigna, Grammaire et Textes (Paris, Librairie C. Klincksieck, 1941) and Amanuel Sahle’s ሰዋስው ትግርኛ ብሰፊሕ (Lawrenceville, NJ, Red Sea Press, 1998).
- For Oromo, the lexicon is from two dictionaries, Gene B. Gragg’s Oromo Dictionary (African Studies Center, Michigan State University, 1982) and Tamene Bitima’s A Dictionary of Oromo Technical Terms (Oromo-English) (Rüdiger Köppe, Köln, 2000). The rules come mainly from Catherine Griefenow-Mewis’s A Grammatical Sketch of Written Oromo (Köln, Rüdiger Köppe Verlag, 2001).
- For Tigre, all of the words and rules are from the Mansaʿ dialect of the language. The lexicon is still quite limited, containing only several hundred noun and adjective roots and 86 verb roots. The roots are taken from Saleh Mahmud Idris’s A Comparative Study of the Tigre Dialects (Aachen, Shaker [Semitica et Semitohamitica Berolinensia 18], 2015) and from Shlomo Raz’s Tigre Grammar and Texts (Malibu, CA, USA, Undena Publications, 1983). The rules come from Raz.
Though HM does not make use of machine learning, it is possible to use
its output in models that do. For example, `Gezmu & Nürnberger (2023) <https://dl.acm.org/doi/10.1145/3610773>`__ use HM’s
segmentation of Amharic words for neural machine translation.
HM assigns a part-of-speech (POS) to each word, but if you want a POS tagger, you should look elsewhere. A word’s POS often depends on the other words in the sentence in which it occurs, and HM analyzes words without looking at their context.
HM has a list of Amharic person and place names, but if you want named entity recognition, you should look for a program that has been trained to do this. If a name is not in HM’s list for Amharic, it will just be treated as an unknown word, and this will be true for almost all names in Tigrinya, Oromo, and Tigre.
Version 5 replaces Version 4.5 for Amharic. For other languages, see Version 4.3. Version 5 is not backward compatible with earlier versions. If you have used earlier versions of HM and would like to switch to Version 5, please contact gasser@iu.edu for help.
Installation
It is highly recommended that you install the program in a `virtual environment <https://realpython.com/python-virtual-environments-a-primer/>`__,
but this is not required. If you are using a virtual environment, you
will need to create the environment and activate it before running
pip install.
First download the wheel file from the dist/ folder:
`HornMorpho-5.3.4-py3-none-any.whl <https://github.com/hltdi/HornMorpho/blob/master/dist/HornMorpho-5.3.4-py3-none-any.whl>`__
Then, to install from the wheel file, run the following in a command-line shell (not the Python interpreter) from the folder where the wheel file is

::

   pip install HornMorpho-5.3.4-py3-none-any.whl

If this fails, it may mean that you don’t have
`wheel <https://pypi.org/project/wheel/>`__ installed, so try again
after installing wheel.
Then, to use the program, do the following in a Python shell

::

   import hm

The first time you use HornMorpho, you will need to download the data for the languages that you will be using. Each language’s data is stored in a compressed .tgz archive. To download a language’s archive, do this

::

   hm.download(language)

where language is 'a' for Amharic, 't' for Tigrinya, 'o'
for Oromo, or 'te' for Tigre. This will download the compressed file
from the HornMorpho GitHub repository and then uncompress it. If you try
to use any of the functions described below without first downloading
the data for the relevant language, you will be prompted to download the
data.
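If you need several languages, the download step can be wrapped in a small helper. The following is only a sketch: the names ``LANGUAGE_CODES``, ``language_code`` and ``download_languages`` are hypothetical and not part of HM. The only HM call it relies on is ``hm.download(code)``, which is passed in as an argument so that the helper can be tried without HM installed.

```python
# Language codes as listed in this section.
LANGUAGE_CODES = {
    "amharic": "a",
    "tigrinya": "t",
    "oromo": "o",
    "tigre": "te",
}


def language_code(name):
    """Return the HM code for a language name (case-insensitive)."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


def download_languages(names, download):
    """Call download(code) once per language, skipping repeats.

    `download` is expected to be hm.download; this helper itself is
    hypothetical and not part of HM.
    """
    codes = []
    for name in names:
        code = language_code(name)
        if code not in codes:
            download(code)
            codes.append(code)
    return codes
```

With HM installed, this would be called as ``download_languages(["Amharic", "Tigre"], hm.download)`` after ``import hm``.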
If you have problems with installation, contact gasser@iu.edu.
Quickstart
If you aren’t interested in learning more about what HM can do and just want to use it to analyze the words in a corpus of sentences, this section has the minimum that you’ll need to know.
To analyze the words in a corpus, use the function anal_corpus(),
passing the sentences as a list of strings, using the keyword data,
or as a path to a file containing the sentences, using the keyword
path.
::

   (1)
   c = hm.anal_corpus('a', data=["በሶ የበላው አበበ አይደለም ።", "ጫላ ጩቤዬን ጨብጧል ።"])

This returns an instance of the class Corpus, which has a
write() method that you can call to write the analyses to a file,
using the keyword path, or to standard output if you specify no
path. You can tell which word attributes you want to write with the
keyword attribs. Some possible attributes are part-of-speech
('pos'), morphological features ('um'), segmentation into
morphemes ('seg'), and lemma ('lemma').
::

   (2)
   c.write(attribs=['pos', 'um', 'lemma'])
   በሶ የበላው አበበ አይደለም ።
   በሶ    N    SG    በሶ
   የበላው    V    *RELC;3;DEF;MASC;PFV;SG    በላ
   አበበ    V    3;MASC;PFV;SG    አበበ
   አይደለም    COP    3;MASC;NEG;PRS;SG    ነው
   ።    PUNCT

   ጫላ ጩቤዬን ጨብጧል ።
   ጫላ    PROPN    SG    ጫላ
   ጩቤዬን    N    ACC;PSS1S;SG    ጩቤ
   ጨብጧል    V    3;MASC;PRF;SG    ጨበጠ
   ።    PUNCT

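If you write the analyses to a file and want to read them back, each word line from example (2) can be split into fields. The sketch below assumes that a word line consists of whitespace-separated columns in the order word, POS, UM features, lemma (that is, the order requested with attribs in example (2)), and that the last two columns may be missing, as they are for punctuation. The function ``parse_word_line`` is hypothetical and not part of HM.

```python
def parse_word_line(line):
    """Split one word line of the written output into named fields.

    Assumes whitespace-separated columns in the order
    word, POS, UM features, lemma; the last two may be absent.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least a word and a POS: {line!r}")
    word, pos = fields[0], fields[1]
    # UM features are joined with ';' in the output shown in example (2).
    features = fields[2].split(";") if len(fields) > 2 else []
    lemma = fields[3] if len(fields) > 3 else None
    return {"word": word, "pos": pos, "um": features, "lemma": lemma}


record = parse_word_line("ጩቤዬን N ACC;PSS1S;SG ጩቤ")
print(record["lemma"], record["um"])
```

Sentence lines, such as the first line of each group in example (2), are not word lines and would need to be skipped or handled separately.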
Overview of the program
HM is a rule-based morphological analyzer and generator, implemented in
the form of finite-state transducers weighted with feature
structures. For the theory behind the program, see `Gasser (2011) <https://www.researchgate.net/publication/228910448_HornMorpho_a_system_for_morphological_processing_of_Amharic_Oromo_and_Tigrinya>`__.
Most users of HM will be interested in morphological analysis. The program also works in the opposite direction, performing morphological generation, taking as input the root and grammatical features of a word and returning the word form. Documentation of the generation functions is forthcoming.
The simplest HM function, anal, takes a language code and a word and
returns an instance of the Word class. An HM Word is a list of Python
dicts, each representing a separate analysis of the input
word. [1]_ You can use the usual Python ways of accessing the elements
in a list or dict. For example, here is how you would analyze
the Amharic word የቤታችን. The first argument to anal specifies the
language; 'a' is Amharic, 't' Tigrinya, 'o' Oromo, 'te'
Tigre.
::

   (3)
   w = hm.anal('a', "የቤታችን")

The keys in the dict for an analysis of a word represent different
pieces of information that you may be interested in. For example, you
may want the lemma of the input word. This is the basic form of the
word. For nouns in all of the languages, this is the stem of the word
without any prefixes or suffixes. Here’s how you’d get the lemma for the
above analysis of the word የቤታችን. w[0] returns the first analysis
dict in the list of analyses, and w[0]['lemma'] returns the
value associated with the key 'lemma' in this dict. [2]_
::

   (4)
   >>> w[0]['lemma']
   'ቤት'

Other dict keys are described `below <#keywords>`__.
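Since a word may have more than one analysis, you may want to look at all of them, not only w[0]. The sketch below collects the distinct lemmas across a list of analysis dicts. It uses a hand-written stand-in list in place of a real Word, so it shows only the list-of-dicts access pattern described above; the function ``lemmas`` is hypothetical and not part of HM.

```python
def lemmas(analyses):
    """Return the distinct lemmas in a list of analysis dicts, in order."""
    seen = []
    for analysis in analyses:
        lemma = analysis.get("lemma")
        if lemma is not None and lemma not in seen:
            seen.append(lemma)
    return seen


# Stand-in for the result of hm.anal('a', "የቤታችን");
# real analyses contain more keys than 'lemma'.
w = [{"lemma": "ቤት"}]
print(lemmas(w))
```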
You will probably not want to use HM to analyze individual words, as in
the above example. There are also functions for analyzing sentences and
corpora of sentences, `anal_sentence() <#anal_sentence>`__ and
`anal_corpus() <#anal_corpus>`__, described below. These functions call
anal() on the words in the sentences.
Morphological segmentation
Morphemes
^^^^^^^^^
A morphologically complex word consists of multiple morphemes, that is, more than one meaningful unit. One morpheme, the stem, is the part that conveys the basic meaning (the lexical meaning) of the word. The other morphemes, those that appear before the stem (as prefixes), after the stem (as suffixes) or within the stem (as infixes), modify the lexical meaning in various ways. For example, the Amharic word ለቤቶቻችን ‘for our houses’ consists of the stem ቤት and three additional morphemes, the prefix ለ- and the suffixes -ኦች and -ኣችን. [3]_
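The hyphen notation used above (a trailing hyphen for a prefix, a leading hyphen for a suffix, no hyphen for a stem) can be made explicit in a few lines. This is only an illustration of the notation in this document, not of HM's own representation of morphemes; ``classify_morpheme`` is a hypothetical helper. A form written with hyphens on both sides, such as -ፈልግ-, is treated here as a stem, which is an assumption based on how that form is cited in the Introduction.

```python
def classify_morpheme(form):
    """Label a morpheme by the hyphen notation used in this document."""
    core = form.strip("-")
    if form.endswith("-") and not form.startswith("-"):
        return ("prefix", core)
    if form.startswith("-") and not form.endswith("-"):
        return ("suffix", core)
    # No hyphens, or hyphens on both sides: treated as a stem (assumption).
    return ("stem", core)


# The morphemes of ለቤቶቻችን 'for our houses' as cited above.
for morpheme in ["ለ-", "ቤት", "-ኦች", "-ኣችን"]:
    print(classify_morpheme(morpheme))
```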
Segmentation
^^^^^^^^^^^^
A morphological segmentation of a word consists of a representation of the sequence of morphemes that make up the word. Morphological segmentation may be useful in NLP applications that make use of subword units, for example, language models. In these cases it provides
