# Omorfi

Open morphology for Finnish
This is a free/libre open-source morphology of Finnish: a database, tools, and APIs. It is everything you need to build NLP applications that process Finnish words and texts.
- 🇫🇮 high-quality Finnish text analysis and generation
- 🩸 bleeding edge
- ⚡ blazing fast
## Documentation

This README is kept condensed for GitHub; for more detailed information, see the GitHub pages for omorfi.
## Licence

Omorfi is licenced under the GNU GPLv3 (version 3 only, not "or any later version"). The HFST bundled with the Java API demo is under the Apache licence.
## Citing and academic works

Citation information is available through GitHub's "Cite this repository" function, backed by the CITATION.cff file. For further details, see the omorfi articles.
## Downloading and further information

Omorfi source packages can be downloaded from GitHub releases, or the most current version can be fetched with git. For more information, see the release policy.
## Dependencies

- hfst-3.15 or greater
- python-3.5 or greater
- pyhfst
- C++ compiler and libtool
- GNU autoconf-2.64 and automake-1.12; a compatible pkg-config implementation
Optionally:

- VISL CG 3
- hfst-ospell-0.2.0 or greater, needed for spell-checking
- Java 7 or greater, for the Java bindings
## Installing dependencies

HFST can be installed by following the instructions from GiellaLT (only step 1 is needed) or the instructions from Apertium.

Pyhfst can be installed from pip:

```shell
$ pip install pyhfst
```

If you are stuck on a platform that does not let you install from pip directly, you may need to use a venv, as instructed by pip:

```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install pyhfst
```

Other tools, such as compilers, autotools, and pkg-config, should be installed from the operating system's package manager (apt, brew, ...).
## Installation

For detailed instructions and explanations of the different options, see the installation instructions on the GitHub pages site. This README is a quick reference.

### Full installation

Requires all dependencies to be installed:

```shell
autoreconf -i
./configure
make
make install
```

This will install the binaries and scripts for all users in typical environments.
### Minimal "installation"

To skip building the language models and use some of the scripts locally:

```shell
autoreconf -i
./configure
src/bash/omorfi-download.bash
```

This will download some of the pre-compiled dictionaries into your current working directory.
### Python installation

Omorfi can also be installed within Python via pip or conda:

```shell
pip install omorfi
```

```shell
conda install -c flammie omorfi
```

Dependencies that are not available from pip or conda will not be usable, e.g. syntactic analysis and disambiguation using VISL CG 3. NB: since conda does not provide an HFST build compatible with recent Python versions, only older omorfi versions are available on conda.
### Docker

Omorfi can be used from a ready-made Docker container; there is a Dockerfile in src/docker/Dockerfile for that:

```shell
docker build -t "omorfi:Dockerfile" .
docker run -it "omorfi:Dockerfile" bash
```
## Usage

Omorfi can be used from the command line using the following commands:

- `omorfi-disambiguate-text.sh`: analyse and disambiguate
- `omorfi-analyse-text.sh`: analyse
- `omorfi-spell.sh`: spell-check and correct
- `omorfi-segment.sh`: morphologically segment
- `omorfi-conllu.bash`: analyse in CONLL-U format
- `omorfi-freq-evals.bash`: analyse coverage and statistics
- `omorfi-ftb3.bash`: analyse in FTB-3 format (CONLL-X)
- `omorfi-factorise.bash`: analyse in Moses-SMT factorised format
- `omorfi-vislcg.bash`: analyse in VISL CG 3 format
- `omorfi-analyse-tokenised.sh`: analyse word per line (faster)
- `omorfi-generate.sh`: generate word-forms from omor descriptions
- `omorfi-download.bash`: download language models from the latest release
For further details, please refer to the documentation on the GitHub pages site.
## Programming APIs

Omorfi can be used via very simple programming APIs; the design is detailed in the omorfi API design document.
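As a hedged sketch of the Python API (the `Omorfi` class and the `load_from_dir`/`analyse` method names are assumptions based on the omorfi Python package; consult the API design document for the authoritative interface), analysing a word might look like:

```python
# Hypothetical sketch of the omorfi Python API; class and method names
# are assumptions and may differ between omorfi versions.
try:
    from omorfi.omorfi import Omorfi  # assumed module layout

    omorfi = Omorfi()
    omorfi.load_from_dir()  # load installed language models
    analyses = list(omorfi.analyse("kissoja"))
except Exception:  # broad catch: the package or models may be missing
    analyses = []

for analysis in analyses:
    print(analysis)
```

The broad `except` is only there so the sketch degrades gracefully when omorfi or its models are not installed.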
## Using binary models

There are various language-model binaries that can be used directly with specialised tools such as HFST. For further details, see our usage examples.
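For instance, a compiled automaton can be queried from Python with pyhfst. This is a sketch only; the model file name `omorfi.describe.hfst` is an assumption, so use whichever model you downloaded or built:

```python
# Sketch: look up a surface form in a compiled omorfi automaton using
# pyhfst; the model file name below is an assumption.
try:
    import pyhfst

    stream = pyhfst.HfstInputStream("omorfi.describe.hfst")
    transducer = stream.read()
    results = transducer.lookup("kissa")  # analyses with weights
except Exception:  # pyhfst or the model file may be unavailable
    results = []

for result in results:
    print(result)
```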
## Troubleshooting

For full descriptions and archived problems, see Troubleshooting on the GitHub pages.
### hfst-lexc: Unknown option

Update HFST.
### ImportError (or other Python problems)

For the Python scripts to work, you need to install them to the same prefix as Python, or define PYTHONPATH, e.g.:

```shell
export PYTHONPATH=/usr/local/lib/python3.11/site-packages/
```
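As a rough illustration (the directory below is just the example path from above, not your actual prefix), the search path can also be extended at runtime before importing omorfi:

```python
# Append the install location to Python's module search path at
# runtime; the directory is an example, adjust it to your prefix.
import sys

site_dir = "/usr/local/lib/python3.11/site-packages/"
if site_dir not in sys.path:
    sys.path.append(site_dir)
# imports of omorfi modules after this point also search site_dir
```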
### Processing text gets stuck / takes long

This can easily happen for legitimate reasons: some inputs are simply expensive to analyse. It can be mitigated by filtering out overlong tokens, or by processing texts in smaller pieces.
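Both mitigations can be sketched in plain Python; the 64-character limit and the whitespace tokenisation below are illustrative choices, not anything omorfi itself mandates:

```python
# Drop overlong tokens and split the rest into small batches before
# analysis; the limit and batch size here are illustrative only.
def prefilter_tokens(text, max_len=64):
    """Discard tokens longer than max_len, which can stall analysis."""
    return [tok for tok in text.split() if len(tok) <= max_len]

def batches(tokens, size=1000):
    """Yield fixed-size slices so each piece stays small."""
    for i in range(0, len(tokens), size):
        yield tokens[i:i + size]

tokens = prefilter_tokens("kissa " + "x" * 100 + " koira")
print(tokens)  # the 100-character token is dropped
```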
### Make gets killed

Building the language models can run out of memory; get more RAM or add swap space.
## Contributing

Omorfi code and data are free/libre open source and community-driven. To participate, see the further information in CONTRIBUTING.
## Contact

- Issues and problems, including support questions, may be filed in our GitHub issue tracker
- The omorfi Matrix channel is particularly good for live chat: support questions, suggestions, and discussions
- The omorfi-devel mailing list is good for longer, more involved discussions

You can always discuss in English or Finnish on any of the channels.
## Code of conduct

See our code of conduct.
## Donations

A lot of omorfi development has been done in spare time and by volunteers. If you want to support Flammie, you can use GitHub's ❤️ Sponsor button, or any of the services below:
<a href="https://liberapay.com/Flammie/donate"><img alt="Donate using Liberapay" src="https://liberapay.com/assets/widgets/donate.svg"></a>
<a href="https://www.patreon.com/bePatron?u=9479606" data-patreon-widget-type="become-patron-button">Become a Patron!</a>
