pyigt: Handling interlinear glossed text with Python
This library provides easy access to Interlinear Glossed Text (IGT) according to the Leipzig Glossing Rules, stored as CLDF examples.
Installation
Installing pyigt via pip
pip install pyigt
will install the Python package along with a command line interface igt.
Note: The methods Corpus.get_wordlist and Corpus.get_profile, which extract a wordlist and an orthography profile
from a corpus, require the lingpy package. To make sure it is installed, install pyigt with the lingpy extra:
pip install pyigt[lingpy]
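Before calling the lingpy-dependent methods, it can help to check whether the optional dependency is importable. The following is a minimal sketch using only the standard library; the helper name has_module is hypothetical and not part of pyigt.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no such top-level module is installed.
    return importlib.util.find_spec(name) is not None


# Hypothetical usage: decide up front whether wordlist extraction is available.
if has_module("lingpy"):
    print("lingpy found: Corpus.get_wordlist and Corpus.get_profile should be usable")
else:
    print("lingpy missing: install with pip install pyigt[lingpy]")
```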
CLI
$ igt -h
usage: igt [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    ls                  List IGTs in a CLDF dataset
    stats               Describe the IGTs in a CLDF dataset
The igt ls command lets you inspect IGTs from the command line, formatted using the
four standard lines described in the Leipzig Glossing Rules, with analyzed text and
glosses aligned, e.g.
$ igt ls tests/fixtures/examples.csv
Example 1:
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
...
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV
IGT corpus at tests/fixtures/examples.csv
igt ls can be chained with other command-line tools, such as commands from the
csvkit package, for filtering:
$ csvgrep -c Primary_Text -m"ȵi" tests/fixtures/examples.csv | csvgrep -c Gloss -m"ADV" | igt ls -
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV
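The same substring filter can be sketched in plain Python with the standard csv module. This is an illustration, not part of pyigt: the in-memory table below is a stand-in that assumes an ID column and space-separated glosses, and a real CLDF example table has more columns and may store glosses differently.

```python
import csv
import io


def filter_examples(rows, text_contains, gloss_contains):
    """Keep rows whose Primary_Text and Gloss contain the given substrings."""
    return [
        row for row in rows
        if text_contains in row["Primary_Text"] and gloss_contains in row["Gloss"]
    ]


# Stand-in data (assumed layout), reusing two examples shown above.
SAMPLE = """ID,Primary_Text,Gloss
1,"zəple: ȵike: peji qeʴlotʂuʁɑ,",earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
5,"zuɑməɸu oʐgutɑ ipiχuɑȵi,",cypress-tree one-CL-LOC DIR-hide-because-ADV
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = filter_examples(rows, "ȵi", "ADV")
print([row["ID"] for row in matches])  # ['5']
```

Both conditions must hold, mirroring the two chained csvgrep calls.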
Python API
The Python API is documented in detail at readthedocs. Below is a quick overview.
You can read all IGT examples provided with a CLDF dataset:
>>> from pyigt import Corpus
>>> corpus = Corpus.from_path('tests/fixtures/cldf-metadata.json')
>>> len(corpus)
5
>>> for igt in corpus:
...     print(igt)
...     break
...
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
or instantiate individual IGT examples, e.g. to check for validity:
>>> from pyigt import IGT
>>> ex = IGT(phrase="palasi=lu", gloss="priest-and")
>>> ex.check(strict=True, verbose=True)
palasi=lu
priest-and
...
ValueError: Rule 2 violated: Number of morphemes does not match number of morpheme glosses!
or to expand known gloss abbreviations:
>>> ex = IGT(phrase="Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.",
... gloss="now they-OBL-GEN farm forever behind stay-FUT-NEG",
... translation="Now their farm will not stay behind forever.")
>>> ex.pprint()
Gila aburun ferma hamišaluǧ güǧüna amuq’dač.
Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.
now they-OBL-GEN farm forever behind stay-FUT-NEG
‘Now their farm will not stay behind forever.’
OBL = oblique
GEN = genitive
FUT = future
NEG = negation, negative
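To illustrate the idea behind the abbreviation listing, here is a minimal sketch that looks up gloss tokens in a small table. It is not pyigt's implementation: the function name used_abbreviations and the table are hypothetical, and the table holds only the four abbreviations shown above.

```python
import re

# Excerpt of gloss abbreviations, limited to those in the output above.
KNOWN_ABBREVIATIONS = {
    "OBL": "oblique",
    "GEN": "genitive",
    "FUT": "future",
    "NEG": "negation, negative",
}


def used_abbreviations(gloss, known=KNOWN_ABBREVIATIONS):
    """Return (abbreviation, meaning) pairs in order of first appearance."""
    found = []
    # Split on whitespace and on common gloss separators (- = . :).
    for token in re.split(r"[\s\-=.:]+", gloss):
        if token in known and token not in found:
            found.append(token)
    return [(abbr, known[abbr]) for abbr in found]


gloss = "now they-OBL-GEN farm forever behind stay-FUT-NEG"
for abbr, meaning in used_abbreviations(gloss):
    print(f"{abbr} = {meaning}")
```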
And you can go deeper, parsing morphemes and glosses according to the LGR (see module pyigt.lgrmorphemes):
>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC")
>>> igt.conformance
<LGRConformance.MORPHEME_ALIGNED: 2>
>>> igt[1, 1].gloss
<Morpheme "INDEF:CL">
>>> igt[1, 1].gloss.elements
[<GlossElement "INDEF">, <GlossElementAfterColon "CL">]
>>> igt[1, 1].morpheme
<Morpheme "ke:">
>>> print(igt[1, 1].morpheme)
ke:
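The (word, morpheme) indexing used above can be mimicked with plain string splitting. This is a simplified sketch under the assumption that words are separated by whitespace and morphemes by hyphens; the helper morpheme_at is hypothetical, and pyigt.lgrmorphemes handles more of the LGR than this.

```python
def morpheme_at(line, word_index, morpheme_index):
    """Return one hyphen-separated morpheme from one whitespace-separated word."""
    word = line.split()[word_index]
    return word.split("-")[morpheme_index]


phrase = "zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,"
gloss = "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC"

morpheme = morpheme_at(phrase, 1, 1)
morpheme_gloss = morpheme_at(gloss, 1, 1)
# A colon separates gloss elements within one morpheme gloss.
gloss_elements = morpheme_gloss.split(":")

print(morpheme)        # ke:
print(morpheme_gloss)  # INDEF:CL
print(gloss_elements)  # ['INDEF', 'CL']
```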
See also
- interlineaR - an R package with similar functionality, but with support for more input formats.
