Wiktra

Wiktra: a Python tool built on Wiktionary's transliteration modules, covering 514 languages in 102 different scripts (orthographies)

Wiktra: High-Quality Transliteration Powered by Wiktionary

Wiktra is a versatile Unicode transliteration tool that brings the linguistic precision of Wiktionary's community-curated transliteration modules to your command line and Python projects. It allows you to convert text from one writing system (script) to another with a high degree of accuracy.

What is Wiktra?

At its core, Wiktra transliterates text. This means it converts characters or words from one script (e.g., Cyrillic, Arabic, Devanagari) into another (e.g., Latin script). Unlike simple character-by-character replacement, Wiktra utilizes sophisticated rule-based transliteration modules written in Lua, developed and maintained by linguists and contributors on Wiktionary. These modules understand the nuances of how languages are written, leading to more accurate and contextually appropriate results.
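
To make the difference between character-by-character replacement and rule-based transliteration concrete, here is a minimal sketch. It is a toy illustration, not the logic of any Wiktionary module: the mapping table and the single context rule for Russian "е" (written "je" at the start of a word, after a vowel, or after ъ/ь) are simplified assumptions chosen for the example.

```python
# Toy illustration only: NOT the Wiktionary module's logic.
# A tiny mapping that covers just the letters used in the examples below.
NAIVE = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e",
    "т": "t", "л": "l", "ь": "\u02b9",  # U+02B9 modifier letter prime for the soft sign
}
VOWELS = set("аеёиоуыэюя")


def naive_translit(text):
    """Replace each character independently, ignoring its context."""
    return "".join(NAIVE.get(ch, ch) for ch in text.lower())


def contextual_translit(text):
    """Same mapping, plus one context rule for the letter 'е'."""
    out = []
    prev = ""  # previous source character ("" at the start of the text)
    for ch in text.lower():
        starts_word = prev == "" or not prev.isalpha()
        if ch == "е" and (starts_word or prev in VOWELS or prev in ("ъ", "ь")):
            out.append("je")
        else:
            out.append(NAIVE.get(ch, ch))
        prev = ch
    return "".join(out)


print(naive_translit("ель"))          # context-free result
print(contextual_translit("ель"))     # word-initial 'е' becomes 'je'
print(contextual_translit("привет"))  # 'е' after a consonant stays 'e'
```

Real modules encode many such rules per language, which is why Wiktra delegates to them instead of relying on a flat lookup table.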

Wiktra provides:

  • A command-line interface (CLI) tool: wiktrapy
  • A Python 3 module: wiktra

Wiktra 1.0 was originally developed by Khuyagbaatar Batsuren. Wiktra 2 is a substantial rewrite by Adam Twardoch.

Who is Wiktra For?

Wiktra is designed for a diverse range of users:

  • Linguists and Language Researchers: Access accurate transliterations for a vast number of languages and scripts, aiding in comparative studies, data processing, and phonetic analysis.
  • Software Developers: Integrate transliteration capabilities into applications dealing with multilingual text, internationalization (i18n), or natural language processing (NLP).
  • Archivists and Librarians: Standardize text from various scripts for cataloging and digital preservation.
  • Students and Language Learners: Understand pronunciation and script conversions for different languages.
  • Anyone working with multilingual data: Convert text to a common script for easier processing, analysis, or display.

Why is Wiktra Useful?

  • High-Quality Transliterations: Leverages the extensive, collaboratively maintained Lua-based transliteration modules from Wiktionary, ensuring a high standard of accuracy.
  • Broad Language and Script Support: Wiktra 2 supports 514 languages in 102 scripts through the new API (the figures given in the original README), covering a significant portion of Wiktionary's transliteration capabilities. Run wiktrapy --stats for the current list.
  • Flexibility: Usable as both a standalone CLI tool for quick conversions and as a Python library for integration into larger projects.
  • Offline Capability: Once Wiktionary modules are packaged or updated locally, transliteration can be performed offline.
  • Open Source: Licensed under GPLv2, allowing for community contributions and transparency.

Installation

Wiktra requires Python 3.9+ and Lua. LuaJIT is recommended for better performance with the lupa bridge.

General Installation (using pip):

The primary way to install Wiktra is via pip:

python3 -m pip install wiktra

This will attempt to install Wiktra and its Python dependencies, including lupa, which bridges Python and Lua. The lupa installation might require Lua development headers to be present on your system.

macOS:

For macOS, a convenience script install-mac.sh (available in the source repository) can help install prerequisites like Lua via Homebrew:

  1. Download or clone the Wiktra repository.
  2. Navigate to the repository's root directory in your terminal.
  3. Run the script:
    ./install-mac.sh
    
  4. Then, install Wiktra using pip (if the script doesn't do it already, or to ensure you have the latest version from PyPI):
    # If installing from a local clone after running the script:
    python3 -m pip install --upgrade .
    # Or to get the latest from PyPI:
    python3 -m pip install --upgrade wiktra
    

Linux (Debian/Ubuntu Example):

You'll need to install Python 3, pip, and Lua development files.

sudo apt update
sudo apt install python3 python3-pip liblua5.1-0-dev luajit
# For lupa, LuaJIT is often preferred over standard Lua; its development headers are in libluajit-5.1-dev.
# Depending on your distribution and lupa version, you might need a different Lua version, e.g. liblua5.3-dev.
python3 -m pip install wiktra

Windows:

Installation on Windows can be more complex due to lupa compilation.

  1. Install Python 3.9+ (e.g., from python.org). Make sure to add Python to your PATH.
  2. Installing lupa typically requires a C compiler (like Microsoft C++ Build Tools, often installed with Visual Studio) and Lua (e.g., by compiling Lua from source, or using a package manager like Scoop or Chocolatey to install Lua/LuaJIT).
  3. Installation is much simpler when pre-compiled lupa wheels for your Python version and architecture are available on PyPI. If they are not, you must set up the build environment manually.
  4. Once lupa can be installed (i.e., its prerequisites are met), Wiktra can be installed via pip:
    pip install wiktra
    

Note: The original README reported that, at one point, version 2 did not work well on Ubuntu and Windows 10. Cross-platform compatibility is a goal, but installing lupa correctly is usually the main hurdle. Refer to the lupa documentation for specific guidance on its installation.

Troubleshooting Installation:

  • LuaError: module 'wikt.mw' not found or similar Lua errors: This typically means the Lua runtime cannot find the Wiktionary modules. Wiktra attempts to set the LUA_PATH environment variable correctly during runtime. If issues persist, it might indicate a problem with how lupa is locating Lua files or an incomplete installation.
  • lupa installation issues: These are common. Ensure you have a C compiler and the correct Lua (or LuaJIT) development libraries (headers) installed. Consult lupa's documentation and open issues for platform-specific advice. Using virtual environments (e.g., venv) is highly recommended.
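
When troubleshooting, it can help to confirm which prerequisites Python can actually see before digging into Lua errors. The following is a minimal diagnostic sketch, not part of Wiktra: the function name check_prerequisites and the keys of the report are hypothetical. It relies only on the standard library plus lupa's LuaRuntime (if lupa is installed), and it reports problems instead of raising.

```python
import importlib.util
import sys


def check_prerequisites():
    """Return a small report on the Python/lupa/wiktra setup (hypothetical helper)."""
    report = {
        # Wiktra requires Python 3.9 or newer.
        "python_ok": sys.version_info >= (3, 9),
        # find_spec returns None when a top-level package cannot be imported.
        "lupa_installed": importlib.util.find_spec("lupa") is not None,
        "wiktra_installed": importlib.util.find_spec("wiktra") is not None,
        "lua_version": None,
    }
    if report["lupa_installed"]:
        try:
            import lupa

            # _VERSION is a standard Lua global, e.g. "Lua 5.1" under LuaJIT.
            report["lua_version"] = str(lupa.LuaRuntime().eval("_VERSION"))
        except Exception as exc:  # a broken lupa build can fail in many ways
            report["lua_version"] = f"unavailable ({exc})"
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If lupa_installed is False, fix the lupa installation first. If lupa works but wiktra_installed is False, the pip install of Wiktra itself did not complete in the active environment.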

Basic Usage

Wiktra offers two main ways to perform transliterations:

1. Command-Line Interface (wiktrapy)

The wiktrapy tool is perfect for quick transliterations or use in shell scripts.

Basic syntax:

wiktrapy [options] -t "Your text here"
# or pipe text into it
echo "Your text here" | wiktrapy [options]
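
The same two invocation styles can be driven from a Python script through subprocess. This is a sketch under assumptions: build_wiktrapy_cmd and run_wiktrapy are hypothetical helper names, only the -t flag and stdin piping shown above are used, and the helper returns None when wiktrapy is not on the PATH.

```python
import shutil
import subprocess


def build_wiktrapy_cmd(text=None, options=()):
    """Build the argument list; omit -t when the text will be piped via stdin."""
    cmd = ["wiktrapy", *options]
    if text is not None:
        cmd += ["-t", text]
    return cmd


def run_wiktrapy(text, options=(), use_stdin=False):
    """Run wiktrapy and return its output, or None if the tool is not installed."""
    if shutil.which("wiktrapy") is None:
        return None
    cmd = build_wiktrapy_cmd(None if use_stdin else text, options)
    result = subprocess.run(
        cmd,
        input=text if use_stdin else None,  # mirrors: echo "text" | wiktrapy
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


print(build_wiktrapy_cmd("Your text here"))
```

Passing the arguments as a list avoids shell quoting problems with non-Latin text. For many transliterations in one process, the Python module avoids the cost of starting a new process per call.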

Examples:

  • Automatic language/script detection (transliterates to Latin by default):

    wiktrapy -t "Привет"
    # Expected Output: Privet
    
    echo "नमस्ते" | wiktrapy
    # Expected Output: namaste
    
  • Specifying input language and script (for explicit transliteration):

    wiktrapy -t "Привет" -l ru -s Cyrl
    # Expected Output: Privet
    

    Here, -l ru specifies Russian and -s Cyrl specifies Cyrillic script.

  • Specifying output script:

    # This example assumes a module exists for English (Latn) to Cyrillic (Cyrl)
    # wiktrapy -t "Hello" -l en -s Latn -o Cyrl
    

    The default output script is Latn (Latin).

  • Listing supported scripts and orthographies:

    wiktrapy --stats
    
  • Getting help for all options:

    wiktrapy -h
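
The first example above relies on automatic script detection. How Wiktra performs that detection is not described here, so the following is only a rough sketch of one possible approach: it guesses an ISO 15924 script code from the first word of each character's Unicode name. The function guess_script and the small prefix table are hypothetical and cover just a handful of scripts.

```python
import unicodedata
from collections import Counter

# Hypothetical, deliberately incomplete table:
# first word of a Unicode character name -> ISO 15924 script code.
NAME_PREFIX_TO_SCRIPT = {
    "CYRILLIC": "Cyrl",
    "DEVANAGARI": "Deva",
    "ARABIC": "Arab",
    "GREEK": "Grek",
    "LATIN": "Latn",
}


def guess_script(text):
    """Return the most frequent known script code in text, or None."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        prefix = name.split(" ", 1)[0]
        code = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if code is not None:
            counts[code] += 1
    return counts.most_common(1)[0][0] if counts else None


for sample in ("Привет", "नमस्ते", "Hello", "123"):
    print(sample, "->", guess_script(sample))
```

A guess like this could be passed to wiktrapy with -s when automatic detection picks the wrong script for mixed-script input.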
    

2. Python Module (wiktra)

For more programmatic control, use the wiktra Python module. The recommended way is to use the Transliterator class.

Example (New API - Recommended):

from wiktra.Wiktra import Transliterator

# Create a Transliterator instance
# This is best done once if you're doing multiple transliterations
tr = Transliterator()

# Transliterate text with automatic language/script detection
# (will try to guess input script and use 'und' - undefined language for that script)
text_cyrillic = "Привет мир"
latin_text = tr.tr(text_cyrillic)
print(f"'{text_cyrillic}' -> '{latin_text}'")
# Expected Output: 'Привет мир' -> 'Privet mir'

text_devanagari = "नमस्ते दुनिया"
latin_text_dev = tr.tr(text_devanagari)
print(f"'{text_devanagari}' -> '{latin_text_dev}'")
# Expected Output: 'नमस्ते दुनिया' -> 'namaste duniyaa'

# Explicitly specify language, input script, and output script
text_russian = "Русский текст"
# lang='ru' (Russian), sc='Cyrl' (Cyrillic), to_sc='Latn' (Latin)
transliterated_explicit = tr.tr(text_russian, lang='ru', sc='Cyrl', to_sc='Latn', explicit=True)
print(f"'{text_russian}' (explicit) -> '{transliterated_explicit}'")
# Expected Output: 'Русский текст' (explicit) -> 'Russkij tekst'

# Using the class instance is more efficient for multiple transliterations
# as the Lua runtime and modules are initialized only once.

Notes on the explicit parameter:

  • If explicit=True, you must provide lang (input language code, e.g., ISO 639) and sc (input script code, e.g., ISO 15924). to_sc (output script code) defaults to Latn if not specified.
  • If explicit=False (the default), Wiktra attempts to guess the input script if sc is not provided. It then typically assumes an "undefined" (und) language for that script, unless lang is also provided.
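
The rule for explicit=True can be enforced before any text reaches the Lua runtime. Below is a minimal sketch of such a guard. The wrapper safe_tr is hypothetical and not part of Wiktra; it accepts any object with a tr method that takes the keyword arguments shown in the example above, so it can be exercised with a stand-in object when Wiktra is not installed.

```python
def safe_tr(transliterator, text, lang=None, sc=None, to_sc="Latn", explicit=False):
    """Validate arguments, then delegate to transliterator.tr (hypothetical wrapper)."""
    if explicit and (lang is None or sc is None):
        raise ValueError("explicit=True requires both lang and sc")
    # Forward only the arguments the caller actually supplied, so the
    # transliterator's own defaults and script guessing still apply.
    kwargs = {"to_sc": to_sc, "explicit": explicit}
    if lang is not None:
        kwargs["lang"] = lang
    if sc is not None:
        kwargs["sc"] = sc
    return transliterator.tr(text, **kwargs)


class EchoTransliterator:
    """Stand-in for demonstration: returns the arguments it received."""

    def tr(self, text, **kwargs):
        return text, kwargs


print(safe_tr(EchoTransliterator(), "Русский текст", lang="ru", sc="Cyrl", explicit=True))
```

With Wiktra installed, passing a Transliterator() instance in place of EchoTransliterator should forward the call to tr.tr with the same keyword arguments as in the example above.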

Legacy Function (translite):

A legacy translite function is also available, primarily for compatibility with older versions of Wiktra or specific use cases that relied on its distinct language code mapping.

from wiktra.Wiktra import translite as tr_legacy

# Example for Mongolian (Cyrillic) using its legacy code 'mon'
mongolian_text = "монгол бичлэг"
transliterated_mongolian = tr_legacy(mongolian_text, 'mon')
print(f"'{mongolian_text}' (legacy) -> '{transliterated_mongolian}'")
# Expected Output: 'монгол бичлэг' (legacy) -> 'mongol bichleg'

It is generally recommended to use the new `Transliter
