SkillAgentSearch skills...

Buckwalter

A small python script that transliterates Arabic text using the Buckwalter Transliteration Scheme. It allows for multiple decisions to be made around whether or not to include all types of diacritics and characters or ignore them. Useful for NLP experiments where you may want to normalize text.

Install / Use

/learn @KentonMurray/Buckwalter
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Buckwalter

A small python script that transliterates Arabic text using the Buckwalter Transliteration Scheme. It allows for multiple decisions to be made around whether or not to include all types of diacritics and characters or ignore them. Useful for NLP experiments where you may want to normalize text.

usage: buckwalter.py [-h] -corpus CORPUS [-hamza [HAMZA]] [-madda [MADDA]] [-t [T]] [-harakat [HARAKAT]] [-tatweel [TATWEEL]] [-toUTF [TOUTF]]

Converts characters in Arabic free text to integers

optional arguments: -h, --help show this help message and exit -corpus CORPUS Path to the Arabic Corpus -hamza [HAMZA] Include Hamzas as a letter -madda [MADDA] Include Alefs with Madda on top as a separate letter (otherwise just Alef) -t [T] Include Tar Marbuta as a letter -harakat [HARAKAT] Include diacritics as separate letters (otherwise stripped) -tatweel [TATWEEL] Include tatweel as an underscore -toUTF [TOUTF] Take ASCII to Abjad

Note that converting to ASCII to UTF-8 will break if your system settings don't support printing UTF-8 to terminal

Related Skills

View on GitHub
GitHub Stars26
CategoryDevelopment
Updated1y ago
Forks10

Languages

Python

Security Score

60/100

Audited on Dec 16, 2024

No findings