Unidecode

Transliteration from Unicode to US-ASCII and ISO 8859-2.

Unidecode is a Java port of the Perl library Text::Unidecode, which transliterates Unicode text to US-ASCII. This implementation is not limited to ASCII: it currently also supports ISO-8859-2 (aka Latin 2) and can easily be extended to more charsets (contributions are welcome).

Please note that this is just a quick and dirty method of transliteration; it’s not a silver bullet! For a detailed description of its limitations, see the documentation of the original Text::Unidecode by Sean M. Burke.
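To make the "quick and dirty" caveat concrete, here is a minimal sketch of table-driven transliteration in general. It is not this library's code: the class name, the six sample mappings, and the "?" fallback are all assumptions made for illustration. The point it shows is that each character is replaced on its own, without looking at its neighbours or at the language of the text.

```java
import java.util.Map;

// Illustrative sketch only: a tiny hypothetical table, not the library's real tables.
final class TableTransliterationSketch {

    // Hypothetical sample mappings from a code point to its ASCII replacement.
    private static final Map<Integer, String> TABLE = Map.of(
            0x010C, "C",    // LATIN CAPITAL LETTER C WITH CARON
            0x00E9, "e",    // LATIN SMALL LETTER E WITH ACUTE
            0x201E, "\"",   // DOUBLE LOW-9 QUOTATION MARK
            0x201C, "\"",   // LEFT DOUBLE QUOTATION MARK
            0x2265, ">=",   // GREATER-THAN OR EQUAL TO
            0x2014, "--");  // EM DASH

    // Replaces every non-ASCII code point independently of its context.
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                  // already ASCII, keep as-is
            } else {
                out.append(TABLE.getOrDefault(cp, "?"));  // assumed fallback for unmapped characters
            }
        });
        return out.toString();
    }
}
```

Because the lookup is context-free, a character shared by several languages always gets the same replacement, whichever language the surrounding text is in.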

How to Use

Transliterate to ASCII

Unidecode unidecode = Unidecode.toAscii();

unidecode.decode("České „uvozovky“");
>>> Ceske "uvozovky"

unidecode.decode("42 ≥ 24");
>>> 42 >= 24

unidecode.decode("em-dash — is not in ASCII");
>>> em-dash -- is not in ASCII

unidecode.decode("南无阿弥陀佛");
>>> Nan Wu A Mi Tuo Fo

unidecode.decode("あみだにょらい");
>>> amidaniyorai
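
One common use for ASCII output like the above is building URL slugs. The following sketch is an assumption-laden illustration, not part of the library: `SlugSketch` and `slugify` are hypothetical names, and the transliterator is passed in as a plain function so the snippet compiles without any dependency.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Unidecode.
final class SlugSketch {

    // 'transliterate' is any String -> String function expected to return ASCII text.
    static String slugify(String text, UnaryOperator<String> transliterate) {
        String ascii = transliterate.apply(text);
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")  // collapse each run of other characters into one hyphen
                .replaceAll("^-|-$", "");       // drop a leading or trailing hyphen
    }
}
```

Assuming `decode` takes and returns a `String`, as the examples above suggest, it could be plugged in as `SlugSketch.slugify(title, Unidecode.toAscii()::decode)`.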

Transliterate to ISO-8859-2

Unidecode unidecode = Unidecode.toLatin2();

unidecode.decode("České „uvozovky“");
>>> České "uvozovky"
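
The output above keeps the accented letters but still replaces the typographic quotes, because ISO-8859-2 contains the former and not the latter. A small sketch using only the JDK can confirm which strings fit a given charset; `EncodableCheck` and `fits` are hypothetical names, and the snippet assumes the JDK in use ships the ISO-8859-2 charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper using only JDK classes; not part of Unidecode.
final class EncodableCheck {

    // Returns true if every character of 'text' can be encoded in 'charset'.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }
}
```

For example, `EncodableCheck.fits(text, Charset.forName("ISO-8859-2"))` can be used to verify that transliterated text is safe to write in Latin 2.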

Initials

Unidecode unidecode = Unidecode.toAscii();

unidecode.initials("南无阿弥陀佛");
>>> NWAMTF

unidecode.initials("Κνωσός");
>>> K
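
The results above are consistent with taking the first letter of each word of the transliterated text. The sketch below illustrates that idea only; it is an assumption about the behaviour, not the library's actual implementation, and `InitialsSketch` is a hypothetical name.

```java
// Hypothetical illustration; the library's own algorithm may differ.
final class InitialsSketch {

    // Takes the first character of each whitespace-separated word and upper-cases it.
    static String initials(String asciiText) {
        StringBuilder out = new StringBuilder();
        for (String word : asciiText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.append(Character.toUpperCase(word.charAt(0)));
            }
        }
        return out.toString();
    }
}
```

Applied to the ASCII form "Nan Wu A Mi Tuo Fo", this yields "NWAMTF", matching the example output.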

Maven

Released versions are available in The Central Repository. Just add this artifact to your project:

<dependency>
    <groupId>cz.jirutka.unidecode</groupId>
    <artifactId>unidecode</artifactId>
    <version>1.0.1</version>
</dependency>

However, if you want to use the latest snapshot version, you also have to add the Sonatype OSS snapshots repository:

<repository>
    <id>sonatype-snapshots</id>
    <name>Sonatype repository for deploying snapshots</name>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>

Other implementations

Credits

This project is a fork of the unidecode written by 徐晨阳 (xuender).

License

This project is licensed under Apache License 2.0.

Character transliteration tables used in this project are converted (and slightly modified) from the tables provided in the Perl library Text::Unidecode by Sean M. Burke and are distributed under the Perl license.
