trigrams
[![Build][badge-build-image]][badge-build-url] [![Coverage][badge-coverage-image]][badge-coverage-url] [![Downloads][badge-downloads-image]][badge-downloads-url]
Trigrams for 500+ languages.
Contents
- What is this?
- When should I use this?
- Install
- Use
- API
- Data
- Compatibility
- Contribute
- Security
- License
What is this?
This package exposes the top trigrams for 500+ natural languages. The data is based on the most translated copyright-free document on this planet: the Universal Declaration of Human Rights (UDHR).
When should I use this?
Use this package when you are dealing with natural language detection.
Install
This package is [ESM only][github-gist-esm]. In Node.js (version 18+), install with [npm][npmjs-install]:
npm install trigrams
In Deno with [esm.sh][esmsh]:
import {min, top} from 'https://esm.sh/trigrams@6'
In browsers with [esm.sh][esmsh]:
<script type="module">
import {min, top} from 'https://esm.sh/trigrams@6?bundle'
</script>
Use
import {min, top} from 'trigrams'
console.log((await min()).nld)
console.log((await top()).pam)
Yields:
[ // 300 top trigrams.
' ar',
'eer',
'tij',
// …
'de ',
'an ',
'en ' // Most common trigram.
]
{ // 300 top trigrams.
'isa': 6,
'upa': 6,
'i k': 6,
// …
'ang': 273,
'ing': 282,
'ng ': 572 // Most common trigram with how often it was found.
}
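
As a rough illustration of how such lists might feed natural language detection, here is a minimal sketch. It makes several assumptions: `model` is a hand-made, heavily truncated stand-in for the result of `await min()`, assembled from the sample output above, and `trigramsOf`, `score`, and `guess` are hypothetical helpers that are not part of this package. The scoring rule is a simple one chosen for the example, not the method of any particular detector.

```javascript
// Hypothetical stand-in for `await min()`: each list is sorted from
// least to most common, truncated to the trigrams shown above.
const model = {
  nld: [' ar', 'eer', 'tij', 'de ', 'an ', 'en '],
  pam: ['isa', 'upa', 'i k', 'ang', 'ing', 'ng ']
}

// Split a string into overlapping three-character slices.
function trigramsOf(value) {
  const result = []
  for (let index = 0; index + 3 <= value.length; index++) {
    result.push(value.slice(index, index + 3))
  }
  return result
}

// Score a text against one list: every trigram found in the list adds
// its position plus one, so more common trigrams weigh more.
function score(text, list) {
  let total = 0
  for (const trigram of trigramsOf(text.toLowerCase())) {
    total += list.indexOf(trigram) + 1 // `indexOf` is -1 when absent, adding 0.
  }
  return total
}

// Return the code of the best scoring language in the model.
function guess(text, languages) {
  let best
  let bestScore = -1
  for (const [code, list] of Object.entries(languages)) {
    const current = score(text, list)
    if (current > bestScore) {
      best = code
      bestScore = current
    }
  }
  return best
}

console.log(guess('de tijd en de man', model)) // → 'nld'
```

With the full 300-trigram lists and cleaned input, the same idea should separate languages far better than this six-trigram toy model can.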
API
This package exports the identifiers
[min][api-min] and
[top][api-top].
It exports no [TypeScript][] types.
There is no default export.
min()
Get top trigrams.
Returns
Returns a promise resolving to an object mapping
[UDHR in Unicode][efele-udhr]
codes to arrays containing the top 300 trigrams, sorted from least occurring
to most occurring
(Promise<Record<string, Array<string>>>).
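
Because each list runs from least to most occurring, the most common trigrams sit at the end. A small sketch of reading them back in the opposite order follows. Here `nld` is a truncated stand-in for `(await min()).nld` taken from the sample output, and `mostCommon` is a hypothetical helper, not part of this package.

```javascript
// Stand-in for `(await min()).nld`, truncated to six trigrams.
const nld = [' ar', 'eer', 'tij', 'de ', 'an ', 'en ']

// Return the `count` most common trigrams, most common first.
function mostCommon(list, count) {
  // Guard: `slice(-0)` would return the whole list.
  return count > 0 ? list.slice(-count).reverse() : []
}

console.log(mostCommon(nld, 3)) // → ['en ', 'an ', 'de ']
```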
top()
Get top trigrams mapped to occurrence counts.
Returns
Returns a promise resolving to an object mapping
[UDHR in Unicode][efele-udhr]
codes to objects mapping the top 300 trigrams to occurrence counts
(Promise<Record<string, Record<string, number>>>).
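
The count objects can be turned into a ranked list when order matters. The sketch below assumes a truncated stand-in for `(await top()).pam`, copied from the sample output, and a hypothetical helper named `ranked` that is not part of this package.

```javascript
// Stand-in for `(await top()).pam`, truncated to six trigrams.
const pam = {isa: 6, upa: 6, 'i k': 6, ang: 273, ing: 282, 'ng ': 572}

// Return `[trigram, count]` pairs, highest count first.
function ranked(counts) {
  return Object.entries(counts).sort((a, b) => b[1] - a[1])
}

console.log(ranked(pam)[0]) // → ['ng ', 572]
```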
Data
The trigrams are based on the [Unicode][efele-udhr] versions of the [Universal Declaration of Human Rights][ohchr-udhr].
The files are created from all paragraphs made available by
[wooorm/udhr][github-wooorm-udhr] and do not include headings and such.
Before creating trigrams,
- the Unicode characters from \u0021 to \u0040 (both inclusive) are removed
- one or more white space characters (\s+) are replaced with a single space
- alphabetic characters are lower cased ([A-Z])
Additionally, the input is padded with two spaces on both sides.
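
The preparation steps above can be sketched roughly as follows. This is an approximation written from the description, not the code used to build the data files: `clean` and `countTrigrams` are hypothetical names, and using `toLowerCase()` is an assumption that also lower cases letters outside `[A-Z]`.

```javascript
// Clean a string following the three steps listed above.
function clean(value) {
  return value
    .replace(/[\u0021-\u0040]+/g, '') // Remove `!` through `@` (includes digits).
    .replace(/\s+/g, ' ') // Collapse runs of white space into one space.
    .toLowerCase() // Assumption: also lower cases non-ASCII letters.
}

// Count every trigram in the cleaned input, padded with two spaces
// on both sides as described above.
function countTrigrams(value) {
  const padded = '  ' + clean(value) + '  '
  const counts = {}
  for (let index = 0; index + 3 <= padded.length; index++) {
    const trigram = padded.slice(index, index + 3)
    counts[trigram] = (counts[trigram] || 0) + 1
  }
  return counts
}

console.log(clean('Hello,  World 42!')) // → 'hello world '
console.log(countTrigrams('A,B')) // Four trigrams, each found once.
```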
<!--support start-->

| Code | Name |
| - | - |
| 007 | Sãotomense |
| 008 | Crioulo, Upper Guinea (008) |
| 009 | Mbundu (009) |
| 010 | Tetun Dili |
| 011 | Umbundu (011) |
| 013 | (Mijisa) |
| 014 | (Maiunan) |
| 016 | (Minjiang, spoken) |
| 017 | (Minjiang, written) |
| 020 | Drung |
| 021 | (Muzzi) |
| 022 | (Klau) |
| 025 | (Bizisa) |
| 026 | (Yeonbyeon) |
| 027 | Gumuz |
| 028 | Kafa |
| 029 | Sidamo |
| 030 | Kituba (2) |
| 032 | South Azerbaijani |
| 041 | Latvian (2) |
| 042 | Spanish (resolution) |
| 043 | Zarma |
| 044 | Mirandese |
| 045 | Maasai |
| 046 | Malay, Papuan |
| 047 | Malay, Ambonese |
| 048 | Minangkabau (2) |
| 049 | Banjar |
| 050 | (Bataknese) |
| 052 | Morisyen |
| 053 | Hausa (2) |
| 054 | Catalan (2) |
| 055 | Jamaican Creole English |
| 056 | Saint Lucian Creole French |
| 057 | Maay |
| 058 | Somali (Af Marka) |
| 059 | North Saami (2) |
| 060 | Inari Saami |
| 061 | Skolt Saami |
| 062 | Swahili (Chimwiini) |
| 063 | Swahili (Kibajuni) |
| 064 | Dabarre |
| 065 | Garre |
| 066 | Jiiddu |
| 067 | Finnish (2) |
| 068 | French (Welche) |
| 069 | Maori (2) |
| 071 | Kabyle |
| aar | Afar |
| abk | Abkhaz |
| ace | Aceh |
| acu | Achuar-Shiwiar |
| acu_1 | Achuar-Shiwiar (1) |
| ada | Dangme |
| ady | Adyghe |
| afr | Afrikaans |
| agr | Aguaruna |
| aii | Assyrian Neo-Aramaic |
| ajg | Aja |
| aka_akuapem | Twi (Akuapem) |
| aka_asante | Twi (Asante) |
| aka_fante | Fante |
| als | Albanian, Tosk |
| alt | Altai, Southern |
| amc | Amahuaca |
| ame | Yaneshaʼ |
| amh | Amharic |
| ami | Amis |
| amr | Amarakaeri |
| arb | Arabic, Standard |
| arl | Arabela |
| arn | Mapudungun |
| ast | Asturian |
| auc | Waorani |
| auv | Occitan (Auvergnat) |
| ayo | Ayoreo |
| ayr | Aymara, Central |
| azj_cyrl | Azerbaijani, North (Cyrillic) |
| azj_latn | Azerbaijani, North (Latin) |
| bam | [Bamanankan](http
