Kuromojin
A high-level wrapper for kuromoji.js with a Promise-based API and a cache layer.
Features
- Promise based API
- Cache Layer
- Fetch the dictionary only once
- Return the same tokens for the same text
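The caching behavior listed above can be illustrated with a small sketch. This is not kuromojin's actual implementation: `createCachedTokenize` and `slowTokenize` are hypothetical names, and the stand-in tokenizer just emits one token per character. The sketch assumes results are cached per input text and frozen so that every caller shares one immutable result.

```javascript
// Hypothetical sketch of a Promise-based cache layer (not kuromojin's real code).
function createCachedTokenize(tokenizeImpl) {
  const cache = new Map(); // text -> Promise resolving to a frozen token array
  return function cachedTokenize(text) {
    if (!cache.has(text)) {
      const promise = Promise.resolve(tokenizeImpl(text)).then((tokens) =>
        Object.freeze(tokens.map((token) => Object.freeze({ ...token })))
      );
      cache.set(text, promise);
    }
    return cache.get(text);
  };
}

// Stand-in for a real tokenizer: one token per character.
let callCount = 0;
function slowTokenize(text) {
  callCount += 1;
  return Array.from(text, (ch, i) => ({ surface_form: ch, word_position: i + 1 }));
}

const cachedTokenize = createCachedTokenize(slowTokenize);
const first = cachedTokenize("黒文字");
const second = cachedTokenize("黒文字");
console.log(first === second); // true: the same Promise is reused
console.log(callCount); // 1: the underlying tokenizer ran only once
```

Caching the Promise rather than the resolved value means concurrent calls for the same text also share a single tokenization run.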
Installation
npm install kuromojin
Online Playground
📝 Requires a browser that supports DecompressionStream
Usage
kuromojin exports two APIs.
- getTokenizer() returns a Promise that resolves with kuromoji.js's tokenizer instance.
- tokenize() returns a Promise that resolves with the analyzed tokens.
- The array and objects returned by tokenize() are read-only to ensure immutability and prevent modification of cached data.
import {tokenize, getTokenizer} from "kuromojin";
getTokenizer().then(tokenizer => {
    // kuromoji.js's `tokenizer` instance
});
const text = "黒文字"; // example input matching the output below
tokenize(text).then(tokens => {
    console.log(tokens)
    /*
    [ {
        word_id: 509800, // word ID in the dictionary
        word_type: 'KNOWN', // word type (KNOWN if the word is in the dictionary, UNKNOWN otherwise)
        word_position: 1, // start position of the word
        surface_form: '黒文字', // surface form
        pos: '名詞', // part of speech
        pos_detail_1: '一般', // part-of-speech subcategory 1
        pos_detail_2: '*', // part-of-speech subcategory 2
        pos_detail_3: '*', // part-of-speech subcategory 3
        conjugated_type: '*', // conjugation type
        conjugated_form: '*', // conjugation form
        basic_form: '黒文字', // base form
        reading: 'クロモジ', // reading
        pronunciation: 'クロモジ' // pronunciation
    } ]
    */
});
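Because the returned tokens are read-only, derived data is best built with non-mutating operations. The sketch below uses a hand-written, frozen sample shaped like the output above instead of a real `tokenize()` call. `readingOf` and `toMutable` are hypothetical helpers, and the sketch assumes "read-only" means the objects behave as if frozen. If the guarantee is type-level only, the same copy-first pattern still applies.

```javascript
// Hand-written sample shaped like the `tokenize()` output above (fields trimmed).
const tokens = Object.freeze([
  Object.freeze({ surface_form: "黒文字", pos: "名詞", basic_form: "黒文字", reading: "クロモジ" })
]);

// Hypothetical helper: join the readings without touching the input.
function readingOf(tokenList) {
  return tokenList.map((token) => token.reading).join("");
}

// Hypothetical helper: make shallow, mutable copies before adding custom fields.
function toMutable(tokenList) {
  return tokenList.map((token) => ({ ...token }));
}

const editable = toMutable(tokens);
editable[0].note = "custom field on the copy";

console.log(readingOf(tokens)); // クロモジ
console.log("note" in tokens[0]); // false: the original token is unchanged
```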
For browser/global options
If window.kuromojin.dicPath is defined, kuromojin uses it as the default dictionary path.
import {getTokenizer} from "kuromojin";
// Affects all modules that use kuromojin.
window.kuromojin = {
    dicPath: "https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict"
};
// This `getTokenizer` call uses "https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict",
getTokenizer();
// so it is equivalent to:
getTokenizer({dicPath: "https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict"});
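If other code may already have put fields on the global `kuromojin` object, assigning a fresh object would discard them. A minimal caller-side sketch, assuming only the `dicPath` field described above: `setGlobalDicPath` is a hypothetical helper, not part of kuromojin, and a plain object stands in for `window` so the snippet runs outside a browser.

```javascript
// Hypothetical helper (not part of kuromojin): set the global default
// dictionary path while keeping any other fields already on `kuromojin`.
function setGlobalDicPath(globalObject, dicPath) {
  globalObject.kuromojin = { ...(globalObject.kuromojin || {}), dicPath };
  return globalObject.kuromojin;
}

// In a browser, pass `window`; a plain object stands in for it here.
const fakeWindow = { kuromojin: { someOtherSetting: true } };
setGlobalDicPath(fakeWindow, "https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict");

console.log(fakeWindow.kuromojin.dicPath); // https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict
console.log(fakeWindow.kuromojin.someOtherSetting); // true: existing fields are kept
```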
📝 Test dictionary URL
- "https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict"
  - cdn dict for kuromoji.js
- https://kuromojin.netlify.com/dict/*.dat.gz
  - example: https://kuromojin.netlify.com/dict/base.dat.gz
Note: backward compatibility for <= 1.1.0
kuromojin v1.1.0 exports tokenize as the default function.
kuromojin v2.0.0 removes the default export.
import kuromojin from "kuromojin";
// kuromojin === tokenize
Recommended: use import {tokenize} from "kuromojin" instead.
import {tokenize} from "kuromojin";
Note: kuromoji version is pinned
kuromojin pins kuromoji's version.
This aims to dedupe kuromoji's dictionary: the dictionary is large, so installing duplicate copies of it should be avoided.
Related
Tests
npm test
Contributing
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request :D
License
MIT
