Jitenbot
Web crawler for creating personal copies of Japanese dictionaries
Jitenbot is a program for scraping Japanese dictionary websites and compiling the scraped data into compact dictionary file formats.
Supported Dictionaries
- Web Dictionaries
  - 国語辞典オンライン (jitenon-kokugo)
  - 四字熟語辞典オンライン (jitenon-yoji)
  - 故事・ことわざ・慣用句オンライン (jitenon-kotowaza)
- Monokakido
  - 新明解国語辞典 第八版 (smk8)
  - 大辞林 第四版 (daijirin2)
  - 三省堂国語辞典 第八版 (sankoku8)
Supported Output Formats
- Yomichan
- MDict (.MDX & .MDD)
Examples
<details>
<summary>Jitenon Kokugo (web | yomichan)</summary>
[Screenshots: a Jitenon Kokugo entry on the web and in Yomichan]
</details>
Usage
usage: jitenbot [-h] [-p PAGE_DIR] [-m MEDIA_DIR] [-i MDICT_ICON]
                [--no-mdict-export] [--no-yomichan-export]
                [--validate-yomichan-terms]
                {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}

Convert Japanese dictionary files to new formats.

positional arguments:
  {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}
                        name of dictionary to convert

options:
  -h, --help            show this help message and exit
  -p PAGE_DIR, --page-dir PAGE_DIR
                        path to directory containing XML page files
  -m MEDIA_DIR, --media-dir MEDIA_DIR
                        path to directory containing media folders (gaiji,
                        graphics, audio, etc.)
  -i MDICT_ICON, --mdict-icon MDICT_ICON
                        path to icon file to be used with MDict
  --no-mdict-export     skip export of dictionary data to MDict format
  --no-yomichan-export  skip export of dictionary data to Yomichan format
  --validate-yomichan-terms
                        validate JSON structure of exported Yomichan
                        dictionary terms

See README.md for details regarding media directory structures
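The help text above has the shape of Python `argparse` output. As a rough illustration of how such an interface could be declared, here is a minimal sketch; it is reconstructed from the help text alone and is not Jitenbot's actual source. The destination name `target` and the `build_parser` helper are assumptions made for the example.

```python
import argparse

# Dictionary names copied from the usage line above.
TARGETS = [
    "jitenon-kokugo", "jitenon-yoji", "jitenon-kotowaza",
    "smk8", "daijirin2", "sankoku8",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="jitenbot",
        description="Convert Japanese dictionary files to new formats.",
        epilog="See README.md for details regarding media directory structures",
    )
    # "target" is a hypothetical attribute name for the positional argument.
    parser.add_argument("target", choices=TARGETS,
                        help="name of dictionary to convert")
    parser.add_argument("-p", "--page-dir",
                        help="path to directory containing XML page files")
    parser.add_argument("-m", "--media-dir",
                        help="path to directory containing media folders")
    parser.add_argument("-i", "--mdict-icon",
                        help="path to icon file to be used with MDict")
    parser.add_argument("--no-mdict-export", action="store_true",
                        help="skip export of dictionary data to MDict format")
    parser.add_argument("--no-yomichan-export", action="store_true",
                        help="skip export of dictionary data to Yomichan format")
    parser.add_argument("--validate-yomichan-terms", action="store_true",
                        help="validate JSON structure of exported Yomichan "
                             "dictionary terms")
    return parser


# Parse an example command line: convert smk8, Yomichan output only.
args = build_parser().parse_args(
    ["smk8", "-p", "pages", "-m", "media", "--no-mdict-export"]
)
print(args.target, args.page_dir, args.media_dir, args.no_mdict_export)
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line.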
Web Targets
Jitenbot will scrape the target website and save the pages to the user cache directory. As a courtesy to the website owners, Jitenbot is configured to pause for 10 seconds between page requests. Consequently, a complete crawl of a target website may take several days.
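To see why a 10-second pause adds up to days, the following sketch estimates crawl time from the pauses alone and shows one way a throttled fetch loop might look. The function names, the injectable `sleep` parameter, and the example page count are illustrative assumptions, not Jitenbot's implementation.

```python
import time
from typing import Callable, Iterable, Iterator

REQUEST_INTERVAL_SECONDS = 10  # pause between page requests, as stated above
SECONDS_PER_DAY = 86_400


def estimated_crawl_days(page_count: int,
                         interval: float = REQUEST_INTERVAL_SECONDS) -> float:
    """Lower bound on crawl duration in days, counting only the pauses."""
    pauses = max(page_count - 1, 0)  # no pause is needed before the first page
    return pauses * interval / SECONDS_PER_DAY


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str],
                   interval: float = REQUEST_INTERVAL_SECONDS,
                   sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield fetch(url) for each URL, sleeping between consecutive requests."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval)
        yield fetch(url)


# Hypothetical site size, chosen only to illustrate the arithmetic.
print(round(estimated_crawl_days(60_000), 1))
```

For a hypothetical site of 60,000 pages, the pauses alone come to roughly a week, before any download or parsing time is counted.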
HTTP request headers (user agent string, etc.) may be customized by editing the config.json file created in the user config directory.
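The layout of config.json is not documented here, so the sketch below only illustrates the general idea of overriding default request headers from a JSON file. The key name `http-request-headers`, the default header value, and the `load_request_headers` helper are all hypothetical; check the generated config.json for the real field names.

```python
import json
import tempfile
from pathlib import Path

# Placeholder defaults; Jitenbot's real defaults are not shown in this README.
DEFAULT_HEADERS = {"User-Agent": "example-agent/0.0"}


def load_request_headers(config_dir: Path) -> dict:
    """Return the default headers, overridden by any found in config.json.

    The "http-request-headers" key is a hypothetical name for illustration.
    """
    headers = dict(DEFAULT_HEADERS)
    config_path = config_dir / "config.json"
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as config_file:
            config = json.load(config_file)
        headers.update(config.get("http-request-headers", {}))
    return headers


# Demonstration against a throwaway directory instead of the real config dir.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    before = load_request_headers(config_dir)
    (config_dir / "config.json").write_text(
        json.dumps({"http-request-headers": {"User-Agent": "my-agent/1.0"}}),
        encoding="utf-8",
    )
    after = load_request_headers(config_dir)

print(before, after)
```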
Monokakido Targets
These digital dictionaries are available for purchase through the Monokakido Dictionaries app on macOS/iOS. Ideally, Jitenbot would automatically fetch all the data it needs from this app's data directory[^1] on your system. In its current state of development, however, Jitenbot requires you to find and assemble the necessary data yourself. The files must be organized into a particular folder structure (defined below) and then passed to Jitenbot via the corresponding command line arguments.
Some of the folders in the app's data directory[^1] contain encoded files that must be decoded using golddranks' monokakido tool. These folders are indicated by a reference mark (※) in the notes below.
[^1]: /Library/Application Support/AppStoreContent/jp.monokakido.Dictionaries/Products/
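Once the files are assembled, the folders are handed to Jitenbot through `--page-dir` and `--media-dir`. The sketch below assembles such an argument list. It assumes that Jitenbot is installed as a `jitenbot` executable on your PATH and that a single root folder holds both `pages/` and `media/`; the `jitenbot_command` helper and the example paths are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def jitenbot_command(target: str, root: Path,
                     icon: Optional[Path] = None) -> List[str]:
    """Assemble an argument list for a Monokakido target.

    Assumes `root` contains the pages/ and media/ folders described in
    this README; the executable name "jitenbot" is also an assumption.
    """
    command = [
        "jitenbot", target,
        "--page-dir", str(root / "pages"),
        "--media-dir", str(root / "media"),
    ]
    if icon is not None:
        command += ["--mdict-icon", str(icon)]
    return command


# Example paths are placeholders, not locations Jitenbot expects.
command = jitenbot_command("smk8", Path("smk8-data"))
print(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.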
<details>
<summary>smk8 files</summary>
Since Yomichan does not support audio files from imported dictionaries, the audio/ directory may be omitted to reduce the size of the output ZIP file if desired.
.
├── media
│   ├── audio (※)
│   │   ├── 00001.aac
│   │   ├── 00002.aac
│   │   ├── 00003.aac
│   │   ├── ...
│   │   └── 82682.aac
│   ├── Audio.png
│   └── gaiji
│       ├── 1d110.svg
│       ├── 1d15d.svg
│       ├── 1d15e.svg
│       ├── ...
│       └── xbunnoa.svg
└── pages (※)
    ├── 0000000000.xml
    ├── 0000000001.xml
    ├── 0000000002.xml
    ├── ...
    └── 0000064581.xml
</details>
<details>
<summary>daijirin2 files</summary>
The graphics/ directory may be omitted to save space if desired.
.
├── media
│   ├── gaiji
│   │   ├── 1D10B.svg
│   │   ├── 1D110.svg
│   │   ├── 1D12A.svg
│   │   ├── ...
│   │   └── vectorOB.svg
│   └── graphics (※)
│       ├── 3djr_0002.png
│       ├── 3djr_0004.png
│       ├── 3djr_0005.png
│       ├── ...
│       └── 4djr_yahazu.png
└── pages (※)
    ├── 0000000001.xml
    ├── 0000000002.xml
    ├── 0000000003.xml
    ├── ...
    └── 0000182633.xml
</details>
<details>
<summary>sankoku8 files</summary>
.
├── media
│   ├── graphics
│   │   ├── 000chouchou.png
│   │   ├── ...
│   │   └── 888udatsu.png
│   ├── svg-accent
│   │   ├── アクセント.svg
│   │   └── 平板.svg
│   ├── svg-frac
│   │   ├── frac-1-2.svg
│   │   ├── ...
│   │   └── frac-a-b.svg
│   ├── svg-gaiji
│   │   ├── aiaigasa.svg
│   │   ├── ...
│   │   └── 異体字_西.svg
│   ├── svg-intonation
│   │   ├── 上昇下降.svg
│   │   ├── ...
│   │   └── 長.svg
│   ├── svg-logo
│   │   ├── denshi.svg
│   │   ├── ...
│   │   └── 重要語.svg
│   └── svg-special
│       └── 区切り線.svg
└── pages (※)
    ├── 0000000001.xml
    ├── ...
    └── 0000065457.xml
</details>
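Before starting a conversion, it can help to confirm that an assembled folder matches the layouts above. The sketch below is a hypothetical pre-flight check, not part of Jitenbot: it treats the `audio` and `graphics` folders that the notes call optional as optional, and assumes every other listed media folder is required, which the README does not state explicitly.

```python
import tempfile
from pathlib import Path
from typing import List

# Media subfolders taken from the directory layouts above. Folders that the
# notes describe as omittable (smk8 audio, daijirin2 graphics) are left out.
# Treating the remaining folders as required is an assumption.
EXPECTED_MEDIA = {
    "smk8": ["gaiji"],
    "daijirin2": ["gaiji"],
    "sankoku8": [
        "graphics", "svg-accent", "svg-frac", "svg-gaiji",
        "svg-intonation", "svg-logo", "svg-special",
    ],
}


def missing_items(target: str, root: Path) -> List[str]:
    """List expected folders or files that are absent under `root`."""
    problems = []
    pages = root / "pages"
    if not (pages.is_dir() and any(pages.glob("*.xml"))):
        problems.append("pages/*.xml")
    for name in EXPECTED_MEDIA[target]:
        if not (root / "media" / name).is_dir():
            problems.append(f"media/{name}")
    return problems


# Demonstration with a throwaway folder holding one page and media/gaiji.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pages").mkdir()
    (root / "pages" / "0000000001.xml").write_text("<page/>", encoding="utf-8")
    (root / "media" / "gaiji").mkdir(parents=True)
    daijirin2_missing = missing_items("daijirin2", root)
    sankoku8_missing = missing_items("sankoku8", root)

print(daijirin2_missing, sankoku8_missing)
```

An empty list means the folder has the expected shape; the check says nothing about whether the ※ folders have actually been decoded.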
Attribution
Adobe-Japan1_sequences.txt is provided by The Adobe-Japan1-7 Character Collection.
The Yomichan term-bank schema definition dictionary-term-bank-v3-schema.json is provided by the Yomichan project.
Many thanks to epistularum for providing thoughtful feedback regarding the implementation of the MDict export functionality.
