MinnaNoDS
All of the vocabulary from your Minna no Nihongo Shokyū I & II textbooks formulated into a tiny little YAML file.
<p align="center"> <a href="https://yaml.org/"> <img alt="YAML" src="https://img.shields.io/badge/YAML-CB171E?logo=yaml&logoColor=fff&style=flat-square" /></a> <a href="https://en.wikipedia.org/wiki/Japanese_language"> <img alt="Language: Japanese" src="https://img.shields.io/badge/lang-%20%E6%97%A5%E6%9C%AC%E8%AA%9E-forestgreen?style=flat-square" /></a> <a href="https://github.com/vitto4/MinnaNoDS/releases"> <img alt="GitHub Release" src="https://img.shields.io/github/v/release/vitto4/MinnaNoDS?style=flat-square" /></a> </p>
```yaml
- id: [2, 10]
  edition: [1, 2]
  kanji: "新聞"
  kana: "しんぶん"
  romaji: "shinbun"
  meaning: {
    en: "newspaper",
    fr: "journal",
  }
```
<p align="center"><sup><ins><i>Figure</i></ins> – A word straight from the dataset. More information on the format <a href="https://github.com/vitto4/MinnaNoDS/blob/ae20b1408e2752642618312728ff817fe2479dd4/minna-no-ds.yaml#L93-L107">this way</a>.</sup></p>
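For typed access in Python, a word entry like the one above could be mapped onto a small dataclass. This is a minimal sketch: the `Word` class and its `from_dict` helper are hypothetical, and it assumes (this README does not state it outright) that `id` is `[lesson, position in lesson]` and that `edition` lists the editions in which the word appears. `kanji` may be `None` for kana-only words, as in the usage example further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """Hypothetical typed view of one dataset entry."""
    lesson: int                  # assumed: first element of `id`
    position: int                # assumed: second element of `id`
    editions: tuple[int, ...]    # assumed: editions that contain the word
    kanji: Optional[str]         # None for kana-only words
    kana: str
    romaji: str
    meaning: dict[str, str]      # language code -> translation

    @classmethod
    def from_dict(cls, entry: dict) -> "Word":
        """Build a Word from a dict as loaded by a YAML parser."""
        lesson, position = entry["id"]
        return cls(
            lesson=lesson,
            position=position,
            editions=tuple(entry["edition"]),
            kanji=entry.get("kanji"),
            kana=entry["kana"],
            romaji=entry["romaji"],
            meaning=dict(entry["meaning"]),
        )


# The entry from the figure above, as a YAML loader would return it
entry = {
    "id": [2, 10],
    "edition": [1, 2],
    "kanji": "新聞",
    "kana": "しんぶん",
    "romaji": "shinbun",
    "meaning": {"en": "newspaper", "fr": "journal"},
}
word = Word.from_dict(entry)
print(word.lesson, word.position, word.meaning["en"])  # 2 10 newspaper
```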
☁ Overview
This project aims to serve as a comprehensive vocabulary list for the Minna no Nihongo Shokyū series, packaged into a single YAML file – which we'll refer to as the dataset. <br>
More specifically, it aims to stay as close as possible to the source material, in the hope of providing a foundation anyone can use or build on.
The dataset currently provides meanings in two languages:
```yaml
languages:
  en: "English"
  fr: "Français"
```
<p align="center"><sup> Further information <a href="https://github.com/vitto4/MinnaNoDS/blob/ae20b1408e2752642618312728ff817fe2479dd4/minna-no-ds.yaml#L26-L31">here</a>.</sup></p>
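Since each `meaning` is a map keyed by these language codes, picking a translation in a preferred language is a plain dictionary lookup. The sketch below falls back to English when a language is missing; the `get_meaning` name and the fallback behaviour are assumptions for illustration, not something the dataset defines.

```python
from typing import Optional


def get_meaning(word: dict, lang: str, fallback: str = "en") -> Optional[str]:
    """Return the meaning in `lang`, else in `fallback`, else None.

    Hypothetical helper: a missing or empty translation triggers the fallback.
    """
    meanings = word.get("meaning") or {}
    return meanings.get(lang) or meanings.get(fallback)


entry = {"kana": "しんぶん", "meaning": {"en": "newspaper", "fr": "journal"}}
print(get_meaning(entry, "fr"))  # journal
print(get_meaning(entry, "de"))  # newspaper ("de" is not a dataset language)
```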
Rōmaji are provided solely for convenience and do not correspond to those used in the rōmaji edition of the books. <br>
They were generated using a mix of pykakasi and readings supplied by Google Translate. As a result, they more or less follow the conventions of the Modified Hepburn system (yes, mācrōns inclūdēd!).
⚙️ Usage
Here is a basic example in Python.
```python
import yaml  # PyYAML

# Load the dataset (plain data, so safe_load is sufficient)
with open("minna-no-ds.yaml", "r", encoding="utf-8") as f:
    ds = yaml.safe_load(f)

# Extract the keys for all available lessons
lessons: list[str] = [lesson["key"] for lesson in ds["lessons"]]  # ['lesson-01', 'lesson-02', ...]

# Go through each lesson and print out its contents
for key in lessons:
    print(f"Contents of {key}")  # Outputs: Contents of lesson-01
    print(ds[key])  # Outputs: [{'id': [1, 1], 'edition': [1, 2], 'kanji': None, 'kana': 'わたし', 'romaji': 'watashi', 'meaning': {'en': 'I', 'fr': 'je, moi'}}, ...]
```
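Building on that loop, the loaded lessons can be flattened into a lookup keyed by `id`, or filtered by edition. This is a sketch over the structure shown above (`ds["lessons"]` entries carrying a `key`, and `ds[key]` holding that lesson's word list); the helper names are hypothetical, and filtering assumes `edition` lists the editions a word appears in. Keep in mind that ids shift when words are added or removed, so such an index is only stable within a major version (see Notes).

```python
from collections.abc import Iterator


def iter_words(ds: dict) -> Iterator[dict]:
    """Yield every word entry, lesson by lesson, in dataset order."""
    for lesson in ds["lessons"]:
        yield from ds[lesson["key"]]


def index_by_id(ds: dict) -> dict[tuple[int, int], dict]:
    """Map tuple(id) -> word; lists aren't hashable, hence the tuple."""
    return {tuple(word["id"]): word for word in iter_words(ds)}


def words_in_edition(ds: dict, edition: int) -> list[dict]:
    """Return the words whose `edition` list contains the given edition."""
    return [word for word in iter_words(ds) if edition in word["edition"]]


# Truncated stand-in shaped like the loaded dataset (two entries from this README)
ds = {
    "lessons": [{"key": "lesson-01"}, {"key": "lesson-02"}],
    "lesson-01": [{"id": [1, 1], "edition": [1, 2], "kanji": None, "kana": "わたし",
                   "romaji": "watashi", "meaning": {"en": "I", "fr": "je, moi"}}],
    "lesson-02": [{"id": [2, 10], "edition": [1, 2], "kanji": "新聞", "kana": "しんぶん",
                   "romaji": "shinbun", "meaning": {"en": "newspaper", "fr": "journal"}}],
}

by_id = index_by_id(ds)
print(by_id[(2, 10)]["kanji"])        # 新聞
print(len(words_in_edition(ds, 2)))   # 2
```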
📚 Bibliography
As you may know, Minna no Nihongo Shokyū comes in two books of twenty-five lessons each, both available in two editions (the second being an updated version of the original).
The table below lists the books used in the making of the dataset.
| 📗📘📙 | First Edition | Second Edition |
|:-----:|:-------------:|:--------------:|
| Book 1<br>English Version | みんなの日本語初級Ⅰ 翻訳・文法解説英語版<br>ISBN: 9784883191079 | みんなの日本語初級Ⅰ 第2版 翻訳・文法解説 英語版<br>ISBN: 9784883196043 |
| Book 2<br>English Version | みんなの日本語初級Ⅱ 翻訳・文法解説英語版<br>ISBN: 9784883191086 | みんなの日本語初級Ⅱ 第2版 翻訳・文法解説 英語版<br>ISBN: 9784883196647 |
| Book 1<br>French Version | みんなの日本語初級Ⅰ 翻訳・文法解説フランス語版<br>ISBN: 9784883191338 | みんなの日本語初級Ⅰ 第2版 翻訳・文法解説 フランス語版<br>ISBN: 9784883196456 |
| Book 2<br>French Version | みんなの日本語初級Ⅱ 翻訳・文法解説フランス語版<br>ISBN: 9784883191383 | みんなの日本語初級Ⅱ 第2版 翻訳・文法解説 フランス語版<br>ISBN: 9784883197057 |
🚦 Conventions
What I call a convention is any rule I set while creating the dataset that is not directly derived from the source material.
<p align="center"> See <a href="https://github.com/vitto4/MinnaNoDS/blob/main/CONVENTIONS.md"> <code>CONVENTIONS.md</code> </a> </p>
This file also includes general information about the structure of the dataset.
🔖 Notes
- When starting out with this project, I used Paul Denisowski's vocabulary lists to generate a blank template for me to fill in. Serious time-saver right there!
- As strings in `romaji` do not need to be spellchecked, you may use the following config with CSpell:
  ```json
  "cSpell.ignoreRegExpList": [
    "/romaji:\\s*\"[^\"]*\"/gi"
  ]
  ```
- This project should have shipped with the set of scripts I used to lint and validate the dataset. <br> It didn't, but who knows, I may get to it when (if) I stop being obsessed with that one space dwarves simulator ¯\_(ツ)_/¯
- Adding or removing a word will alter the `id` of all subsequent words in the same lesson. <br> Therefore, whenever this has to be done, the version number will be bumped to the next major release, as this could be considered a breaking change for anyone using `id` as a primary key.
- "This must have taken quite some time to make." Well, you don't say! (笑) <br> Though I think I'm happy with how it turned out c:
- ~~I haven't yet managed to get my hands on a French version of the first edition of book 1, so words found exclusively in book 1, edition 1 have no French `meaning` for now.~~ Fixed as of v1.1.0.
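To see what the CSpell ignore pattern from the notes covers, its regex can be checked with Python's `re` module. This is only an approximation: CSpell is JavaScript-based, so the real matching uses a JavaScript regex (the `/.../gi` form), and `re.IGNORECASE` stands in for the `i` flag here. Once the JSON string escapes are removed, `\\s` and `\"` become `\s` and `"`.

```python
import re

# JSON "/romaji:\\s*\"[^\"]*\"/gi" unescapes to the JS regex /romaji:\s*"[^"]*"/gi
ROMAJI_FIELD = re.compile(r'romaji:\s*"[^"]*"', re.IGNORECASE)

line = '  romaji: "shinbun"'
print(ROMAJI_FIELD.search(line).group(0))  # romaji: "shinbun"

# Other fields don't match, so the spell checker still sees them
print(ROMAJI_FIELD.search('  kana: "しんぶん"'))  # None
```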
🚧 Warning
```yaml
# * The selection of words and their respective translations are the sole property of 3A Corporation.
# This dataset and subsequent projects that depend on it shall only be used *in conjunction with* – and not *as a substitute for* – the books ; so as to not cause any financial harm to the IP owners.
# * As per previous remarks, no commercial use of this file shall be admissible.
```
<p align="center"><sup> More <a href="https://github.com/vitto4/MinnaNoDS/blob/ae20b1408e2752642618312728ff817fe2479dd4/minna-no-ds.yaml#L12-L14">here</a>.</sup></p>
The lack of a license is deliberate, as I am unsure which licensing options are appropriate for this project. The content isn't mine; only the dataset structure and the work of filling it in are. <br> If you know of a suitable option, feel free to open an issue!
Hopefully that doesn't stop anyone from using the dataset though.
