JamSpell
Modern spell checking library - accurate, fast, multi-language
JamSpell is a spell checking library with the following features:
- accurate - it considers word surroundings (context) for better corrections
- fast - around 5K words per second
- multi-language - it's written in C++ and available for many languages via swig bindings
JamSpellPro
jamspell.com - check out the new JamSpell version with the following features:
- Improved accuracy (CatBoost gradient boosted decision trees candidate ranking model)
- Splits merged words
- Pre-trained models (small, medium, large) for many languages: en, ru, de, fr, it, es, tr, uk, pl, nl, pt, hi, no
- Ability to add words / sentences at runtime
- Fine-tuning / additional training
- Memory optimization for training large models
- Static dictionary support
- Built-in Java, C#, Ruby support
- Windows support
Benchmarks
<table>
<tr> <td></td> <td>Errors</td> <td>Top 7 Errors</td> <td>Fix Rate</td> <td>Top 7 Fix Rate</td> <td>Broken</td> <td>Speed<br> (words/second)</td> </tr>
<tr> <td>JamSpell</td> <td>3.25%</td> <td>1.27%</td> <td>79.53%</td> <td>84.10%</td> <td>0.64%</td> <td>4854</td> </tr>
<tr> <td>Norvig</td> <td>7.62%</td> <td>5.00%</td> <td>46.58%</td> <td>66.51%</td> <td>0.69%</td> <td>395</td> </tr>
<tr> <td>Hunspell</td> <td>13.10%</td> <td>10.33%</td> <td>47.52%</td> <td>68.56%</td> <td>7.14%</td> <td>163</td> </tr>
<tr> <td>Dummy</td> <td>13.14%</td> <td>13.14%</td> <td>0.00%</td> <td>0.00%</td> <td>0.00%</td> <td>-</td> </tr>
</table>

The model was trained on 300K wikipedia sentences + 300K news sentences (english). 95% was used for training, 5% for evaluation. An errors model was used to generate errored text from the original one. The JamSpell corrector was compared with Norvig's, Hunspell, and a dummy one (no corrections).
We used the following metrics:
- Errors - percent of words with errors after spell checker processing
- Top 7 Errors - percent of words missing from the top 7 candidates
- Fix Rate - percent of errored words fixed by the spell checker
- Top 7 Fix Rate - percent of errored words fixed by one of the top 7 candidates
- Broken - percent of non-errored words broken by the spell checker
- Speed - number of words per second
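The metric definitions above can be made concrete with a short sketch. This is illustrative code, not the actual benchmark script; the function name and the triple-based input format are hypothetical.

```python
# Illustrative sketch (not the benchmark code): compute Errors, Fix Rate and
# Broken from aligned (original, corrupted, corrected) word triples.

def spellcheck_metrics(triples):
    total = len(triples)
    errored = sum(1 for o, c, _ in triples if o != c)        # words the error model corrupted
    errors_after = sum(1 for o, _, f in triples if o != f)   # words still wrong after correction
    fixed = sum(1 for o, c, f in triples if o != c and o == f)
    broken = sum(1 for o, c, f in triples if o == c and o != f)
    return {
        "errors": errors_after / total,
        "fix_rate": fixed / errored if errored else 0.0,
        "broken": broken / (total - errored) if total != errored else 0.0,
    }

triples = [
    ("best", "begt", "best"),            # errored, fixed
    ("spell", "spell", "spell"),         # clean, untouched
    ("checker", "cherken", "chicken"),   # errored, not fixed
    ("i", "i", "a"),                     # clean, broken by the corrector
]
m = spellcheck_metrics(triples)
# m == {"errors": 0.5, "fix_rate": 0.5, "broken": 0.5}
```

Top 7 variants are computed the same way, counting a word as fixed if any of the top 7 candidates matches the original.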
To ensure that our model is not overfitted to wikipedia+news, we also checked it on the text of "The Adventures of Sherlock Holmes":
<table>
<tr> <td></td> <td>Errors</td> <td>Top 7 Errors</td> <td>Fix Rate</td> <td>Top 7 Fix Rate</td> <td>Broken</td> <td>Speed (words per second)</td> </tr>
<tr> <td>JamSpell</td> <td>3.56%</td> <td>1.27%</td> <td>72.03%</td> <td>79.73%</td> <td>0.50%</td> <td>5524</td> </tr>
<tr> <td>Norvig</td> <td>7.60%</td> <td>5.30%</td> <td>35.43%</td> <td>56.06%</td> <td>0.45%</td> <td>647</td> </tr>
<tr> <td>Hunspell</td> <td>9.36%</td> <td>6.44%</td> <td>39.61%</td> <td>65.77%</td> <td>2.95%</td> <td>284</td> </tr>
<tr> <td>Dummy</td> <td>11.16%</td> <td>11.16%</td> <td>0.00%</td> <td>0.00%</td> <td>0.00%</td> <td>-</td> </tr>
</table>

More details on reproducing these results are available in the "Train" section.
Usage
Python
- Install swig3 (usually it is in your distro's package manager)
- Install jamspell:
pip install jamspell
import jamspell
corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')
corrector.FixFragment('I am the begt spell cherken!')
# u'I am the best spell checker!'
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 3)
# (u'best', u'beat', u'belt', u'bet', u'bent', ... )
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 5)
# (u'checker', u'chicken', u'checked', u'wherein', u'coherent', ...)
C++
- Add the jamspell and contrib dirs to your project
- Use it:
#include <jamspell/spell_corrector.hpp>
int main(int argc, const char** argv) {
NJamSpell::TSpellCorrector corrector;
corrector.LoadLangModel("model.bin");
corrector.FixFragment(L"I am the begt spell cherken!");
// "I am the best spell checker!"
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 3);
    // "best", "beat", "belt", "bet", "bent", ...
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 5);
    // "checker", "chicken", "checked", "wherein", "coherent", ...
return 0;
}
Other languages
You can generate bindings for other languages using the swig tutorial. The swig interface file is jamspell.i. Pull requests with build scripts are welcome.
HTTP API
- Install cmake
- Clone and build JamSpell (it includes the http server):
git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make
./web_server/web_server en.bin localhost 8080
- GET Request example:
$ curl "http://localhost:8080/fix?text=I am the begt spell cherken"
I am the best spell checker
- POST Request example:
$ curl -d "I am the begt spell cherken" http://localhost:8080/fix
I am the best spell checker
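The curl examples above can also be reproduced from Python's standard library. A minimal sketch; `fix_url` is a hypothetical helper, and the host/port are the ones used in this README:

```python
from urllib.parse import quote

# Hypothetical helper mirroring the curl GET example above: it builds the
# /fix request URL with the text percent-encoded into the query string.
def fix_url(text, host="localhost", port=8080):
    return f"http://{host}:{port}/fix?text={quote(text)}"

url = fix_url("I am the begt spell cherken")
# url == "http://localhost:8080/fix?text=I%20am%20the%20begt%20spell%20cherken"
```

If the server from the previous step is running, `urllib.request.urlopen(url).read()` should return the corrected text.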
- Candidates request example:
curl "http://localhost:8080/candidates?text=I am the begt spell cherken"
# or
curl -d "I am the begt spell cherken" http://localhost:8080/candidates
{
"results": [
{
"candidates": [
"best",
"beat",
"belt",
"bet",
"bent",
"beet",
"beit"
],
"len": 4,
"pos_from": 9
},
{
"candidates": [
"checker",
"chicken",
"checked",
"wherein",
"coherent",
"cheered",
"cherokee"
],
"len": 7,
"pos_from": 20
}
]
}
Here pos_from is the position of the misspelled word's first letter, and len is its length.
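A client can use pos_from and len to splice the chosen candidates back into the original text. A minimal sketch, using an abridged copy of the response shown above:

```python
import json

# Abridged /candidates response from the example above (top candidate only).
response = json.loads("""{
  "results": [
    {"candidates": ["best"], "len": 4, "pos_from": 9},
    {"candidates": ["checker"], "len": 7, "pos_from": 20}
  ]
}""")

text = "I am the begt spell cherken"
# Apply replacements right-to-left so earlier positions stay valid.
for r in sorted(response["results"], key=lambda r: r["pos_from"], reverse=True):
    start, end = r["pos_from"], r["pos_from"] + r["len"]
    text = text[:start] + r["candidates"][0] + text[end:]
# text == "I am the best spell checker"
```

Replacing right-to-left matters because a candidate of a different length would shift the pos_from offsets of everything after it.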
Train
To train a custom model you need to:
- Install cmake
- Clone and build jamspell:
git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make
- Prepare a utf-8 text file with sentences to train on (e.g. sherlockholmes.txt) and another file with the language alphabet (e.g. alphabet_en.txt)
- Train the model:
./main/jamspell train ../test_data/alphabet_en.txt ../test_data/sherlockholmes.txt model_sherlock.bin
- To evaluate the spellchecker you can use the evaluate/evaluate.py script:
python evaluate/evaluate.py -a alphabet_file.txt -jsp your_model.bin -mx 50000 your_test_data.txt
- You can use evaluate/generate_dataset.py to generate your train/test data. It supports txt files, the Leipzig Corpora Collection format, and fb2 books.
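If you train on your own corpus, you also need a matching alphabet file. A sketch of deriving one from the corpus itself; `build_alphabet` is a hypothetical helper, and this assumes the alphabet file is simply the set of lowercase letters — check test_data/alphabet_en.txt in the repository for the exact expected format.

```python
# Hypothetical helper (not part of JamSpell): collect the lowercase letters
# occurring in a utf-8 corpus as a candidate training alphabet.
def build_alphabet(corpus_text):
    letters = {ch.lower() for ch in corpus_text if ch.isalpha()}
    return "".join(sorted(letters))

sample = "I am the best spell checker!"
alphabet = build_alphabet(sample)
# alphabet == "abcehiklmprst"
```

For a real corpus you would read the training file and write the result out, e.g. `open("alphabet_custom.txt", "w", encoding="utf-8").write(build_alphabet(text))`.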
Download models
Here are a few simple models, trained on 300K news + 300K wikipedia sentences. We strongly recommend training your own model, on at least a few million sentences, to achieve better quality. See the Train section above.
