JamSpell
Modern spell checking library - accurate, fast, multi-language
JamSpell is a spell checking library with the following features:
- accurate - it considers word surroundings (context) for better corrections
- fast - around 5K words per second
- multi-language - it's written in C++ and available for many languages via swig bindings
JamSpellPro
jamspell.com - check out the new JamSpell version with the following features:
- Improved accuracy (CatBoost gradient boosted decision trees candidate ranking model)
- Splits merged words
- Pre-trained models (small, medium, large) for many languages: en, ru, de, fr, it, es, tr, uk, pl, nl, pt, hi, no
- Ability to add words / sentences at runtime
- Fine-tuning / additional training
- Memory optimization for training large models
- Static dictionary support
- Built-in Java, C#, Ruby support
- Windows support
Benchmarks
<table>
<tr> <td></td> <td>Errors</td> <td>Top 7 Errors</td> <td>Fix Rate</td> <td>Top 7 Fix Rate</td> <td>Broken</td> <td>Speed<br> (words/second)</td> </tr>
<tr> <td>JamSpell</td> <td>3.25%</td> <td>1.27%</td> <td>79.53%</td> <td>84.10%</td> <td>0.64%</td> <td>4854</td> </tr>
<tr> <td>Norvig</td> <td>7.62%</td> <td>5.00%</td> <td>46.58%</td> <td>66.51%</td> <td>0.69%</td> <td>395</td> </tr>
<tr> <td>Hunspell</td> <td>13.10%</td> <td>10.33%</td> <td>47.52%</td> <td>68.56%</td> <td>7.14%</td> <td>163</td> </tr>
<tr> <td>Dummy</td> <td>13.14%</td> <td>13.14%</td> <td>0.00%</td> <td>0.00%</td> <td>0.00%</td> <td>-</td> </tr>
</table>

The model was trained on 300K wikipedia sentences + 300K news sentences (english). 95% was used for training, 5% for evaluation. An errors model was used to generate errored text from the original one. The JamSpell corrector was compared with Norvig's, Hunspell, and a dummy one (no corrections).
We used the following metrics:
- Errors - percent of words with errors after spell checker processing
- Top 7 Errors - percent of words missing from the top 7 candidates
- Fix Rate - percent of errored words fixed by the spell checker
- Top 7 Fix Rate - percent of errored words fixed by one of the top 7 candidates
- Broken - percent of non-errored words broken by the spell checker
- Speed - number of words per second
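The metric definitions above can be made concrete with a short sketch. This is illustrative code, not the actual benchmark script; the function name and the triple-based input format are hypothetical.

```python
# Illustrative sketch (not the benchmark code): compute Errors, Fix Rate and
# Broken from aligned (original, corrupted, corrected) word triples.

def spellcheck_metrics(triples):
    total = len(triples)
    errored = sum(1 for o, c, _ in triples if o != c)        # words the error model corrupted
    errors_after = sum(1 for o, _, f in triples if o != f)   # words still wrong after correction
    fixed = sum(1 for o, c, f in triples if o != c and o == f)
    broken = sum(1 for o, c, f in triples if o == c and o != f)
    return {
        "errors": errors_after / total,
        "fix_rate": fixed / errored if errored else 0.0,
        "broken": broken / (total - errored) if total != errored else 0.0,
    }

triples = [
    ("best", "begt", "best"),            # errored, fixed
    ("spell", "spell", "spell"),         # clean, untouched
    ("checker", "cherken", "chicken"),   # errored, not fixed
    ("i", "i", "a"),                     # clean, broken by the corrector
]
m = spellcheck_metrics(triples)
# m == {"errors": 0.5, "fix_rate": 0.5, "broken": 0.5}
```

Top 7 variants are computed the same way, counting a word as fixed if any of the top 7 candidates matches the original.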
To ensure that our model is not overfitted to wikipedia+news, we also checked it on the text of "The Adventures of Sherlock Holmes":
<table>
<tr> <td></td> <td>Errors</td> <td>Top 7 Errors</td> <td>Fix Rate</td> <td>Top 7 Fix Rate</td> <td>Broken</td> <td>Speed (words per second)</td> </tr>
<tr> <td>JamSpell</td> <td>3.56%</td> <td>1.27%</td> <td>72.03%</td> <td>79.73%</td> <td>0.50%</td> <td>5524</td> </tr>
<tr> <td>Norvig</td> <td>7.60%</td> <td>5.30%</td> <td>35.43%</td> <td>56.06%</td> <td>0.45%</td> <td>647</td> </tr>
<tr> <td>Hunspell</td> <td>9.36%</td> <td>6.44%</td> <td>39.61%</td> <td>65.77%</td> <td>2.95%</td> <td>284</td> </tr>
<tr> <td>Dummy</td> <td>11.16%</td> <td>11.16%</td> <td>0.00%</td> <td>0.00%</td> <td>0.00%</td> <td>-</td> </tr>
</table>

More details on reproducing these results are available in the "Train" section.
Usage
Python
- Install swig3 (usually it is in your distro's package manager)
- Install jamspell:
pip install jamspell
import jamspell
corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')
corrector.FixFragment('I am the begt spell cherken!')
# u'I am the best spell checker!'
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 3)
# (u'best', u'beat', u'belt', u'bet', u'bent', ... )
corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 5)
# (u'checker', u'chicken', u'checked', u'wherein', u'coherent', ...)
C++
- Add the jamspell and contrib dirs to your project
- Use it:
#include <jamspell/spell_corrector.hpp>
int main(int argc, const char** argv) {
NJamSpell::TSpellCorrector corrector;
corrector.LoadLangModel("model.bin");
corrector.FixFragment(L"I am the begt spell cherken!");
// "I am the best spell checker!"
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 3);
    // "best", "beat", "belt", "bet", "bent", ...
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 5);
    // "checker", "chicken", "checked", "wherein", "coherent", ...
return 0;
}
Other languages
You can generate bindings for other languages using the swig tutorial. The swig interface file is jamspell.i. Pull requests with build scripts are welcome.
HTTP API
- Install cmake
- Clone and build JamSpell (it includes the http server):
git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make
./web_server/web_server en.bin localhost 8080
- GET Request example:
$ curl "http://localhost:8080/fix?text=I am the begt spell cherken"
I am the best spell checker
- POST Request example:
$ curl -d "I am the begt spell cherken" http://localhost:8080/fix
I am the best spell checker
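The curl examples above can also be reproduced from Python's standard library. A minimal sketch; `fix_url` is a hypothetical helper, and the host/port are the ones used in this README:

```python
from urllib.parse import quote

# Hypothetical helper mirroring the curl GET example above: it builds the
# /fix request URL with the text percent-encoded into the query string.
def fix_url(text, host="localhost", port=8080):
    return f"http://{host}:{port}/fix?text={quote(text)}"

url = fix_url("I am the begt spell cherken")
# url == "http://localhost:8080/fix?text=I%20am%20the%20begt%20spell%20cherken"
```

If the server from the previous step is running, `urllib.request.urlopen(url).read()` should return the corrected text.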
- Candidates request example:
curl "http://localhost:8080/candidates?text=I am the begt spell cherken"
# or
curl -d "I am the begt spell cherken" http://localhost:8080/candidates
{
"results": [
{
"candidates": [
"best",
"beat",
"belt",
"bet",
"bent",
"beet",
"beit"
],
"len": 4,
"pos_from": 9
},
{
"candidates": [
"checker",
"chicken",
"checked",
"wherein",
"coherent",
"cheered",
"cherokee"
],
"len": 7,
"pos_from": 20
}
]
}
Here pos_from is the position of the misspelled word's first letter, and len is its length.
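A client can use pos_from and len to splice the chosen candidates back into the original text. A minimal sketch, using an abridged copy of the response shown above:

```python
import json

# Abridged /candidates response from the example above (top candidate only).
response = json.loads("""{
  "results": [
    {"candidates": ["best"], "len": 4, "pos_from": 9},
    {"candidates": ["checker"], "len": 7, "pos_from": 20}
  ]
}""")

text = "I am the begt spell cherken"
# Apply replacements right-to-left so earlier positions stay valid.
for r in sorted(response["results"], key=lambda r: r["pos_from"], reverse=True):
    start, end = r["pos_from"], r["pos_from"] + r["len"]
    text = text[:start] + r["candidates"][0] + text[end:]
# text == "I am the best spell checker"
```

Replacing right-to-left matters because a candidate of a different length would shift the pos_from offsets of everything after it.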
Train
To train a custom model you need to:
- Install cmake
- Clone and build jamspell:
git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make
- Prepare a utf-8 text file with sentences to train on (e.g. sherlockholmes.txt) and another file with the language alphabet (e.g. alphabet_en.txt)
- Train the model:
./main/jamspell train ../test_data/alphabet_en.txt ../test_data/sherlockholmes.txt model_sherlock.bin
- To evaluate the spellchecker you can use the evaluate/evaluate.py script:
python evaluate/evaluate.py -a alphabet_file.txt -jsp your_model.bin -mx 50000 your_test_data.txt
- You can use evaluate/generate_dataset.py to generate your train/test data. It supports txt files, the Leipzig Corpora Collection format, and fb2 books.
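If you train on your own corpus, you also need a matching alphabet file. A sketch of deriving one from the corpus itself; `build_alphabet` is a hypothetical helper, and this assumes the alphabet file is simply the set of lowercase letters — check test_data/alphabet_en.txt in the repository for the exact expected format.

```python
# Hypothetical helper (not part of JamSpell): collect the lowercase letters
# occurring in a utf-8 corpus as a candidate training alphabet.
def build_alphabet(corpus_text):
    letters = {ch.lower() for ch in corpus_text if ch.isalpha()}
    return "".join(sorted(letters))

sample = "I am the best spell checker!"
alphabet = build_alphabet(sample)
# alphabet == "abcehiklmprst"
```

For a real corpus you would read the training file and write the result out, e.g. `open("alphabet_custom.txt", "w", encoding="utf-8").write(build_alphabet(text))`.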
Download models
Here are a few simple models, trained on 300K news + 300K wikipedia sentences. We strongly recommend training your own model, on at least a few million sentences, to achieve better quality. See the Train section above.
