NeuSpell: A Neural Spelling Correction Toolkit
Contents
- Installation & Quick Start
- Toolkit
- Finetuning on custom data and creating new models
- Applications
- Additional Requirements
Updates
Latest
- April 2021:
- APIs for creating synthetic data now available for English language. See Synthetic data creation.
- neuspell is now available through pip. See Installation through pip.
- Added support for different transformer-based models such as DistilBERT, XLM-RoBERTa, etc. See the Finetuning on custom data and creating new models section for more details.
Previous
- March, 2021:
- Code-base reformatted. Addressed bug fixes and issues.
- November, 2020:
- Neuspell's BERT pretrained model is now available among Hugging Face models as murali1996/bert-base-cased-spell-correction. We provide an example code snippet at ./scripts/huggingface for curious practitioners.
- September, 2020:
- This work is accepted at EMNLP 2020 (system demonstrations)
Installation
git clone https://github.com/neuspell/neuspell; cd neuspell
pip install -e .
To install extra requirements,
pip install -r extras-requirements.txt
or individually as:
pip install -e .[elmo]
pip install -e .[spacy]
NOTE: For zsh, use ".[elmo]" and ".[spacy]" instead
Additionally, spacy models can be downloaded as:
python -m spacy download en_core_web_sm
Then, download neuspell's pretrained models by following Download Checkpoints.
Here is a quick-start snippet showing how to use a checker model from Python. See test_neuspell_correctors.py for more usage patterns.
import neuspell
from neuspell import available_checkers, BertChecker
""" see available checkers """
print(f"available checkers: {neuspell.available_checkers()}")
# → available checkers: ['BertsclstmChecker', 'CnnlstmChecker', 'NestedlstmChecker', 'SclstmChecker', 'SclstmbertChecker', 'BertChecker', 'SclstmelmoChecker', 'ElmosclstmChecker']
""" select spell checkers & load """
checker = BertChecker()
checker.from_pretrained()
""" spell correction """
checker.correct("I luk foward to receving your reply")
# → "I look forward to receiving your reply"
checker.correct_strings(["I luk foward to receving your reply", ])
# → ["I look forward to receiving your reply"]
checker.correct_from_file(src="noisy_texts.txt")
# → "Found 450 mistakes in 322 lines, total_lines=350"
""" evaluation of models """
checker.evaluate(clean_file="bea60k.txt", corrupt_file="bea60k.noise.txt")
# → data size: 63044
# → total inference time for this data is: 998.13 secs
# → total token count: 1032061
# → confusion table: corr2corr:940937, corr2incorr:21060,
# incorr2corr:55889, incorr2incorr:14175
# → accuracy is 96.58%
# → word correction rate is 79.76%
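The accuracy and word correction rate reported by evaluate() follow directly from the confusion table it prints; a minimal sketch of the arithmetic (assuming the four counts partition all evaluated tokens):

```python
# Confusion counts from the evaluation output above.
corr2corr, corr2incorr = 940937, 21060
incorr2corr, incorr2incorr = 55889, 14175

total = corr2corr + corr2incorr + incorr2corr + incorr2incorr

# accuracy: fraction of tokens that are correct after spell checking
accuracy = 100 * (corr2corr + incorr2corr) / total
# word correction rate: fraction of originally misspelled tokens that got fixed
correction_rate = 100 * incorr2corr / (incorr2corr + incorr2incorr)

print(total, accuracy, correction_rate)
# matches the figures above (1032061 tokens, ~96.58%, ~79.76%) up to rounding
```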
Alternatively, one can select and load a spell checker as follows:
from neuspell import SclstmChecker
checker = SclstmChecker()
checker = checker.add_("elmo", at="input") # "elmo" or "bert", "input" or "output"
checker.from_pretrained()
This feature of adding an ELMO or BERT model is currently supported only for selected models. See List of neural models in the toolkit for details.
If interested, follow Additional Requirements for installing the non-neural spell checkers Aspell and Jamspell.
Installation through pip
pip install neuspell
In v1.0, the allennlp library (required by the models that use ELMO) is not installed automatically. To use those checkers, do a source install as described in Installation & Quick Start.
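Before loading an ELMO-based checker from a pip install, it can help to check that the optional dependency is actually present; a small sketch (has_elmo_support is an illustrative helper, not part of neuspell's API):

```python
import importlib.util

def has_elmo_support() -> bool:
    """Return True if the optional allennlp dependency (needed by the
    ELMO-based checkers) is importable in the current environment."""
    return importlib.util.find_spec("allennlp") is not None

if not has_elmo_support():
    print("allennlp missing: do a source install with the [elmo] extra "
          "to use ELMO-based checkers")
```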
Toolkit
Introduction
NeuSpell is an open-source toolkit for context-sensitive spelling correction in English. The toolkit comprises 10 spell checkers, evaluated on naturally occurring misspellings from multiple (publicly available) sources. To make neural models for spell checking context dependent, (i) we train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings; and (ii) we use richer representations of the context.
This toolkit enables NLP practitioners to use our proposed and existing spelling correction systems via both a simple unified command line and a web interface. Among many potential applications, we demonstrate the utility of our spell checkers in combating adversarial misspellings.
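The synthetic-noising idea in (i) can be illustrated with a tiny character-level noiser that corrupts clean sentences into (noisy, clean) training pairs. This is a simplified sketch, not the toolkit's actual noising pipeline; all function names here are illustrative:

```python
import random

def noise_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit (drop, swap, or replace)
    to an interior character of the word; short words pass through."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)  # keep first/last chars intact
    op = rng.choice(["drop", "swap", "replace"])
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]

def make_training_pair(sentence: str, p: float = 0.3, seed: int = 0):
    """Return a (noisy, clean) pair: each word is corrupted with probability p."""
    rng = random.Random(seed)
    noisy = [noise_word(w, rng) if rng.random() < p else w
             for w in sentence.split()]
    return " ".join(noisy), sentence
```

A model trained on many such pairs learns to map the noisy side back to the clean side; NeuSpell's actual noisers are richer (e.g. probabilistic, word-level replacements mined from real misspellings).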
Live demo available at http://neuspell.github.io/
<p align="center"> <br> <img src="https://github.com/neuspell/neuspell/blob/master/images/ui.png?raw=true" width="400"/> <br> </p>
List of neural models in the toolkit:
- CNN-LSTM
- SC-LSTM
- Nested-LSTM
- BERT
- SC-LSTM plus ELMO (at input)
- SC-LSTM plus ELMO (at output)
- SC-LSTM plus BERT (at input)
- SC-LSTM plus BERT (at output)
Performances
| Spell<br>Checker | Word<br>Correction <br>Rate | Time per<br>sentence <br>(in milliseconds) |
|-------------------------------------|-----------------------|--------------------------------------|
| Aspell | 48.7 | 7.3* |
| Jamspell | 68.9 | 2.6* |
| CNN-LSTM | 75.8 | 4.2 |
| SC-LSTM | 76.7 | 2.8 |
| Nested-LSTM | 77.3 | 6.4 |
| BERT | 79.1 | 7.1 |
| SC-LSTM plus ELMO (at input) | 79.8 | 15.8 |
| SC-LSTM plus ELMO (at output) | 78.5 | 16.3 |
| SC-LSTM plus BERT (at input) | 77.0 | 6.7 |
| SC-LSTM plus BERT (at output) | 76.0 | 7.2 |
Performance of different correctors in the NeuSpell toolkit on the BEA-60K dataset with real-world spelling mistakes. * indicates evaluation on a CPU (for others we use a GeForce RTX 2080 Ti GPU).
Download Checkpoints
To download selected checkpoints, pick a Checkpoint name from the table below and pass it to the download utility. Each checkpoint is associated with a neural spell checker as shown in the table.
| Spell Checker | Class | Checkpoint name | Disk space (approx.) |
|-------------------------------------|---------------------|-----------------------------|----------------------|
| CNN-LSTM | CnnlstmChecker | 'cnn-lstm-probwordnoise' | 450 MB |
| SC-LSTM | SclstmChecker | 'scrnn-probwordnoise' | 450 MB |
| Nested-LSTM | NestedlstmChecker | 'lstm-lstm-probwordnoise' | 455 MB |
| BERT | BertChecker | 'subwordbert-probwordnoise' | 740 MB |
| SC-LSTM plus ELMO (at input) | ElmosclstmChecker | 'elmoscrnn-probwordnoise' | 840 MB |
| SC-LSTM plus BERT (at input) | BertsclstmChecker | 'bertscrnn-probwordnoise' | 900 MB |
| SC-LSTM plus BERT (at output) | SclstmbertChecker | 'scrnnbert-probwordnoise' | 1.19 GB |
| SC-LSTM plus ELMO (at output) | SclstmelmoChecker | 'scrnnelmo-probwordnoise' | 1.23 GB |
import neuspell
neuspell.seq_modeling.downloads.download_pretrained_model("subwordbert-probwordnoise")
Alternatively, download all Neuspell neural models by running the following (available in versions after v1.0):
import neuspell
neuspell.seq_modeling.downloads.download_pretrained_model("_all_")
Datasets
We curate several synthetic and natural datasets for training and evaluating neuspell models. For full details, check our paper.