
<h1 align="center">NeuSpell: A Neural Spelling Correction Toolkit</h1>

Contents

Updates

Latest

Previous

  • March, 2021:
    • Code-base reformatted. Addressed bug fixes and issues.
  • November, 2020:
    • Neuspell's pretrained BERT model is now available on the Hugging Face model hub as murali1996/bert-base-cased-spell-correction. We provide an example code snippet at ./scripts/huggingface for curious practitioners.
  • September, 2020:
    • This work was accepted at EMNLP 2020 (system demonstrations).

Installation

git clone https://github.com/neuspell/neuspell; cd neuspell
pip install -e .

To install extra requirements,

pip install -r extras-requirements.txt

or individually as:

pip install -e .[elmo]
pip install -e .[spacy]

NOTE: In zsh, unquoted square brackets are glob patterns, so quote the extras: use ".[elmo]" and ".[spacy]" instead.

Additionally, spaCy models can be downloaded as:

python -m spacy download en_core_web_sm

Then, download NeuSpell's pretrained models following Download Checkpoints.

Here is a quick-start code snippet for using a checker model. See test_neuspell_correctors.py for more usage patterns.

import neuspell
from neuspell import available_checkers, BertChecker

""" see available checkers """
print(f"available checkers: {available_checkers()}")
# → available checkers: ['BertsclstmChecker', 'CnnlstmChecker', 'NestedlstmChecker', 'SclstmChecker', 'SclstmbertChecker', 'BertChecker', 'SclstmelmoChecker', 'ElmosclstmChecker']

""" select spell checkers & load """
checker = BertChecker()
checker.from_pretrained()

""" spell correction """
checker.correct("I luk foward to receving your reply")
# → "I look forward to receiving your reply"
checker.correct_strings(["I luk foward to receving your reply", ])
# → ["I look forward to receiving your reply"]
checker.correct_from_file(src="noisy_texts.txt")
# → "Found 450 mistakes in 322 lines, total_lines=350"

""" evaluation of models """
checker.evaluate(clean_file="bea60k.txt", corrupt_file="bea60k.noise.txt")
# → data size: 63044
# → total inference time for this data is: 998.13 secs
# → total token count: 1032061
# → confusion table: corr2corr:940937, corr2incorr:21060,
#                    incorr2corr:55889, incorr2incorr:14175
# → accuracy is 96.58%
# → word correction rate is 79.76%
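The reported metrics can be cross-checked from the confusion table alone. The formulas below are inferred from the printed numbers, not taken from the NeuSpell API: accuracy counts every token that ends up correct, while word correction rate only looks at tokens that started out misspelled.

```python
# Confusion-table counts from the evaluate() output above.
corr2corr, corr2incorr = 940_937, 21_060
incorr2corr, incorr2incorr = 55_889, 14_175

# Total token count (should match the reported 1,032,061).
total = corr2corr + corr2incorr + incorr2corr + incorr2incorr

# Inferred definitions (assumptions, not NeuSpell's documented formulas):
# accuracy = fraction of all tokens that are correct after checking;
# word correction rate = fraction of misspelled tokens that got fixed.
accuracy = 100 * (corr2corr + incorr2corr) / total
correction_rate = 100 * incorr2corr / (incorr2corr + incorr2incorr)

print(f"total tokens: {total}")
print(f"accuracy: {accuracy:.2f}%")             # matches the reported 96.58% up to rounding
print(f"word correction rate: {correction_rate:.2f}%")  # matches the reported 79.76% up to rounding
```

Both values land within a hundredth of a percent of the numbers printed by `evaluate()`, which suggests these are indeed the definitions used.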

Alternatively, one can select and load a spell checker as follows:

from neuspell import SclstmChecker

checker = SclstmChecker()
checker = checker.add_("elmo", at="input")  # "elmo" or "bert", "input" or "output"
checker.from_pretrained()

Adding an ELMO or BERT model in this way is currently supported only for selected checkers. See List of neural models in the toolkit for details.

If interested, follow Additional Requirements for installing the non-neural spell checkers, Aspell and Jamspell.

Installation through pip

pip install neuspell

In v1.0, the allennlp library, which is required by the ELMO-based models, is not installed automatically. To use those checkers, do a source install as described in Installation & Quick Start.
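One way to handle this at runtime is a defensive availability check before choosing a checker. This is an illustrative pattern, not part of the NeuSpell API; the fallback choice of `BertChecker` is an assumption for the example:

```python
import importlib.util

# The ELMO-based checkers depend on allennlp, which the pip-installed
# v1.0 does not pull in automatically. Probe for it before selecting.
has_allennlp = importlib.util.find_spec("allennlp") is not None

if has_allennlp:
    checker_name = "ElmosclstmChecker"  # ELMO-backed checker is usable
else:
    checker_name = "BertChecker"        # fall back to a BERT-based checker

print(f"allennlp available: {has_allennlp}; using {checker_name}")
```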

Toolkit

Introduction

NeuSpell is an open-source toolkit for context-sensitive spelling correction in English. The toolkit comprises 10 spell checkers, evaluated on naturally occurring misspellings from multiple (publicly available) sources. To make neural models for spell checking context-dependent, (i) we train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings; and (ii) we use richer representations of the context. The toolkit enables NLP practitioners to use our proposed and existing spelling correction systems via a simple unified command line as well as a web interface. Among many potential applications, we demonstrate the utility of our spell checkers in combating adversarial misspellings.
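As a rough illustration of step (i), in-context training errors can be produced by injecting character-level edits into clean sentences, so the misspelled word keeps its clean surrounding context. The noise model below is a hypothetical sketch for intuition, not NeuSpell's actual probabilistic word-noise model:

```python
import random

def add_char_noise(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit (drop, swap, or duplicate)."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)   # keep first/last characters intact
    op = rng.choice(["drop", "swap", "dup"])
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + word[i] + word[i:]  # duplicate one character

def noise_sentence(sentence: str, p: float = 0.3, seed: int = 0) -> str:
    """Corrupt each word with probability p, leaving the rest of the context clean."""
    rng = random.Random(seed)
    return " ".join(
        add_char_noise(w, rng) if rng.random() < p else w
        for w in sentence.split()
    )

clean = "I look forward to receiving your reply"
print(noise_sentence(clean))
```

Pairs of (noised, clean) sentences produced this way are what lets a sequence model learn correction from context rather than from isolated word lookups.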

Live demo available at http://neuspell.github.io/
<p align="center"> <br> <img src="https://github.com/neuspell/neuspell/blob/master/images/ui.png?raw=true" width="400"/> <br> <p>
List of neural models in the toolkit:
<p align="center"> <br> <img src="https://github.com/neuspell/neuspell/blob/master/images/pipeline.jpeg?raw=true" width="400"/> <br> This pipeline corresponds to the `SC-LSTM plus ELMO (at input)` model. <p>
Performances

| Spell Checker | Word Correction Rate | Time per sentence (ms) |
|-------------------------------|----------------------|------------------------|
| Aspell                        | 48.7                 | 7.3*                   |
| Jamspell                      | 68.9                 | 2.6*                   |
| CNN-LSTM                      | 75.8                 | 4.2                    |
| SC-LSTM                       | 76.7                 | 2.8                    |
| Nested-LSTM                   | 77.3                 | 6.4                    |
| BERT                          | 79.1                 | 7.1                    |
| SC-LSTM plus ELMO (at input)  | 79.8                 | 15.8                   |
| SC-LSTM plus ELMO (at output) | 78.5                 | 16.3                   |
| SC-LSTM plus BERT (at input)  | 77.0                 | 6.7                    |
| SC-LSTM plus BERT (at output) | 76.0                 | 7.2                    |

Performance of different correctors in the NeuSpell toolkit on the BEA-60K dataset with real-world spelling mistakes. ∗ indicates evaluation on a CPU (for others we use a GeForce RTX 2080 Ti GPU).

Download Checkpoints

To download selected checkpoints, pick a checkpoint name from the table below and run the download snippet. Each checkpoint is associated with a neural spell checker as shown in the table.

| Spell Checker                 | Class             | Checkpoint name             | Disk space (approx.) |
|-------------------------------|-------------------|-----------------------------|----------------------|
| CNN-LSTM                      | CnnlstmChecker    | 'cnn-lstm-probwordnoise'    | 450 MB               |
| SC-LSTM                       | SclstmChecker     | 'scrnn-probwordnoise'       | 450 MB               |
| Nested-LSTM                   | NestedlstmChecker | 'lstm-lstm-probwordnoise'   | 455 MB               |
| BERT                          | BertChecker       | 'subwordbert-probwordnoise' | 740 MB               |
| SC-LSTM plus ELMO (at input)  | ElmosclstmChecker | 'elmoscrnn-probwordnoise'   | 840 MB               |
| SC-LSTM plus BERT (at input)  | BertsclstmChecker | 'bertscrnn-probwordnoise'   | 900 MB               |
| SC-LSTM plus BERT (at output) | SclstmbertChecker | 'scrnnbert-probwordnoise'   | 1.19 GB              |
| SC-LSTM plus ELMO (at output) | SclstmelmoChecker | 'scrnnelmo-probwordnoise'   | 1.23 GB              |

import neuspell

neuspell.seq_modeling.downloads.download_pretrained_model("subwordbert-probwordnoise")

Alternatively, download all Neuspell neural models by running the following (available in versions after v1.0):

import neuspell

neuspell.seq_modeling.downloads.download_pretrained_model("_all_")


Datasets

We curate several synthetic and natural datasets for training and evaluating NeuSpell models. For full details, check our paper.
