Spellcorrect
A program to correct non-word spelling error in sentences using ngram MAP Language Models, Noisy Channel Model, Error Confusion Matrix and Damerau-Levenshtein Edit Distance.
Install / Use
/learn @jbhoosreddy/SpellcorrectREADME
spellcorrect
This package implements a 1990 paper A Spelling Correction Program Based on a Noisy Channel Model by Mark D. Kernighan, Kenneth Ward Church and William A. Gale. A copy of this paper can be found online here or through a simple search.
A package to correct non-word spelling error in sentences using ngram MAP Language Models, Noisy Channel Model, Error Confusion Matrix and Damerau-Levenshtein Edit Distance.
I have encoded the datasets in a computer readable-form. Feel free to utilize the dataset and the ngram helper library (included) to improve this library or make your own. Or you can play with the hyper-parameters to improve your results. You can ping me on #IRC and Twitter at @jbhoosreddy if you have any questions.
class SpellCorrect
| A program to correct non-word spelling error in sentences using ngram MAP Language Models, Noisy Channel Model, Error Confusion Matrix and Damerau-Levenshtein Edit Distance.
| Usage:
| Input: 'she is a briliant acress'
| Response: she is a brilliant actress
|
| Methods defined here:
|
| __init__(self)
| Constructor method to load external nGram class, load words, confusion matrix and dictionary.
|
| channelModel(self, x, y, edit)
| Method to calculate channel model probability for errors.
|
| dlEditDistance(self, s1, s2)
| Method to calculate Damerau-Levenshtein Edit Distance for two strings.
|
| editType(self, candidate, word)
| Method to calculate edit type for single edit errors.
|
| genCandidates(self, word)
| Method to generate set of candidates for a given word using Damerau-Levenshtein Edit Distance.
|
| loadConfusionMatrix(self)
| Method to load Confusion Matrix from external data file.
|
| loadDict(self)
| Method to load dictionary from external data file.
Confusion matrices
As answered by @lzw429 in #1, the authors have provided in the appendix the computed deletion, addition, substitution and reversal matrices. I have manually transcribed this data to .data files which are processed in memory. The paper has more information on how this data was originally created by the authors.
Related Skills
node-connect
335.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
82.5kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
335.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
82.5kCommit, push, and open a PR
