Easy to use and powerful fuzzy string matching.

Mostly a JavaScript port of the Python library TheFuzz (formerly fuzzywuzzy), with some additional features such as token similarity sorting and wildcard support.

Demo <a href="https://nol13.github.io/fuzzball.js" target="_blank">here</a> comparing some of the different scorers/options. Auto-generated API Docs <a href="https://github.com/nol13/fuzzball.js/blob/master/jsdocs/fuzzball.md" target="_blank">here</a>.

Installation
Basic Usage
Functions
Pre-Processing
Collation and Unicode Stuff
Batch Extract
Multiple Fields
Async and Cancellation
Wildcards
Fuzzy Dedupe
Performance Optimization
Alternate Ratio Calculations
Lite Versions
Credits (aka, projects I stole code from)
Contributions

Installation

Using NPM

npm install fuzzball

Browser (using the pre-built UMD bundle; make sure the script is loaded as UTF-8 if the page isn't already)

<script charset="UTF-8" src="dist/fuzzball.umd.min.js"></script>
<script>
fuzzball.ratio("fuzz", "fuzzy");
</script>

or as module

<script charset="UTF-8" type="module">
import {ratio} from './dist/esm/fuzzball.esm.min.js';
console.log(ratio('fuzz', 'fuzzy'));
</script>

See the lite section below if you need the smallest possible file size. If you need to support IE or Node < v14, use v2.1.6 or earlier.

Basic Usage

fuzz = require('fuzzball');

fuzz.ratio("hello world", "hiyyo wyrld");
        64
fuzz.token_set_ratio("fuzzy was a bear", "a fuzzy bear fuzzy was");
        100

options = {scorer: fuzz.token_set_ratio};
choices = ["Hood, Harry", "Mr. Minor", "Mr. Henry Hood"];

fuzz.extract("mr. harry hood", choices, options);

// [choice, score, index/key]
[ [ 'Hood, Harry', 100, 0 ],
  [ 'Mr. Henry Hood', 85, 2 ],
  [ 'Mr. Minor', 40, 1 ] ]

/**
* Set options.returnObjects = true to get back
* an array of {choice, score, key} objects instead of tuples
*/

results = await fuzz.extractAsPromised("mr. harry hood", choices, options);

// Cancel search

const abortController = new AbortController();
options.abortController = abortController;

fuzz.extractAsPromised("gonna get canceled", choices, options)
        .then(res => {/* do stuff */})
        .catch((e) => {
                if (e.message === 'aborted') console.log('Search was aborted!')
        });

abortController.abort();
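Cancellation like this comes down to a loop that does its work in small chunks and checks the abort signal between them. The following is a hypothetical sketch of that pattern in plain JavaScript, not fuzzball's internal code; `scoreInChunks` and `chunkSize` are made-up names, and it assumes a runtime with a global `AbortController` (modern browsers, recent Node versions).

```javascript
// Hypothetical helper (not fuzzball's API): scores choices in chunks,
// yielding to the event loop between chunks, and rejects with
// Error("aborted") once the AbortSignal has fired.
function scoreInChunks(query, choices, scorer, { signal, chunkSize = 500 } = {}) {
  return new Promise((resolve, reject) => {
    const results = [];
    let i = 0;
    const step = () => {
      if (signal && signal.aborted) {
        reject(new Error("aborted"));
        return;
      }
      const end = Math.min(i + chunkSize, choices.length);
      for (; i < end; i++) {
        results.push([choices[i], scorer(query, choices[i]), i]);
      }
      // Schedule the next chunk instead of looping synchronously.
      if (i < choices.length) setTimeout(step, 0);
      else resolve(results);
    };
    step();
  });
}

// Usage: a trivial exact-match scorer stands in for a real fuzzy scorer.
const exact = (a, b) => (a === b ? 100 : 0);
const controller = new AbortController();

scoreInChunks("b", ["a", "b", "c"], exact, { signal: controller.signal, chunkSize: 1 })
  .then((res) => console.log(res))
  .catch((e) => {
    if (e.message === "aborted") console.log("Search was aborted!");
  });

controller.abort(); // runs before the second chunk is scheduled to start
```

Yielding with `setTimeout` between chunks is what keeps the event loop responsive, and it is also what gives `abort()` a chance to run before the next chunk starts.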

Functions

Simple Ratio

// "!" Stripped and lowercased in pre-processing by default
fuzz.ratio("this is a test", "This is a test!");
        100
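For intuition, here is a minimal sketch of how a simple ratio can be computed. It assumes the python-Levenshtein style weighting used by fuzzywuzzy's ratio (a substitution costs 2, so the score is (lensum − distance) / lensum) and a pre-processing step that only approximates the default one; `preprocess` and `weightedDistance` are hypothetical helpers, not fuzzball exports.

```javascript
// Assumed pre-processing: anything that isn't a letter or digit becomes a
// space, then lowercase, collapse runs of whitespace, and trim.
function preprocess(s) {
  return s
    .replace(/[^\p{L}\p{N}]/gu, " ")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Levenshtein distance with insert/delete cost 1 and substitution cost 2.
function weightedDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Score from 0 to 100: share of the combined length not spent on edits.
function ratio(s1, s2) {
  const a = preprocess(s1);
  const b = preprocess(s2);
  if (a === b) return 100;
  if (!a || !b) return 0; // empty input scores 0 in this sketch
  const lensum = a.length + b.length;
  return Math.round((100 * (lensum - weightedDistance(a, b))) / lensum);
}

console.log(ratio("this is a test", "This is a test!")); // 100
console.log(ratio("hello world", "hiyyo wyrld")); // 64
```

Under these assumptions the sketch reproduces the scores shown in this README for these inputs, but treat it as an illustration rather than a drop-in replacement.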

Partial Ratio

Highest scoring substring of the longer string vs. the shorter string.

// Still 100, substring of 2nd is a perfect match of the first
fuzz.partial_ratio("test", "testing");
        100
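The partial ratio idea can be illustrated with a brute-force sketch: slide a window the length of the shorter string across the longer one and keep the best simple ratio. This is a simplification, not fuzzball's implementation (fuzzywuzzy-style partial_ratio picks candidate windows from matching blocks, so scores can differ on some inputs), and it skips pre-processing. `partialRatio` and `simpleRatio` are hypothetical names, and the substitution cost of 2 is an assumption.

```javascript
// Levenshtein distance with insert/delete cost 1 and substitution cost 2.
function weightedDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  return prev[b.length];
}

function simpleRatio(a, b) {
  if (a === b) return 100;
  if (!a || !b) return 0;
  const lensum = a.length + b.length;
  return Math.round((100 * (lensum - weightedDistance(a, b))) / lensum);
}

// Best simple ratio between the shorter string and every same-length
// substring of the longer string.
function partialRatio(s1, s2) {
  const [shorter, longer] = s1.length <= s2.length ? [s1, s2] : [s2, s1];
  if (!shorter) return 0;
  let best = 0;
  for (let i = 0; i + shorter.length <= longer.length; i++) {
    const candidate = longer.slice(i, i + shorter.length);
    best = Math.max(best, simpleRatio(shorter, candidate));
    if (best === 100) break; // a perfect window can't be beaten
  }
  return best;
}

console.log(partialRatio("test", "testing")); // 100
```

The exhaustive scan costs one edit-distance computation per window, which is one reason real implementations narrow down the candidate windows first.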

Token Sort Ratio

Tokenized, sorted, and then recombined before scoring.

fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear");
        91
fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear");
        100

Token Set Ratio

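Highest of 3 scores, built from the sorted token intersection and the sorted leftover tokens of each string: intersection vs. intersection + leftovers of string 1, intersection vs. intersection + leftovers of string 2, and those two combined strings vs. each other.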

fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear");
        84
fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear");
        100

If you set options.trySimple to true, the simple ratio is added to the token_set_ratio test suite as well. This can help smooth out occasional irregularities in how heavily differences in the first letter of a token get penalized.
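To make the token sort and token set descriptions concrete, here is a sketch of both scorers built on a simple ratio. It is an approximation under stated assumptions, not fuzzball's code: substitutions cost 2 in the edit distance, the tokenizer just lowercases and splits on anything that isn't a letter or digit, and trySimple / sortBySimilarity are not modelled. All function names are hypothetical.

```javascript
// Levenshtein distance with insert/delete cost 1 and substitution cost 2.
function weightedDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  return prev[b.length];
}

function simpleRatio(a, b) {
  if (a === b) return 100;
  if (!a || !b) return 0;
  const lensum = a.length + b.length;
  return Math.round((100 * (lensum - weightedDistance(a, b))) / lensum);
}

// Simplified tokenizer: lowercase, split on non-letters/digits, drop empties.
function tokenize(s) {
  return s.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Tokenize, sort alphabetically, re-join with single spaces, then score.
function tokenSortRatio(s1, s2) {
  return simpleRatio(tokenize(s1).sort().join(" "), tokenize(s2).sort().join(" "));
}

// Best of three comparisons built from the sorted intersection and the
// sorted leftover tokens of each string.
function tokenSetRatio(s1, s2) {
  const t1 = new Set(tokenize(s1));
  const t2 = new Set(tokenize(s2));
  const sect = [...t1].filter((t) => t2.has(t)).sort().join(" ");
  const rest1 = [...t1].filter((t) => !t2.has(t)).sort().join(" ");
  const rest2 = [...t2].filter((t) => !t1.has(t)).sort().join(" ");
  const combined1 = `${sect} ${rest1}`.trim();
  const combined2 = `${sect} ${rest2}`.trim();
  return Math.max(
    simpleRatio(sect, combined1),
    simpleRatio(sect, combined2),
    simpleRatio(combined1, combined2)
  );
}

console.log(tokenSortRatio("fuzzy was a bear", "fuzzy fuzzy was a bear")); // 84
console.log(tokenSetRatio("fuzzy was a bear", "fuzzy fuzzy was a bear")); // 100
```

Because duplicates collapse into a set, a string whose tokens are a subset of the other's compares its intersection against itself and scores 100, which is why token_set_ratio ignores the repeated "fuzzy".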

Token Similarity Sort Ratio

Instead of sorting alphabetically, tokens will be sorted by similarity to the smaller set. Useful if the matching token may have a different first letter, but performs a bit slower. You can also use similarity sorting when calculating token_set_ratio by setting sortBySimilarity to true.

Not available in the lite builds, and sorting doesn't yet take wildcards or collation into account. Based on this fuzzywuzzy PR by Exquisition. (https://github.com/seatgeek/fuzzywuzzy/pull/296)

fuzz.token_sort_ratio('apple cup zebrah horse foo', 'zapple cub horse bebrah bar')
        58
fuzz.token_set_ratio('apple cup zebrah horse foo', 'zapple cub horse bebrah bar')
        61
fuzz.token_similarity_sort_ratio('apple cup zebrah horse foo', 'zapple cub horse bebrah bar')
        68
fuzz.token_set_ratio('apple cup zebrah horse foo', 'zapple cub horse bebrah bar', {sortBySimilarity: true})
        71

Distance

Unmodified Levenshtein distance without any additional ratio calculations.

fuzz.distance("fuzzy was a bear", "fozzy was a bear");
        1
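For reference, the textbook dynamic-programming form of unweighted Levenshtein distance looks like this. It is a generic sketch rather than fuzzball's code; `distance` here is a local function, and it compares UTF-16 code units (see the astral option below for why that matters).

```javascript
// Classic Levenshtein distance: insertions, deletions and substitutions
// all cost 1. Two rolling rows keep memory at O(b.length).
function distance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

console.log(distance("fuzzy was a bear", "fozzy was a bear")); // 1
console.log(distance("kitten", "sitting")); // 3
```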

Other Scoring Options

partial_token_set_ratio (options.trySimple = true will add the partial_ratio to the test suite, note this function will always return 100 if there are any tokens in common)
partial_token_sort_ratio
partial_token_similarity_sort_ratio
WRatio (runs tests based on relative string length and returns weighted top score, current default scorer in fuzzywuzzy extract)

A blog post with an overview of the scoring algorithms can be found here.

Options

Pre-Processing

Pre-processing to remove non-alphanumeric characters is run by default unless options.full_process is set to false.

// Eh, don't need to clean it up..
// Set options.force_ascii to true to remove all non-ascii letters as well, default: false
fuzz.ratio("this is a test", "this is a test!", {full_process: false});
        97

Or run it separately (processing your strings once beforehand avoids a bit of repeated performance overhead):

// force_ascii will strip out non-ascii characters except designated wildcards
fuzz.full_process("myt^eäXt!");
        myt eäxt
fuzz.full_process("myt^eäXt!", {force_ascii: true});
        myt ext
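A rough approximation of this pre-processing can be written in a few lines. This is a sketch based only on the behavior shown above, not fuzzball's source: `fullProcess` and `forceAscii` are hypothetical names mirroring full_process and force_ascii, the "letter or digit" test uses Unicode property escapes, and wildcard handling is not modelled.

```javascript
// Approximate pre-processing: optionally drop non-ASCII characters, turn
// anything that isn't a letter or digit into a space, lowercase, and trim.
function fullProcess(str, { forceAscii = false } = {}) {
  let s = forceAscii ? str.replace(/[^\x00-\x7F]/g, "") : str;
  s = s.replace(/[^\p{L}\p{N}]/gu, " ");
  return s.toLowerCase().trim();
}

console.log(fullProcess("myt^eäXt!")); // "myt eäxt"
console.log(fullProcess("myt^eäXt!", { forceAscii: true })); // "myt ext"
```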

Consecutive whitespace is collapsed unless options.collapseWhitespace is set to false (default: true). Setting it to false matches the behavior of fuzzywuzzy. This only affects the non-token scorers.

Collation and Unicode Stuff

To use collation when calculating edit distance, set useCollator to true.

Setting useCollator to true will have an impact on performance, so if you have a really large number of choices it may be best to pre-process them instead (e.g. with lodash _.deburr) if possible.

options = {useCollator: true};
fuzz.ratio("this is ä test", "this is a test", options);
        100
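The idea behind collation-aware scoring can be sketched by swapping the character-equality check inside the edit distance for a comparison via `Intl.Collator`. This is an illustration under assumptions, not fuzzball's implementation: it assumes an `"en"` collator with `sensitivity: "base"` (so "ä" and "a" compare as equal) and a substitution cost of 2; `collatedRatio` is a hypothetical name.

```javascript
// Assumption: "en" locale, "base" sensitivity (ignores accents and case).
const collator = new Intl.Collator("en", { sensitivity: "base" });

// Simple ratio where two characters match if the collator says they are equal.
function collatedRatio(s1, s2) {
  const a = Array.from(s1);
  const b = Array.from(s2);
  const same = (x, y) => x === y || collator.compare(x, y) === 0;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      // Substitution costs 2, insert/delete cost 1.
      const sub = prev[j - 1] + (same(a[i - 1], b[j - 1]) ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  const lensum = a.length + b.length;
  if (lensum === 0) return 100;
  return Math.round((100 * (lensum - prev[b.length])) / lensum);
}

console.log(collatedRatio("this is \u00e4 test", "this is a test")); // 100
```

Every cell of the distance table may now call `collator.compare` instead of a plain `===`, which illustrates why collation costs noticeably more on large choice lists.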

If your strings contain code points beyond the basic multilingual plane (BMP), set astral to true. If your strings contain astral symbols and this is not set, those symbols will be treated as multiple characters and the ratio will be off a bit. (This will have some impact on performance, which is why it is turned off by default.)

options = {astral: true};
fuzz.ratio("ab🐴c", "ab🐴d", options);
        75
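The effect of the astral option can be seen by scoring the same strings twice: once over UTF-16 code units, where "🐴" counts as two characters, and once over code points. In this sketch `ratioOf`, `codeUnits` and `codePoints` are hypothetical helpers, the substitution cost of 2 is an assumption, and the choice of NFC as the normalization form is also an assumption.

```javascript
// Simple ratio over arrays of "characters" (substitution cost 2).
function ratioOf(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  const lensum = a.length + b.length;
  return lensum === 0 ? 100 : Math.round((100 * (lensum - prev[b.length])) / lensum);
}

const codeUnits = (s) => s.split(""); // "🐴" becomes two surrogate halves
const codePoints = (s) => Array.from(s.normalize("NFC")); // one entry per code point

const s1 = "ab🐴c";
const s2 = "ab🐴d";
console.log(s1.length, Array.from(s1).length); // 5 4
console.log(ratioOf(codeUnits(s1), codeUnits(s2))); // 80
console.log(ratioOf(codePoints(s1), codePoints(s2))); // 75

// Normalizing first lets precomposed and decomposed "é" match.
console.log(ratioOf(codePoints("caf\u00e9"), codePoints("cafe\u0301"))); // 100
```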

When astral is true it will also normalize your strings before scoring. The normalize option is true by default; set it to false if you want different representations of the same characters not to match.

Batch Extract

Search list of choices for top results.

fuzz.extract(query, choices, options);

fuzz.extractAsync(query, choices, options, function(err, results) { /* do stuff */ }); (internal loop will be non-blocking)

fuzz.extractAsPromised(query, choices, options).then(results => { /* do stuff */ }); (Promise will not be polyfilled)

Simple: array of strings, or object in form of {key: "string"}

The scorer defaults to fuzz.ratio if not specified.

With array of strings

query = "polar bear";
choices = ["brown bear", "polar bear", "koala bear"];

results = fuzz.extract(query, choices);

// [choice, score, index]
[ [ 'polar bear', 100, 1 ],
  [ 'koala bear', 80, 2 ],
  [ 'brown bear', 60, 0 ] ]

With object

query = "polar bear";
choicesObj = {id1: "brown bear",
              id2: "polar bear",
              id3: "koala bear"};

results = fuzz.extract(query, choicesObj);

// [choice, score, key]
[ [ 'polar bear', 100, 'id2' ],
  [ 'koala bear', 80, 'id3' ],
  [ 'brown bear', 60, 'id1' ] ]
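Conceptually, extract scores every choice against the query and sorts the results best-first, returning the index for arrays and the key for objects. The sketch below shows that loop in plain JavaScript; `extract` here is a local function rather than fuzzball's, its scorer is an approximate simple ratio (substitution cost 2, light pre-processing), and the limit, cutoff and processor options described further down are not modelled.

```javascript
// Approximate simple ratio: lowercase/strip punctuation, then
// (lensum - weighted edit distance) / lensum, with substitutions costing 2.
function simpleRatio(s1, s2) {
  const clean = (s) => s.replace(/[^\p{L}\p{N}]/gu, " ").toLowerCase().trim();
  const a = clean(s1);
  const b = clean(s2);
  if (a === b) return 100;
  if (!a || !b) return 0;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 2);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub);
    }
    prev = cur;
  }
  const lensum = a.length + b.length;
  return Math.round((100 * (lensum - prev[b.length])) / lensum);
}

// Arrays yield [choice, score, index]; objects yield [choice, score, key].
function extract(query, choices, scorer = simpleRatio) {
  const entries = Array.isArray(choices)
    ? choices.map((choice, index) => [choice, index])
    : Object.keys(choices).map((key) => [choices[key], key]);
  return entries
    .map(([choice, key]) => [choice, scorer(query, choice), key])
    .sort((x, y) => y[1] - x[1]); // highest score first; ties keep input order
}

const results = extract("polar bear", ["brown bear", "polar bear", "koala bear"]);
// returns [['polar bear', 100, 1], ['koala bear', 80, 2], ['brown bear', 60, 0]]
console.log(results);
```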

Return objects

options = {returnObjects: true}
results = fuzz.extract(query, choicesObj, options);

[ { choice: 'polar bear', score: 100, key: 'id2' },
  { choice: 'koala bear', score: 80, key: 'id3' },
  { choice: 'brown bear', score: 60, key: 'id1' } ]

Less simple: array of objects, or object in form of {key: choice}, with processor function + options

Optional processor function takes a choice and returns the string which will be used for scoring. Each choice can be a string or an object, as long as the processor function can accept it and return a string.

query = "126abzx";
choices = [{id: 345, model: "123abc"},
           {id: 346, model: "123efg"},
           {id: 347, model: "456abdzx"}];

options = {
        scorer: fuzz.partial_ratio, // Any function that takes two values and returns a score, default: ratio
        processor: choice => choice.model,  // Takes choice object, returns string, default: no processor. Must supply if choices are not already strings.
        limit: 2, // Max number of top results to return, default: no limit / 0.
        cutoff: 50, // Lowest score to return, default: 0
        unsorted: false // Results won't be sorted if true, default: false. If true limit will be ignored.
};

results = fuzz.extract(query, choices, options);

// [choice, score, index/key]
[ [ { id: 347,
