FuzzySharp
C# .NET bit-parallel accelerated fuzzy string matching implementation of SeatGeek's well-known Python FuzzyWuzzy algorithm.
Terms of use
By using this project or its source code, for any purpose and in any shape or form, you grant your implicit agreement to all the following statements:
- You condemn Russia and its military aggression against Ukraine
- You recognize that Russia is an occupant that unlawfully invaded a sovereign state
- You support Ukraine's territorial integrity, including its claims over temporarily occupied territories of Crimea and Donbas
- You reject false narratives perpetuated by Russian state propaganda
To learn more about the war and how you can help, click here. Glory to Ukraine! 🇺🇦
Raffinert.FuzzySharp
C# .NET fast fuzzy string matching implementation of SeatGeek's well-known Python FuzzyWuzzy algorithm.
~~Nitrous-boosted~~ Bit-parallel accelerated version of the original FuzzySharp with multiple bugs fixed in the partial_ratio implementation.
Benchmark comparison of Levenshtein distance calculation: naive DP (baseline), FuzzySharp, Fastenshtein, Quickenshtein, and Raffinert.FuzzySharp.
Random words of 3 to 1024 random chars (LevenshteinLarge.cs):
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NaiveDp | 231.563 ms | 57.5403 ms | 3.1540 ms | 1.00 | 0.02 | 43500.0000 | 34500.0000 | 275312920 B | 1.000 |
| FuzzySharp | 141.820 ms | 4.0905 ms | 0.2242 ms | 0.61 | 0.01 | - | - | 1545732 B | 0.006 |
| Fastenshtein | 123.356 ms | 13.0959 ms | 0.7178 ms | 0.53 | 0.01 | - | - | 34028 B | 0.000 |
| Quickenshtein | 12.918 ms | 12.8046 ms | 0.7019 ms | 0.06 | 0.00 | - | - | 12 B | 0.000 |
| Raffinert.FuzzySharp | 4.970 ms | 0.3311 ms | 0.0181 ms | 0.02 | 0.00 | - | - | 3051 B | 0.000 |
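This README describes the library as bit-parallel accelerated. As a rough illustration of what that means, here is a minimal sketch of one well-known bit-parallel Levenshtein formulation (Myers' algorithm in Hyyrö's variant). It assumes the pattern has at most 64 characters so that a whole DP column fits in one `ulong`. The function name `BitParallelLevenshtein` is hypothetical, and this is not the library's actual implementation, which may use a different variant and handles longer inputs.

```csharp
using System;
using System.Collections.Generic;

// Usage: classic textbook pair.
Console.WriteLine(BitParallelLevenshtein("kitten", "sitting")); // prints 3

// Hypothetical helper: Levenshtein distance using one 64-bit word per DP column.
static int BitParallelLevenshtein(string pattern, string text)
{
    int m = pattern.Length;
    if (m == 0) return text.Length;
    if (m > 64) throw new ArgumentException("This sketch handles patterns of at most 64 chars.");

    // For each character, a bit mask of the positions where it occurs in the pattern.
    var peq = new Dictionary<char, ulong>();
    for (int i = 0; i < m; i++)
    {
        peq.TryGetValue(pattern[i], out ulong bits);
        peq[pattern[i]] = bits | (1UL << i);
    }

    ulong pv = ~0UL;             // positive vertical deltas (+1 between adjacent rows)
    ulong mv = 0UL;              // negative vertical deltas (-1 between adjacent rows)
    ulong last = 1UL << (m - 1); // bit of the last pattern row
    int score = m;               // distance between the pattern and the empty text

    foreach (char c in text)
    {
        peq.TryGetValue(c, out ulong eq); // 0 when c does not occur in the pattern
        ulong xv = eq | mv;
        ulong xh = unchecked((((eq & pv) + pv) ^ pv) | eq);
        ulong ph = mv | ~(xh | pv); // positive horizontal deltas
        ulong mh = pv & xh;         // negative horizontal deltas

        // The delta in the last row tells how the distance changes for this column.
        if ((ph & last) != 0) score++;
        if ((mh & last) != 0) score--;

        ph = (ph << 1) | 1UL;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;
    }
    return score;
}
```

Each text character is processed with a fixed number of word operations instead of an inner loop over the pattern, which is where this family of algorithms gets its speed and low allocation.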
Installation
Install-Package Raffinert.FuzzySharp
or
dotnet add package Raffinert.FuzzySharp
Usage
Simple Ratios
<p align="right"><a href="https://dotnetfiddle.net/9JpFTQ">Run .NET fiddle</a></p>

Fuzz.Ratio("mysmilarstring", "myawfullysimilarstirng");
// 72
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97
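To make the two scores above less opaque, here is a minimal sketch of the FuzzyWuzzy-style simple ratio as it is commonly defined: `100 * 2 * LCS(a, b) / (len(a) + len(b))`, rounded to the nearest integer, where LCS is the length of the longest common subsequence. The helper names `LcsLength` and `SimpleRatio` are hypothetical and this plain DP is only an illustration, not how the library computes the value internally.

```csharp
using System;

// Reproduces the two scores from the examples above.
Console.WriteLine(SimpleRatio("mysmilarstring", "myawfullysimilarstirng")); // prints 72
Console.WriteLine(SimpleRatio("mysmilarstring", "mysimilarstring"));        // prints 97

// Hypothetical helper: longest common subsequence length with two rolling rows.
static int LcsLength(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            curr[j] = a[i - 1] == b[j - 1]
                ? prev[j - 1] + 1
                : Math.Max(prev[j], curr[j - 1]);
        }
        (prev, curr) = (curr, prev); // the row just filled becomes the previous row
    }
    return prev[b.Length];
}

// Hypothetical helper: similarity in the range 0..100.
static int SimpleRatio(string a, string b)
{
    int total = a.Length + b.Length;
    if (total == 0) return 100; // two empty strings are treated as identical here
    double ratio = 100.0 * 2 * LcsLength(a, b) / total;
    return (int)Math.Round(ratio, MidpointRounding.AwayFromZero);
}
```

For the second pair, the shorter string (14 chars) is a subsequence of the longer one (15 chars), so the score is 100 * 28 / 29 ≈ 96.6, which rounds to 97.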
Partial Ratio
<p align="right"><a href="https://dotnetfiddle.net/rk0dIO">Run .NET fiddle</a></p>

Fuzz.PartialRatio("similar", "somewhresimlrbetweenthisstring");
// 71
Token Sort Ratio
<p align="right"><a href="https://dotnetfiddle.net/b5RVp2">Run .NET fiddle</a></p>

Fuzz.TokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.PartialTokenSortRatio("order words out of", " words out of order");
// 100
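The score of 100 follows from the token sort idea in FuzzyWuzzy: both inputs are normalized by sorting their tokens before comparison, so word order stops mattering. A minimal sketch of that normalization step, assuming lowercasing and splitting on spaces (the library's actual preprocessing may differ, and `SortTokens` is a hypothetical helper):

```csharp
using System;
using System.Linq;

// Both inputs normalize to the same string, so any ratio on them is 100.
Console.WriteLine(SortTokens("order words out of"));  // prints "of order out words"
Console.WriteLine(SortTokens(" words out of order")); // prints "of order out words"

// Hypothetical helper: lowercase, split on spaces, sort tokens, re-join.
static string SortTokens(string s)
{
    var tokens = s.ToLowerInvariant()
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .OrderBy(t => t, StringComparer.Ordinal);
    return string.Join(" ", tokens);
}
```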
Token Set Ratio
<p align="right"><a href="https://dotnetfiddle.net/ZfZRGb">Run .NET fiddle</a></p>

Fuzz.TokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.PartialTokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Token Initialism Ratio
<p align="right"><a href="https://dotnetfiddle.net/87181A">Run .NET fiddle</a></p>

Fuzz.TokenInitialismRatio("NASA", "National Aeronautics and Space Administration");
// 89
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 53
Fuzz.PartialTokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 100
Token Abbreviation Ratio
<p align="right"><a href="https://dotnetfiddle.net/MVlwrW">Run .NET fiddle</a></p>

Fuzz.TokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 40
Fuzz.PartialTokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 67
Weighted Ratio
<p align="right"><a href="https://dotnetfiddle.net/n9QxAk">Run .NET fiddle</a></p>

Fuzz.WeightedRatio("The quick brown fox jimps ofver the small lazy dog", "the quick brown fox jumps over the small lazy dog");
// 95
Process Extraction
Find the best match(es) from a collection of choices.
<p align="right"><a href="https://dotnetfiddle.net/8lEzk3">Run .NET fiddle</a></p>

Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
Process.ExtractTop("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, limit: 3);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: bing, score: 36, index: 1), ...]
// With score cutoff
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, cutoff: 40);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractSorted("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7), ...]
Extraction uses WeightedRatio and Full preprocessing by default. Override these in the method parameters to use different scorers and processing:
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" }, s => s, ScorerCache.Get<DefaultRatioScorer>());
// (string: Dallas Cowboys, score: 57, index: 3)
Generic Type Extraction
<p align="right"><a href="https://dotnetfiddle.net/YDtl6k">Run .NET fiddle</a></p>

Extraction can operate on objects of any type. Use the processor parameter to reduce each object to the string it should be compared on:
var events = new[]
{
    new[] { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" },
    new[] { "new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm" },
    new[] { "atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm" },
};
var query = new[] { "new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm" };
var best = Process.ExtractOne(query, events, strings => strings[0]);
// (value: { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" }, score: 95, index: 0)
Fluent Pipeline API
The Process.Configure() fluent builder creates reusable, immutable pipelines with preconfigured scoring, caching, and parallel execution.
Basic Pipeline
Equivalent to the static Process methods, but reusable across multiple queries:
var pipeline = Process.Configure().Build();
var result1 = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
var result2 = pipeline.ExtractOne(
    "chicago cubs",
    new[]
    {
        "Boston Red Sox",
        "Los Angeles Dodgers",
        "New York Yankees",
        "San Francisco Giants",
        "St. Louis Cardinals",
        "Houston Astros"
    });
// (string: San Francisco Giants, score: 45, index: 3)
Custom Scorer
<p align="right"><a href="https://dotnetfiddle.net/6JVmU9">Run .NET fiddle</a></p>

var pipeline = Process.Configure()
    .WithScorer(ScorerCache.Get<DefaultRatioScorer>())
    .Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 67, index: 3)
Parallel Execution
Enable multi-threaded processing for large choice sets:
var pipeline = Process.Configure()
    .Parallel()
    .Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
Pass ParallelOptions for fine-grained control:
var pipeline = Process.Configure()
    .Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
    .Build();
Cached Execution
<p align="right"><a href="https://dotnetfiddle.net/y6qMJm">Run .NET fiddle</a></p>

Automatic caching creates a CachedWeightedRatioScorer per extraction call, pre-initializing internal data structures for the query string:
var pipeline = Process.Configure()
    .Cached()
    .Build();
var result = pipeline.ExtractOne("cowboys", new[] { "A
