Akade.IndexedSet
A convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching.
Install / Use
/learn @akade/Akade.IndexedSetREADME
Akade.IndexedSet
A convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching. In a nutshell, it allows you to write LINQ-like queries without enumerating through the entire list. If you are currently completely enumerating through your data, expect huge speedups and much better scalability!
<!--TOC--> <!--/TOC-->Overview
A sample showing different indices and queries:
<!-- begin-snippet: Akade.IndexedSet.Tests/Samples/Readme.cs Overview(importer:cs?body-only=true) -->var data = new Purchase[] {
new(Id: 1, ProductId: 1, Amount: 1, UnitPrice: 5),
new(Id: 2, ProductId: 1, Amount: 2, UnitPrice: 5),
new(Id: 6, ProductId: 4, Amount: 3, UnitPrice: 12),
new(Id: 7, ProductId: 4, Amount: 8, UnitPrice: 10) // discounted price
};
IndexedSet<int, Purchase> set = data.ToIndexedSet(x => x.Id)
.WithIndex(x => x.ProductId)
.WithRangeIndex(x => x.Amount)
.WithRangeIndex(x => x.UnitPrice)
.WithRangeIndex(x => x.Amount * x.UnitPrice)
.WithIndex(x => (x.ProductId, x.UnitPrice))
.Build();
// efficient queries on configured indices
_ = set.Where(x => x.ProductId, 4);
_ = set.Range(x => x.Amount, 1, 3, inclusiveStart: true, inclusiveEnd: true);
_ = set.GreaterThanOrEqual(x => x.UnitPrice, 10);
_ = set.MaxBy(x => x.Amount * x.UnitPrice);
_ = set.Where(x => (x.ProductId, x.UnitPrice), (4, 10));
<!-- end-snippet -->
Design Goals
- Much faster solution than (naive) LINQ-based full-enumeration
- Syntax close to LINQ-Queries
- Easy to use with a fluent builder API
- Reflection & Expression-free to be AOT & Trimming friendly (for example for Blazor/WebASM)
- It's not a db - in-memory only
Performance and Operation-Support of the different indices:
Below, you find runtime complexities. Benchmarks can be found here
General queries
- n: total number of elements
- m: number of elements in the return set
- ✔: Supported
- ⚠: Supported but throws if not exactly 1 item was found
- ❌: Not-supported
| Query | Unique-Index | NonUnique-Index | Range-Index | | --------- | ------------ | --------------- | --------------- | | Single | ⚠ O(1) | ⚠ O(1) | ⚠ O(log n) | | Where | ✔ O(1) | ✔ O(m) | ✔ O(log n + m) | | Range | ❌ | ❌ | ✔ O(log n + m) | | < / <= | ❌ | ❌ | ✔ O(log n + m) | | > / >= | ❌ | ❌ | ✔ O(log n + m) | | OrderBy | ❌ | ❌ | ✔ O(m) | | Max/Min | ❌ | ❌ | ✔ O(1) |
String queries
- w: length of query word
- D: maximum distance in fuzzy query
- r: number of items in result set
| Query | Prefix-Index | FullText-Index | | ----------------| ------------ | -------------- | | StartWith | ✔ O(w+r) | ✔ O(w+r) | | Contains | ❌ | ✔ O(w+r) | | Fuzzy StartWith | ✔ O(w+D+r) | ✔ O(w+D+r) | | Fuzzy Contains | ❌ | ✔ O(w+D+r) |
ℹ FullText indices use a lot more memory than prefix indices and are more expensive to construct. Only use FullText indices if you really require it.
Spatial queries
The spatial index is based on an R*Tree. Runtime complexity for search of kNN is ~O(max height of tree), if we assume a low degree of overlap of the bounding boxes. Worst case in theory is O(n), but in practice the performance is usually pretty decent.
ℹ The performance of spatial queries depends heavily on the distribution of the data. Consider using bulk-loading (i.e. specify the data at indexedset creation) for best performance.
Vector queries
Vector indices use approximate nearest neighbor (ANN) search based on a Fresh Vamana graph. This provides very fast similarity search for high-dimensional vectors, commonly used for similarity search or semantic search with embeddings. The runtime complexity for Fresh Vamana-ANN depends on the dataset, construction etc, but in practice is sub-linear and scales well to large datasets.
Features
Unique index
Dictionary-based, O(1), access on keys:
<!-- begin-snippet: Akade.IndexedSet.Tests/Samples/Readme.cs Features_UniqueIndex(importer:cs?body-only=true) -->IndexedSet<int, Data> set = IndexedSetBuilder<Data>.Create(a => a.PrimaryKey)
.WithUniqueIndex(x => x.SecondaryKey)
.Build();
_ = set.Add(new(PrimaryKey: 1, SecondaryKey: 5));
// fast access via primary key
Data data = set[1];
// fast access via secondary key
data = set.Single(x => x.SecondaryKey, 5);
<!-- end-snippet -->
ℹ Entities do not require a primary key.
IndexedSet<TPrimaryKey, TData>inherits fromIndexedSet<TData>but provides convenient access to the automatically added unique index:set[primaryKey]instead ofset.Single(x => x.PrimaryKey, primaryKey).
Non-unique index
Dictionary-based, O(1), access on keys (single value) with multiple values (multiple keys):
<!-- begin-snippet: Akade.IndexedSet.Tests/Samples/Readme.cs Features_NonUniqueIndex_SingleKey(importer:cs?body-only=true) -->IndexedSet<int, Data>? set = new Data[] { new(PrimaryKey: 1, SecondaryKey: 5), new(PrimaryKey: 2, SecondaryKey: 5) }
.ToIndexedSet(x => x.PrimaryKey)
.WithIndex(x => x.SecondaryKey)
.Build();
// fast access via secondary key
IEnumerable<Data> data = set.Where(x => x.SecondaryKey, 5);
<!-- end-snippet -->
Range index
Binary-heap based O(log(n)) access for range based, smaller than (or equals) or bigger than (or equals) and orderby queries. Also useful to do paging sorted on exactly one index.
<!-- begin-snippet: Akade.IndexedSet.Tests/Samples/Readme.cs Features_RangeIndex(importer:cs?body-only=true) -->IndexedSet<Data> set = IndexedSetBuilder.Create(new Data[] { new(1, SecondaryKey: 3), new(2, SecondaryKey: 4) })
.WithRangeIndex(x => x.SecondaryKey)
.Build();
// fast access via range query
IEnumerable<Data> data = set.Range(x => x.SecondaryKey, 1, 5);
// fast max & min key value or elements
int maxKey = set.Max(x => x.SecondaryKey);
data = set.MaxBy(x => x.SecondaryKey);
// fast larger or smaller than
data = set.LessThan(x => x.SecondaryKey, 4);
// fast ordering & paging
data = set.OrderBy(x => x.SecondaryKey, skip: 10).Take(10); // second page of 10 elements
<!-- end-snippet -->
String indices and fuzzy matching
Prefix- & Suffix-Trie based indices for efficient StartWith & String-Contains queries including support for fuzzy matching.
<!-- begin-snippet: Akade.IndexedSet.Tests/Samples/Readme.cs Features_StringQueries(importer:cs?body-only=true) -->IndexedSet<Type> data = typeof(object).Assembly.GetTypes()
.ToIndexedSet()
.WithPrefixIndex(x => x.Name)
.WithFullTextIndex(x => x.FullName!)
.Build();
// fast prefix or contains queries via indices
_ = data.StartsWith(x => x.Name, "Int");
_ = data.Contains(x => x.FullName!, "Int");
// fuzzy searching is supported by prefix and full text indices
// the following will also match "String"
_ = data.FuzzyStartsWith(x => x.Name, "Strang", 1);
_ = data.FuzzyContains(x => x.FullName!, "Strang", 1);
<!-- end-snippet -->
