Akade.IndexedSet

A convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching. In a nutshell, it allows you to write LINQ-like queries without enumerating through the entire list. If you are currently completely enumerating through your data, expect huge speedups and much better scalability!

Overview
- Design Goals
- Performance and Operation-Support of the different indices:
Features
FAQs
Roadmap

Overview

A sample showing different indices and queries:

var data = new Purchase[] {
    new(Id: 1, ProductId: 1, Amount: 1, UnitPrice: 5),
    new(Id: 2, ProductId: 1, Amount: 2, UnitPrice: 5),
    new(Id: 6, ProductId: 4, Amount: 3, UnitPrice: 12),
    new(Id: 7, ProductId: 4, Amount: 8, UnitPrice: 10) // discounted price
    };

IndexedSet<int, Purchase> set = data.ToIndexedSet(x => x.Id)
                                    .WithIndex(x => x.ProductId)
                                    .WithRangeIndex(x => x.Amount)
                                    .WithRangeIndex(x => x.UnitPrice)
                                    .WithRangeIndex(x => x.Amount * x.UnitPrice)
                                    .WithIndex(x => (x.ProductId, x.UnitPrice))
                                    .Build();

// efficient queries on configured indices
_ = set.Where(x => x.ProductId, 4);
_ = set.Range(x => x.Amount, 1, 3, inclusiveStart: true, inclusiveEnd: true);
_ = set.GreaterThanOrEqual(x => x.UnitPrice, 10);
_ = set.MaxBy(x => x.Amount * x.UnitPrice);
_ = set.Where(x => (x.ProductId, x.UnitPrice), (4, 10));

Design Goals

Much faster solution than (naive) LINQ-based full-enumeration
Syntax close to LINQ-Queries
Easy to use with a fluent builder API
Reflection & Expression-free to be AOT & Trimming friendly (for example for Blazor/WebASM)
It's not a db - in-memory only

Performance and Operation-Support of the different indices:

Below, you find runtime complexities. Benchmarks can be found here

General queries

n: total number of elements
m: number of elements in the return set
✔: Supported
⚠: Supported but throws if not exactly 1 item was found
❌: Not-supported

| Query | Unique-Index | NonUnique-Index | Range-Index | | --------- | ------------ | --------------- | --------------- | | Single | ⚠ O(1) | ⚠ O(1) | ⚠ O(log n) | | Where | ✔ O(1) | ✔ O(m) | ✔ O(log n + m) | | Range | ❌ | ❌ | ✔ O(log n + m) | | < / <= | ❌ | ❌ | ✔ O(log n + m) | | > / >= | ❌ | ❌ | ✔ O(log n + m) | | OrderBy | ❌ | ❌ | ✔ O(m) | | Max/Min | ❌ | ❌ | ✔ O(1) |

String queries

w: length of query word
D: maximum distance in fuzzy query
r: number of items in result set

| Query | Prefix-Index | FullText-Index | | ----------------| ------------ | -------------- | | StartWith | ✔ O(w+r) | ✔ O(w+r) | | Contains | ❌ | ✔ O(w+r) | | Fuzzy StartWith | ✔ O(w+D+r) | ✔ O(w+D+r) | | Fuzzy Contains | ❌ | ✔ O(w+D+r) |

ℹ FullText indices use a lot more memory than prefix indices and are more expensive to construct. Only use FullText indices if you really require it.

Spatial queries

The spatial index is based on an R*Tree. Runtime complexity for search of kNN is ~O(max height of tree), if we assume a low degree of overlap of the bounding boxes. Worst case in theory is O(n), but in practice the performance is usually pretty decent.

ℹ The performance of spatial queries depends heavily on the distribution of the data. Consider using bulk-loading (i.e. specify the data at indexedset creation) for best performance.

Vector queries

Vector indices use approximate nearest neighbor (ANN) search based on a Fresh Vamana graph. This provides very fast similarity search for high-dimensional vectors, commonly used for similarity search or semantic search with embeddings. The runtime complexity for Fresh Vamana-ANN depends on the dataset, construction etc, but in practice is sub-linear and scales well to large datasets.

Features

Unique index

Dictionary-based, O(1), access on keys:

IndexedSet<int, Data> set = IndexedSetBuilder<Data>.Create(a => a.PrimaryKey)
                                                   .WithUniqueIndex(x => x.SecondaryKey)
                                                   .Build();

_ = set.Add(new(PrimaryKey: 1, SecondaryKey: 5));

// fast access via primary key
Data data = set[1];

// fast access via secondary key
data = set.Single(x => x.SecondaryKey, 5);

ℹ Entities do not require a primary key. IndexedSet<TPrimaryKey, TData> inherits from IndexedSet<TData> but provides convenient access to the automatically added unique index: set[primaryKey] instead of set.Single(x => x.PrimaryKey, primaryKey).

Non-unique index

Dictionary-based, O(1), access on keys (single value) with multiple values (multiple keys):

IndexedSet<int, Data>? set = new Data[] { new(PrimaryKey: 1, SecondaryKey: 5), new(PrimaryKey: 2, SecondaryKey: 5) }
    .ToIndexedSet(x => x.PrimaryKey)
    .WithIndex(x => x.SecondaryKey)
    .Build();

// fast access via secondary key
IEnumerable<Data> data = set.Where(x => x.SecondaryKey, 5);

Range index

Binary-heap based O(log(n)) access for range based, smaller than (or equals) or bigger than (or equals) and orderby queries. Also useful to do paging sorted on exactly one index.

IndexedSet<Data> set = IndexedSetBuilder.Create(new Data[] { new(1, SecondaryKey: 3), new(2, SecondaryKey: 4) })
                                        .WithRangeIndex(x => x.SecondaryKey)
                                        .Build();

// fast access via range query
IEnumerable<Data> data = set.Range(x => x.SecondaryKey, 1, 5);

// fast max & min key value or elements
int maxKey = set.Max(x => x.SecondaryKey);
data = set.MaxBy(x => x.SecondaryKey);

// fast larger or smaller than
data = set.LessThan(x => x.SecondaryKey, 4);

// fast ordering & paging
data = set.OrderBy(x => x.SecondaryKey, skip: 10).Take(10); // second page of 10 elements

String indices and fuzzy matching

Prefix- & Suffix-Trie based indices for efficient StartWith & String-Contains queries including support for fuzzy matching.

IndexedSet<Type> data = typeof(object).Assembly.GetTypes()
                                               .ToIndexedSet()
                                               .WithPrefixIndex(x => x.Name)
                                               .WithFullTextIndex(x => x.FullName!)
                                               .Build();

// fast prefix or contains queries via indices
_ = data.StartsWith(x => x.Name, "Int");
_ = data.Contains(x => x.FullName!, "Int");

// fuzzy searching is supported by prefix and full text indices
// the following will also match "String"
_ = data.FuzzyStartsWith(x => x.Name, "Strang", 1);
_ = data.FuzzyContains(x => x.FullName!, "Strang", 1);

Akade.IndexedSet

Install / Use

README

Akade.IndexedSet

Overview

Design Goals

Performance and Operation-Support of the different indices:

General queries

String queries

Spatial queries

Vector queries

Features

Unique index

Non-unique index

Range index

String indices and fuzzy matching

Multi-key indices: All indices can be used wit