BloomFilter.NetCore
A bloom filter implementation
Install / Use
/learn @vla/BloomFilter.NetCoreREADME
BloomFilter.NetCore
A high-performance, feature-complete Bloom filter library for .NET, supporting both in-memory and distributed Redis backends.
Table of Contents
- Overview
- Key Features
- Packages & Status
- Architecture
- Core Functionality
- Installation
- Quick Start
- Usage Examples
- Hash Algorithms
- Performance Benchmarks
- Advanced Usage
- API Reference
- Contributing
- License
Overview
BloomFilter.NetCore is an enterprise-grade Bloom filter library designed for the .NET ecosystem. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. Its core characteristics are:
- Space Efficient: Extremely small memory footprint compared to traditional HashSets
- O(1) Time Complexity: Both add and query operations execute in constant time
- Probabilistic: May return false positives but never false negatives
This project provides two major implementation types:
- In-Memory Bloom Filter (FilterMemory): BitArray-based in-memory implementation, suitable for single-process scenarios
- Distributed Bloom Filter (FilterRedis series): Redis-backed distributed implementation, supports concurrent access from multiple applications
Primary Use Cases
- Cache Penetration Protection: Prevent malicious queries for non-existent data from bypassing cache
- Deduplication: URL deduplication, email deduplication, user ID deduplication, etc.
- Recommendation Systems: Check if a user has seen specific content
- Web Crawlers: Check if URLs have been crawled
- Distributed Systems: Share state checks across multiple service instances
- Big Data: Existence checks for massive datasets
Key Features
🎯 Flexible Configuration
- Fully Configurable Parameters: Bit array size (m), number of hash functions (k)
- Automatic Parameter Calculation: Automatically calculate optimal parameters based on tolerable false positive rate (p) and expected element count (n)
- 20+ Hash Algorithms: Support for CRC, MD5, SHA, Murmur, LCGs, xxHash, or custom algorithms
⚡ High Performance
- Fast Generation: Bloom filter generation and operations are extremely fast
- Optimized Implementation: Uses Span<T>, ReadOnlyMemory<T> for zero-copy operations
- Unsafe Code Optimization: Uses unsafe code blocks in performance-critical paths
- Rejection Sampling: Implements rejection sampling and hash chaining, considering avalanche effect for improved hash quality
🔒 Concurrency Safe
- Thread-Safe: Uses AsyncLock mechanism for safe multi-threaded concurrent access
- Async Support: Comprehensive async/await support with async versions of all operations
- Distributed Locking: Redis implementations support concurrent access across applications
🌐 Multiple Backend Support
- StackExchange.Redis: Officially recommended Redis client
- CSRedisCore: High-performance Redis client
- FreeRedis: Lightweight Redis client
- EasyCaching: Supports EasyCaching abstraction layer, switchable cache providers
📦 Modern .NET Support
- Multi-Framework Support: net462, netstandard2.0, net6.0, net7.0, net8.0, net9.0, net10.0
- Dependency Injection: Native support for Microsoft.Extensions.DependencyInjection
- Nullable Reference Types: Enabled for improved code safety
Packages & Status
| Package | NuGet | Description |
|---------|-------|-------------|
|BloomFilter.NetCore|| Core package with in-memory Bloom filter |
|BloomFilter.Redis.NetCore|
| StackExchange.Redis implementation |
|BloomFilter.CSRedis.NetCore|
| CSRedisCore implementation |
|BloomFilter.FreeRedis.NetCore|
| FreeRedis implementation |
|BloomFilter.EasyCaching.NetCore|
| EasyCaching integration |
Architecture
Core Interface Layer
IBloomFilter (Interface)
├── Add / AddAsync - Add elements
├── Contains / ContainsAsync - Check elements
├── All / AllAsync - Batch check
├── Clear / ClearAsync - Clear filter
└── ComputeHash - Compute hash values
Implementation Hierarchy
Filter (Abstract Base Class)
├── FilterMemory (In-Memory)
│ └── Uses BitArray storage
│
└── Redis Series (Distributed)
├── FilterRedis (StackExchange.Redis)
├── FilterCSRedis (CSRedisCore)
├── FilterFreeRedis (FreeRedis)
└── FilterEasyCachingRedis (EasyCaching)
Configuration System
BloomFilterOptions
├── FilterMemoryOptions - In-memory mode configuration
├── FilterRedisOptions - StackExchange.Redis configuration
├── FilterCSRedisOptions - CSRedisCore configuration
├── FilterFreeRedisOptions - FreeRedis configuration
└── FilterEasyCachingOptions - EasyCaching configuration
Core Functionality
Mathematical Model
BloomFilter.NetCore implements the complete Bloom filter mathematical model:
1. Optimal Bit Array Size (m)
Given expected element count n and false positive rate p, calculate optimal bit array size:
m = -(n * ln(p)) / (ln(2)^2)
2. Optimal Number of Hash Functions (k)
Given element count n and bit array size m, calculate optimal number of hash functions:
k = (m / n) * ln(2)
3. Actual False Positive Rate (p)
Given inserted element count, number of hash functions, and bit array size, calculate actual false positive rate:
p = (1 - e^(-k*n/m))^k
These calculations are provided by static methods in the Filter base class:
// Calculate optimal bit array size
long m = Filter.BestM(expectedElements, errorRate);
// Calculate optimal number of hash functions
int k = Filter.BestK(expectedElements, capacity);
// Calculate optimal element count
long n = Filter.BestN(hashes, capacity);
// Calculate actual false positive rate
double p = Filter.BestP(hashes, capacity, insertedElements);
Storage Mechanisms
In-Memory Storage
- BitArray: Uses .NET's BitArray as underlying storage
- Bucketing Strategy: Automatically splits into multiple BitArrays when capacity exceeds 2GB (MaxInt = 2,147,483,640)
- Serialization Support: Supports serialization/deserialization for persistence or transfer
Redis Storage
- SETBIT/GETBIT: Uses Redis bit operation commands
- Distributed Access: Multiple application instances can concurrently access the same filter
- Persistence: Leverages Redis persistence mechanisms for data safety
Concurrency Control
// AsyncLock ensures thread safety
public class AsyncLock
{
private readonly SemaphoreSlim _semaphore = new(1, 1);
public async ValueTask<IDisposable> LockAsync()
{
await _semaphore.WaitAsync();
return new Release(_semaphore);
}
}
Installation
Install via NuGet
In-Memory Mode (Core Package):
dotnet add package BloomFilter.NetCore
Redis Distributed Mode (Choose One):
# StackExchange.Redis
dotnet add package BloomFilter.Redis.NetCore
# CSRedisCore
dotnet add package BloomFilter.CSRedis.NetCore
# FreeRedis
dotnet add package BloomFilter.FreeRedis.NetCore
# EasyCaching
dotnet add package BloomFilter.EasyCaching.NetCore
Quick Start
Simplest Example
using BloomFilter;
// Create a Bloom filter: expect 10 million elements, 1% false positive rate
var bf = FilterBuilder.Build(10_000_000, 0.01);
// Add elements
bf.Add("user:123");
bf.Add("user:456");
// Check element existence
Console.WriteLine(bf.Contains("user:123")); // True
Console.WriteLine(bf.Contains("user:789")); // False (very small probability of True)
// Clear filter
bf.Clear();
Async Operations
// Async add
await bf.AddAsync(Encoding.UTF8.GetBytes("user:123"));
// Async check
bool exists = await bf.ContainsAsync(Encoding.UTF8.GetBytes("user:123"));
// Batch async operations
var users = new[] {
Encoding.UTF8.GetBytes("user:1"),
Encoding.UTF8.GetBytes("user:2"),
Encoding.UTF8.GetBytes("user:3")
};
await bf.AddAsync(users);
var results = await bf.ContainsAsync(users);
Fluent API (New in v3.0)
v3.0 introduces a modern fluent API for building Bloom filters with improved discoverability and expressiveness:
// In-Memory Fluent API
var filter = FilterBuilder.Create()
.WithName("UserFilter")
.ExpectingElements(10_000_000)
.WithErrorRate(0.001)
.UsingHashMethod(HashMethod.XXHash3)
.BuildInMemory();
// Redis Fluent API (StackExchange.Redis)
var redisFilter = FilterRedisBuilder.Cre
