PageBloomFilter
Fastest bloom filter in C++/Go/Rust/Java/C#
Install / Use
/learn @PeterRK/PageBloomFilterREADME
PageBloomFilter
Bloom filter with page, designed for storage density and query speed.
C++
auto bf = NEW_BLOOM_FILTER(500, 0.01);
if (bf.set("Hello")) {
std::cout << "set new Hello" << std::endl;
}
if (bf.test("Hello")) {
std::cout << "find Hello" << std::endl;
}
C++ implement runs extremely fast with aesni instruction. The standard compatible version is also good enough.
// U7-155H & Clang-18
pbf-set: 10.9247 ns/op
pbf-test: 6.0765 ns/op
pbf-aesni-set: 8.3275 ns/op
pbf-aesni-test: 4.3405 ns/op
libbf-set: 36.6518 ns/op
libbf-test: 31.7608 ns/op
libbloom-set: 33.0665 ns/op
libbloom-test: 13.5359 ns/op
Go
// import "github.com/PeterRK/PageBloomFilter/go"
// BloomFilter with 0.01 false positive rate for 500 items
bf := pbf.NewBloomFilter(500, 0.01)
if bf.Set("Hello") {
fmt.Println("set new Hello")
}
if bf.Test("Hello") {
fmt.Println("find Hello")
}
Function injection techique is avaliable for AMD64. It uses code compiled by clang in Go without CGO. A benchmark with 500k elements on a Xeon-8374C machine shows new implement runs much fast than the pure go implement.
name old time/op new time/op delta
Set4 53.6ns ± 6% 26.5ns ± 6% -50.52% (p=0.000 n=20+20)
Test4 40.5ns ± 5% 21.2ns ± 5% -47.63% (p=0.000 n=20+18)
Set5 56.4ns ± 5% 28.0ns ± 5% -50.34% (p=0.000 n=20+19)
Test5 41.5ns ± 3% 18.8ns ± 7% -54.72% (p=0.000 n=20+19)
Set6 57.6ns ± 5% 29.1ns ± 5% -49.44% (p=0.000 n=20+20)
Test6 42.2ns ± 4% 18.5ns ± 7% -56.22% (p=0.000 n=20+18)
Set7 58.8ns ± 4% 30.8ns ± 9% -47.68% (p=0.000 n=20+20)
Test7 43.9ns ± 6% 18.9ns ± 8% -56.98% (p=0.000 n=20+19)
Set8 58.4ns ± 9% 32.4ns ± 5% -44.53% (p=0.000 n=20+19)
Test8 44.8ns ± 2% 18.4ns ± 7% -58.86% (p=0.000 n=19+20)
We suggest that user should execute go-inject.sh to gnerate new injecting code before build. Clang, binutils and python are needed.
Benchmark shows it runs 2x time faster than other famous bloom filter implements, bits-and-blooms and Tyler Treat's:
// i7-10710U & Go-1.20
BenchmarkPageBloomFilterSet-6 1000000 32.70 ns/op
BenchmarkPageBloomFilterTest-6 1000000 20.23 ns/op
BenchmarkBitsAndBloomSet-6 1000000 120.5 ns/op
BenchmarkBitsAndBloomTest-6 1000000 81.46 ns/op
BenchmarkTylerTreatSet-6 1000000 98.30 ns/op
BenchmarkTylerTreatTest-6 1000000 60.69 ns/op
// U7-155H & Go-1.20
BenchmarkPageBloomFilterSet-16 1000000 13.95 ns/op
BenchmarkPageBloomFilterTest-16 1000000 8.40 ns/op
BenchmarkBitsAndBloomSet-16 1000000 44.57 ns/op
BenchmarkBitsAndBloomTest-16 1000000 37.94 ns/op
BenchmarkTylerTreatSet-16 1000000 43.94 ns/op
BenchmarkTylerTreatTest-16 1000000 20.80 ns/op
Java
PageBloomFilter bf = PageBloomFilter.New(500, 0.01);
byte[] hello = "Hello".getBytes("UTF-8");
if (bf.set(hello)) {
System.out.println("set new Hello");
}
if (bf.test(hello)) {
System.out.println("find Hello");
}
Benchmark shows it runs much faster than Google's Guava. On the newer U7-155H/OpenJDK-21 numbers below, it is also slightly faster than Alexandr Nikitin's bloom-filter-scala. It is still slower than the Go and C++ implementations.
// i7-10710U & OpenJDK-17
pbfSet 50.962 ns/op
pbfTest 40.465 ns/op
guavaSet 133.514 ns/op
guavaTest 112.318 ns/op
nikitinSet 86.931 ns/op
nikitinTest 62.133 ns/op
// U7-155H & OpenJDK-21
pbfSet 20.780 ns/op
pbfTest 16.831 ns/op
guavaSet 44.889 ns/op
guavaTest 45.652 ns/op
nikitinSet 22.474 ns/op
nikitinTest 18.489 ns/op
C#
var bf = PageBloomFilter.New(500, 0.01);
var hello = Encoding.ASCII.GetBytes("Hello")
if (bf.Set(hello)) {
Console.WriteLine("set new Hello");
}
if (bf.Test(hello)) {
Console.WriteLine("find Hello");
}
C# code is very similar to Java, but the numbers below are still slower than the Java version. It remains faster than BloomFilter.NetCore.
// i7-10710U & .NET-7
pbf-set: 83.461274 ns/op
pbf-test: 74.953785 ns/op
// U7-155H & .NET-9
pbf-set: 28.63103 ns/op
pbf-test: 22.88545 ns/op
bf.net-set: 41.66280 ns/op
bf.net-test: 40.12608 ns/op
Python
bf = pbf.create(500, 0.01)
if bf.set("Hello"):
print("set new Hello")
if bf.test("Hello"):
print("find Hello")
Python with c extension is still slow, but it remains much faster than pybloom.
// i7-10710U & Python-3.11
pbf-set: 307.835638 ns/op
pbf-test: 289.679349 ns/op
pybloom-set: 2770.372372 ns/op
pybloom-test: 2417.377588 ns/op
// U7-155H & Python-3.12
pbf-set: 71.520046 ns/op
pbf-test: 62.544441 ns/op
rbloom-set: 83.451979 ns/op
rbloom-test: 44.645030 ns/op
Rust
let mut bf = pbf::new_bloom_filter(500, 0.01);
let hello = "Hello".as_bytes();
if (bf.set(hello)) {
println!("set new Hello");
}
if (bf.test(hello)) {
println!("find Hello");
}
// If false positive rate is a compile-time constant, use the fast path.
let mut fast = pbf::new_bloom_filter_fast!(500, 0.01);
if (fast.set(hello)) {
println!("set new Hello");
}
Rust remains competitive against fastbloom and rust-bloom-filter.
// i7-10710U & Rust-1.65
pbf-set: 45.99ns/op
pbf-test: 27.81ns/op
// U7-155H & Rust-1.80
pbf-set: 19.85ns/op
pbf-test: 12.50ns/op
fastbloom-set: 19.97ns/op
fastbloom-test: 14.93ns/op
rbf-set: 36.51ns/op
rbf-test: 24.93ns/op
Ranking
With test data on U7-155H machine, we got performance rank: C++, Go, Rust, Java, C#, Python.
Serialize & Deserialize
Data structures of different implements, except C++ with aesni, are consistent, so you can do cross-language serializing and deserializing without dedicated serialize & deserialize APIs. Just save and restore 3 scalar parameters way, page_level, unique_cnt, and the data bitmap. Values of way and page_level are always tiny integers, which can be represented by 4 bit.
// C++
auto bf = pbf::New(500, 0.01);
auto bf2 = pbf::New(bf->way(), bf->page_level(), bf->data(), bf->data_size(), bf->unique_cnt());
// A example (not standard) format
struct Pack {
uint32_t way : 4;
uint32_t page_level : 4;
uint32_t unique_cnt : 24;
uint32_t data_size;
uint8_t data[0];
};
// GO
bf := pbf.NewBloomFilter(500, 0.01)
bf2 := pbf.CreatePageBloomFilter(bf.Way(), bf.PageLevel(), bf.Data(), bf.Unique())
// Java
PageBloomFilter bf = PageBloomFilter.New(500, 0.01);
PageBloomFilter bf2 = PageBloomFilter.New(bf.getWay(), bf.getPageLevel(), bf.getData(), bf.getUniqueCnt());
// C#
var bf = PageBloomFilter.New(500, 0.01);
var bf2 = PageBloomFilter.New(bf.Way, bf.PageLevel, bf.Data, bf.UniqueCnt);
# Python
bf = pbf.create(500, 0.01)
bf2 = pbf.restore(bf.way, bf.page_level, bf.data, bf.unique_cnt)
// Rust
let mut bf = pbf::new_bloom_filter(500, 0.01);
let mut bf2 = pbf::restore_pbf(bf.get_way(), bf.get_page_level(), bf.get_data(), bf.get_unique_cnt());
Detail Benchmark
We got average latency per operation under 25ns in a benchmark with 500k elements on a Xeon-8374C machine. SIMD brings significant speed-up.
It runs slower on EPYC-7K83 machine.
Running test with SIMD on Xeon-8475B machine, we found aesni-hash helps a lot (amazing fast test operation under 7ns).
Running test with SIMD on EPYC-9T24 machine, we found aesni-hash helps a little.
Theoretical Analysis
Bytes per element - False positive rate

Occupancy rate - False positive rate

Related Skills
node-connect
340.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
340.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.2kCommit, push, and open a PR
