ExcaliburHash

About

Excalibur Hash is a high-speed hash map and hash set, ideal for performance-critical uses like video games. Its design focuses on being friendly to the CPU cache, making it very efficient and fast. It uses an open addressing hash table and manages removed items with a method called tombstones.

Engineered for ease of use, Excalibur Hash, in the vast majority of cases (99%), serves as a seamless, drop-in alternative to std::unordered_map. However, it's important to note that Excalibur Hash does not guarantee stable addressing. So, if your project needs to hold direct pointers to the keys or values, Excalibur Hash might not work as you expect. It also offers improved iterator design, which is not 100% compatible with STL iterators and might require some minor code changes. That aside, its design and efficiency make it a great choice for applications where speed is crucial.

Features

Extremely fast (see Performance section for details)
CPU cache friendly
Built-in configurable inline storage
Can either work as a map (key, value) or as a set (keys only)
Header-only library
Standard container-like interface
Support for custom key types via KeyInfo<T> specialization

Requirements

C++17 compatible compiler
Supported compilers: MSVC, GCC, Clang
Platforms: Windows, Linux, macOS

Design and Optimizations

ExcaliburHash achieves its exceptional performance through several carefully engineered design decisions and optimizations:

Core Architecture

Open Addressing with Linear Probing

Uses a single contiguous array instead of separate chaining, providing excellent cache locality
Linear probing minimizes memory indirection and maximizes CPU cache efficiency
No pointer chasing or dynamic memory allocation for individual elements

Tombstone Management

Deleted elements are marked as "tombstones" rather than shifting elements
Maintains stable iteration order and prevents expensive rehashing on deletions
Smart load factor calculation includes tombstones to prevent performance degradation
rehash() method available to clean up tombstones when needed

Memory Layout Optimizations

Configurable Inline Storage

Small hash tables store elements directly within the object (no heap allocation)
Default inline storage for 1 element eliminates malloc/free overhead for small maps
Configurable via template parameter: HashTable<K,V,64> for 64 inline elements
Seamlessly transitions to heap allocation when capacity is exceeded

Cache-Friendly Design

Power-of-2 bucket sizing enables fast modulo operations using bit masking
Contiguous memory layout maximizes cache line utilization
Elements stored in-place with proper alignment for optimal memory access

Performance Algorithms

Advanced Load Factor Management

Maintains 75% load factor (including tombstones) for optimal performance
Uses bit shifts for fast threshold calculation: (buckets >> 1) + (buckets >> 2) + 1
Prevents pathological cases where tombstones degrade lookup performance
Automatic growth by 2x when threshold is exceeded

High-Quality Hash Function

Uses WyHash algorithm, one of the fastest non-cryptographic hash functions
Platform-optimized implementation using compiler intrinsics
Multiplier constants derived from research (https://arxiv.org/abs/2001.05304)
Excellent distribution properties minimize clustering

Code Generation Optimizations

Template Metaprogramming

Single implementation for both const/non-const variants using std::conditional_t
Eliminates code duplication while maintaining type safety
Zero-cost abstractions for different iterator types
Compile-time feature detection for optimal code paths

Branch Prediction Friendly

Optimizes common cases (successful lookups, non-full tables)
Helps CPU branch predictor make better decisions

Memory Access Patterns

__restrict keywords inform compiler about non-aliasing pointers
Enables aggressive compiler optimizations and vectorization
Aligned memory allocation for SIMD-friendly access patterns

Customization Points

KeyInfo Specialization

Type traits system for custom key types
Defines empty/tombstone values and hash/equality functions
Compile-time customization without runtime overhead
Built-in optimized implementations for common types

Iterator Design: A Different Approach

ExcaliburHash takes a fundamentally different approach to iteration compared to standard library containers, prioritizing clarity, performance, and true genericity between maps and sets.

Default Iterator Behavior: Keys First

⚠️ Key Difference: Unlike STL containers, the default range-based for loop iterates over keys only:

Excalibur::HashMap<int, std::string> map;

// Default iteration - keys only (different from STL!)
for (const auto& key : map) {
    // 'key' is an int, not a pair
    std::cout << "Key: " << key << std::endl;
}

// Compare with STL:
std::unordered_map<int, std::string> stdMap;
for (const auto& pair : stdMap) {
    // 'pair' is std::pair<const int, std::string>
    std::cout << "Key: " << pair.first << std::endl;
}

Three Specialized Iterator Types

ExcaliburHash provides three distinct iterator types for different use cases:

Excalibur::HashMap<int, std::string> map;

// 1. Key-only iteration (IteratorK)
for (const auto& key : map.keys()) {
    std::cout << "Key: " << key << std::endl;
}

// 2. Value-only iteration (IteratorV) - hash maps only  
for (auto& value : map.values()) {
    value += "_modified";
}

// 3. Key-value iteration (IteratorKV) - hash maps only
for (const auto& item : map.items()) {
    std::cout << "Key: " << item.key() << ", Value: " << item.value() << std::endl;
}

Python-Inspired Design

This approach mirrors Python's dictionary iteration, which many developers find intuitive:

# Python dictionaries
for key in my_dict.keys():        # Keys only
for value in my_dict.values():    # Values only  
for key, value in my_dict.items():  # Key-value pairs

Why This Design?

1. Intent-Driven API

map.keys() clearly expresses "I want to iterate over keys"
map.values() clearly expresses "I want to iterate over values"
map.items() clearly expresses "I want both keys and values"

2. Semantic Clarity

item.key() and item.value() are self-documenting
No confusion about .first vs .second like STL pairs
Prevents common bugs from mixing up pair members

3. True Map/Set Genericity

The most important benefit: the same code works for both maps and sets:

template<typename Container>
void process_keys(const Container& container) {
    // Same code works for both maps and sets!
    for (const auto& key : container) {  // Always iterates keys
        std::cout << key << std::endl;
    }
}

// Usage:
Excalibur::HashMap<int, std::string> map;  // Key-value container
Excalibur::HashSet<int> set;               // Key-only container

process_keys(map);  // ✅ Works
process_keys(set);  // ✅ Works - same exact code!

STL Cannot Achieve This

template<typename Container>
void process_keys_stl(const Container& container) {
    for (const auto& item : container) {
        // ❌ Breaks! item.first doesn't exist for std::set
        std::cout << item.first << std::endl;  
    }
}

Iterator Availability Matrix

The type system automatically provides appropriate iterators based on container type:

// Hash Map: HashTable<Key, Value>
map.keys()    // ✅ Available
map.values()  // ✅ Available  
map.items()   // ✅ Available

// Hash Set: HashTable<Key, std::nullptr_t>
set.keys()    // ✅ Available
set.values()  // ❌ Compile error - no values in sets
set.items()   // ❌ Compile error - no key-value pairs in sets

This design enables writing truly generic algorithms that work across both container types without runtime checks, template specialization, or different APIs. The type system enforces correctness at compile time while maintaining optimal performance.

The combination of these optimizations results in a hash table that often outperforms std::unordered_map by 2-10x in real-world scenarios while maintaining a familiar API.

Performance

In this section, you can see a performance comparison against a few popular hash table implementations. This comparison will show their speed and efficiency, helping you understand which hash table might work best for your project.

Unless otherwise stated, all tests were run using the following configuration

OS: Windows 11 Pro (22621.4317)
CPU: Intel i9-12900K
RAM: 128Gb 
Compiled in the Release mode using VS2022 (19.39.33520)

Performance test repository

absl::flat_hash_map
boost::unordered_map
boost::unordered_flat_map
ska::flat_hash_map
ska::unordered_map
folly::F14ValueMap
llvm::DenseMap
[

ExcaliburHash

Install / Use

README

ExcaliburHash

About

Features

Requirements

Design and Optimizations

Core Architecture

Memory Layout Optimizations

Performance Algorithms

Code Generation Optimizations

Customization Points

Iterator Design: A Different Approach

Performance