Perfecthash
A performant, parallel, probabilistic, random acyclic-graph, low-latency, perfect hash generation library.
Install / Use
/learn @tpn/PerfecthashREADME
Perfect Hash
Helper Utility for Generating Command Line Syntax
The ui/ directory contains the companion web UI (submodule) for generating
command line syntax. See ui/README.md for details.
Project Status (2026)
- Cross-platform CMake builds and CI are active for Linux, macOS, and Windows.
- Tag-driven release automation is available via GitHub Actions (
.github/workflows/release.yml). - CMake build profiles are supported:
fullonline-rawdog-jitonline-rawdog-and-llvm-jitonline-llvm-jit
- CMake package export/config support is available for downstream
find_package(PerfectHash CONFIG REQUIRED)consumers.
Overview
This project is a library for creating perfect hash tables from 32-bit key sets. It is based on the acyclic random 2-part hypergraph algorithm. Its primary goal is finding the fastest possible runtime solution.
It is geared toward offline table generation: a command line application is used
to generate a small C library that implements an Index() routine, which, given
an input key, will return the order-preserved index of that key within the
original key set, e.g.:
uint32_t ix;
uint32_t key = 0x20190903;
ix = Index(key);
This allows the efficient implementation of key-value tables, e.g.:
extern uint32_t Table[];
uint32_t
Lookup(uint32_t Key)
{
return Table[Index(Key)]
};
void
Insert(uint32_t Key, uint32_t Value)
{
Table[Index(Key)] = Value;
}
void
Delete(uint32_t Key)
{
Table[Index(Key)] = 0;
}
The fastest Index() routine is the MultiplyShiftRX
routine, which clocks in at about 6 cycles in practice, and boils down to
something like this:
extern uint32_t Assigned[];
uint32_t
Index(uint32_t Key)
{
uint32_t Vertex1;
uint32_t Vertex2;
Vertex1 = ((Key * Seed1) >> Shift);
Vertex2 = ((Key * Seed2) >> Shift);
return ((Assigned[Vertex1] + Assigned[Vertex2]) & IndexMask);
}
N.B. Seed1, Seed2, Shift, and IndexMask will all be literal
constants in the final source code, not variables.
This compiles down to:
lea r8, ptr [rip+0x23fc0]
imul edx, ecx, 0xe8d9cdf9
shr rdx, 0x10
movzx eax, word ptr [r8+rdx*2]
imul edx, ecx, 0xc2e3c0b7
shr rdx, 0x10
movzx ecx, word ptr [r8+rdx*2]
add eax, ecx
ret
The IACA profile reports 8 uops:
Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;17:30:24
Analyzed File - .\x64\Release\HologramWorld_31016_Chm01_MultiplyShiftRX_And.dll
Binary Format - 64Bit
Architecture - SKL
Analysis Type - Throughput
Throughput Analysis Report
--------------------------
Block Throughput: 10.00 Cycles Throughput Bottleneck: Backend
Loop Count: 26
Port Binding In Cycles Per Iteration:
----------------------------------------------------------------------------
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
----------------------------------------------------------------------------
| Cycles | 1.3 0.0 | 2.0 | 1.0 1.0 | 1.0 1.0 | 0.0 | 1.3 | 1.3 | 0.0 |
----------------------------------------------------------------------------
| # Of | Ports pressure in cycles |
| Uops |0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
---------------------------------------------------------------
| 1 | | | | | | 1.0 | | | lea r8, ptr [rip+0x23fc0]
| 1 | | 1.0 | | | | | | | imul edx, ecx, 0xe8d9cdf9
| 1 | 0.3 | | | | | | 0.7 | | shr rdx, 0x10
| 1 | | | 1.0 1.0 | | | | | | movzx eax, word ptr [r8+rdx*2]
| 1 | | 1.0 | | | | | | | imul edx, ecx, 0xc2e3c0b7
| 1 | 0.7 | | | | | | 0.3 | | shr rdx, 0x10
| 1 | | | | 1.0 1.0 | | | | | movzx ecx, word ptr [r8+rdx*2]
| 1 | 0.3 | | | | | 0.3 | 0.3 | | add eax, ecx
Total Num of Uops: 8
The "cost" behind the perfect hash table is the Assigned array. The size of
this array will be the number of keys, rounded up to a power of two, and then
doubled. E.g. HologramWorld-31016.keys has 31,016 keys. Rounded up to a
power of two is 32,768, then doubled: 65,336.
The data type used by the Assigned array is the smallest C data type that
can hold the number of keys rounded up to a power of two. Thus, a 16-bit
unsigned short int can be used for the HologramWorld-31016.keys array:
unsigned short int Assigned[65336] = { ... };
Thus, sizeof(Assigned) will be 131,072, or 128KB.
The Index() routine will perform two memory lookups into this array per call.
No pointer chasing or indirection is required. The most frequent keys will have
both locations in L1 cache; the worst-case scenario is two memory lookups for
both locations for cold or infrequent keys.
Quick Guide
Current development is CMake-first and cross-platform. CI continuously builds Linux, macOS, and Windows configurations, and release automation is tag-driven. The historical Visual Studio solution remains available for Windows workflows.
The generated compiled perfect hash tables are cross-platform, and will work on Windows, Mac, Linux, x86, x64, and ARM64.
Building
Windows
mkdir c:\src
cd src
git clone https://github.com/tpn/perfecthash
git clone https://github.com/tpn/perfecthash-keys
cd perfecthash/src
The PerfectHash.sln file lives in perfecthash/src. You can either build
this directly via Visual Studio, use one of the build-*.bat files, or just
use msbuild from a Visual Studio 2022 command prompt:
msbuild /nologo /m /t:Rebuild /p:Configuration=Release;Platform=x64
You can also download the latest binaries from the Releases
page. The PGO zip files refer to profile-guided optimization builds, and are
generally faster than the Release builds by up to 30-40%.
Once built or downloaded, there are two main command line executables:
PerfectHashCreate.exe, and PerfectHashBulkCreate.exe. The former is for
creating a single table, and it takes a single input key file. The latter
can be pointed at a directory of keys, and it will create tables for all of
them.
Linux
Prerequisites: C compiler (GCC 10 tested), CMake. Optional: Ninja.
Recommended (mamba/conda) environment:
# x86_64 (pre-generated)
mamba env create -f conda/environments/dev-linux_os-linux_arch-x86_64_py-313_cuda-none_compiler-llvm.yaml
mamba activate dev-linux_os-linux_arch-x86_64_py-313_cuda-none_compiler-llvm
# ARM64 / aarch64 (pre-generated)
mamba env create -f conda/environments/dev-linux-arm64_os-linux_arch-aarch64_py-313_cuda-none_compiler-llvm.yaml
mamba activate dev-linux-arm64_os-linux_arch-aarch64_py-313_cuda-none_compiler-llvm
Generated environment files live under conda/environments/ and are produced
from dependencies.yaml using rapids-dependency-file-generator.
Python package metadata is maintained separately in the repo-root
pyproject.toml; dependencies.yaml is no longer used to generate the legacy
python/ package metadata.
If you need a different Python version, pick the matching py-314 environment
file from conda/environments/. You can also create a minimal dev/test
environment manually:
mamba create -y -n perfecthash-dev -c conda-forge \
python=3.12 rust cmake ninja make pkg-config clang clangxx lld llvmdev pytest
mamba activate perfecthash-dev
mkdir -p ~/src && cd ~/src
git clone https://github.com/tpn/perfecthash
git clone https://github.com/tpn/perfecthash-keys
cd perfecthash
cmake -S . -B build -G"Ninja Multi-Config"
cmake --build build --config Release
cmake --build build --config Debug
Note: the default build enables -march=native for required SIMD intrinsics.
Use -DPERFECTHASH_ENABLE_NATIVE_ARCH=OFF if you need a portable binary.
Build profile examples:
cmake -S . -B build-full -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=full
cmake -S . -B build-online-rawdog -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-rawdog-jit
cmake -S . -B build-online-rawdog-llvm -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-rawdog-and-llvm-jit
cmake -S . -B build-online-llvm -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-llvm-jit
CUDA build (Ninja Multi-Config):
cmake -S . -B build-cuda -G"Ninja Multi-Config" \
-DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
cmake --build build-cuda --config Release
CUDA builds require CUDAToolkit on PATH. Set
CMAKE_CUDA_ARCHITECTURES to your GPU (e.g., 86, 89, 90).
For normal Makefile support:
cd perfecthash
cmake -S . -B build -G"Unix Makefiles"
cmake --build build
Tests (Linux/macOS)
Unit tests and CLI integration tests are available via CTest. They require a
build with tests enabled and use the keys/HologramWorld-31016.keys fixture.
The codegen tests require cargo (from the Rust toolchain) to be available.
cmake -S . -B build-tests -G Ninja -DPERFECTHASH_ENABLE_TESTS=ON -DBUILD_TESTING=ON
cmake --build build-tests
ctest --test-dir build-tests --output-on-failure
Mac
Prerequisites: X
