SkillAgentSearch skills...

Perfecthash

A performant, parallel, probabilistic, random acyclic-graph, low-latency, perfect hash generation library.

Install / Use

/learn @tpn/Perfecthash
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Perfect Hash

macos linux windows

Helper Utility for Generating Command Line Syntax The ui/ directory contains the companion web UI (submodule) for generating command line syntax. See ui/README.md for details.

Project Status (2026)

  • Cross-platform CMake builds and CI are active for Linux, macOS, and Windows.
  • Tag-driven release automation is available via GitHub Actions (.github/workflows/release.yml).
  • CMake build profiles are supported:
    • full
    • online-rawdog-jit
    • online-rawdog-and-llvm-jit
    • online-llvm-jit
  • CMake package export/config support is available for downstream find_package(PerfectHash CONFIG REQUIRED) consumers.

Overview

This project is a library for creating perfect hash tables from 32-bit key sets. It is based on the acyclic random 2-part hypergraph algorithm. Its primary goal is finding the fastest possible runtime solution.

It is geared toward offline table generation: a command line application is used to generate a small C library that implements an Index() routine, which, given an input key, will return the order-preserved index of that key within the original key set, e.g.:

    uint32_t ix;
    uint32_t key = 0x20190903;

    ix = Index(key);

This allows the efficient implementation of key-value tables, e.g.:

    extern uint32_t Table[];

    uint32_t
    Lookup(uint32_t Key)
    {
        return Table[Index(Key)]
    };

    void
    Insert(uint32_t Key, uint32_t Value)
    {
        Table[Index(Key)] = Value;
    }

    void
    Delete(uint32_t Key)
    {
        Table[Index(Key)] = 0;
    }

The fastest Index() routine is the MultiplyShiftRX routine, which clocks in at about 6 cycles in practice, and boils down to something like this:

extern uint32_t Assigned[];

uint32_t
Index(uint32_t Key)
{
    uint32_t Vertex1;
    uint32_t Vertex2;

    Vertex1 = ((Key * Seed1) >> Shift);
    Vertex2 = ((Key * Seed2) >> Shift);

    return ((Assigned[Vertex1] + Assigned[Vertex2]) & IndexMask);
}

N.B. Seed1, Seed2, Shift, and IndexMask will all be literal constants in the final source code, not variables.

This compiles down to:

lea     r8, ptr [rip+0x23fc0]
imul    edx, ecx, 0xe8d9cdf9
shr     rdx, 0x10
movzx   eax, word ptr [r8+rdx*2]
imul    edx, ecx, 0xc2e3c0b7
shr     rdx, 0x10
movzx   ecx, word ptr [r8+rdx*2]
add     eax, ecx
ret

The IACA profile reports 8 uops:

Intel(R) Architecture Code Analyzer Version -  v3.0-28-g1ba2cbb build date: 2017-10-23;17:30:24
Analyzed File -  .\x64\Release\HologramWorld_31016_Chm01_MultiplyShiftRX_And.dll
Binary Format - 64Bit
Architecture  -  SKL
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 10.00 Cycles       Throughput Bottleneck: Backend
Loop Count:  26
Port Binding In Cycles Per Iteration:
----------------------------------------------------------------------------
| Port   |  0  - DV  |  1  |  2  - D   |  3  - D   |  4  |  5  |  6  |  7  |
----------------------------------------------------------------------------
| Cycles | 1.3   0.0 | 2.0 | 1.0   1.0 | 1.0   1.0 | 0.0 | 1.3 | 1.3 | 0.0 |
----------------------------------------------------------------------------

| # Of |         Ports pressure in cycles                     |
| Uops |0 - DV | 1   | 2 - D   | 3 - D    | 4 | 5   | 6   | 7 |
---------------------------------------------------------------
|  1   |       |     |         |          |   | 1.0 |     |   | lea r8, ptr [rip+0x23fc0]
|  1   |       | 1.0 |         |          |   |     |     |   | imul edx, ecx, 0xe8d9cdf9
|  1   | 0.3   |     |         |          |   |     | 0.7 |   | shr rdx, 0x10
|  1   |       |     | 1.0 1.0 |          |   |     |     |   | movzx eax, word ptr [r8+rdx*2]
|  1   |       | 1.0 |         |          |   |     |     |   | imul edx, ecx, 0xc2e3c0b7
|  1   | 0.7   |     |         |          |   |     | 0.3 |   | shr rdx, 0x10
|  1   |       |     |         | 1.0  1.0 |   |     |     |   | movzx ecx, word ptr [r8+rdx*2]
|  1   | 0.3   |     |         |          |   | 0.3 | 0.3 |   | add eax, ecx
Total Num of Uops: 8

The "cost" behind the perfect hash table is the Assigned array. The size of this array will be the number of keys, rounded up to a power of two, and then doubled. E.g. HologramWorld-31016.keys has 31,016 keys. Rounded up to a power of two is 32,768, then doubled: 65,336.

The data type used by the Assigned array is the smallest C data type that can hold the number of keys rounded up to a power of two. Thus, a 16-bit unsigned short int can be used for the HologramWorld-31016.keys array:

unsigned short int Assigned[65336] = { ... };

Thus, sizeof(Assigned) will be 131,072, or 128KB.

The Index() routine will perform two memory lookups into this array per call. No pointer chasing or indirection is required. The most frequent keys will have both locations in L1 cache; the worst-case scenario is two memory lookups for both locations for cold or infrequent keys.

Quick Guide

Current development is CMake-first and cross-platform. CI continuously builds Linux, macOS, and Windows configurations, and release automation is tag-driven. The historical Visual Studio solution remains available for Windows workflows.

The generated compiled perfect hash tables are cross-platform, and will work on Windows, Mac, Linux, x86, x64, and ARM64.

Building

Windows

mkdir c:\src
cd src
git clone https://github.com/tpn/perfecthash
git clone https://github.com/tpn/perfecthash-keys

cd perfecthash/src

The PerfectHash.sln file lives in perfecthash/src. You can either build this directly via Visual Studio, use one of the build-*.bat files, or just use msbuild from a Visual Studio 2022 command prompt:

msbuild /nologo /m /t:Rebuild /p:Configuration=Release;Platform=x64

You can also download the latest binaries from the Releases page. The PGO zip files refer to profile-guided optimization builds, and are generally faster than the Release builds by up to 30-40%.

Once built or downloaded, there are two main command line executables: PerfectHashCreate.exe, and PerfectHashBulkCreate.exe. The former is for creating a single table, and it takes a single input key file. The latter can be pointed at a directory of keys, and it will create tables for all of them.

Linux

Prerequisites: C compiler (GCC 10 tested), CMake. Optional: Ninja.

Recommended (mamba/conda) environment:

# x86_64 (pre-generated)
mamba env create -f conda/environments/dev-linux_os-linux_arch-x86_64_py-313_cuda-none_compiler-llvm.yaml
mamba activate dev-linux_os-linux_arch-x86_64_py-313_cuda-none_compiler-llvm

# ARM64 / aarch64 (pre-generated)
mamba env create -f conda/environments/dev-linux-arm64_os-linux_arch-aarch64_py-313_cuda-none_compiler-llvm.yaml
mamba activate dev-linux-arm64_os-linux_arch-aarch64_py-313_cuda-none_compiler-llvm

Generated environment files live under conda/environments/ and are produced from dependencies.yaml using rapids-dependency-file-generator. Python package metadata is maintained separately in the repo-root pyproject.toml; dependencies.yaml is no longer used to generate the legacy python/ package metadata.

If you need a different Python version, pick the matching py-314 environment file from conda/environments/. You can also create a minimal dev/test environment manually:

mamba create -y -n perfecthash-dev -c conda-forge \
  python=3.12 rust cmake ninja make pkg-config clang clangxx lld llvmdev pytest
mamba activate perfecthash-dev
mkdir -p ~/src && cd ~/src
git clone https://github.com/tpn/perfecthash
git clone https://github.com/tpn/perfecthash-keys

cd perfecthash
cmake -S . -B build -G"Ninja Multi-Config"
cmake --build build --config Release
cmake --build build --config Debug

Note: the default build enables -march=native for required SIMD intrinsics. Use -DPERFECTHASH_ENABLE_NATIVE_ARCH=OFF if you need a portable binary.

Build profile examples:

cmake -S . -B build-full -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=full
cmake -S . -B build-online-rawdog -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-rawdog-jit
cmake -S . -B build-online-rawdog-llvm -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-rawdog-and-llvm-jit
cmake -S . -B build-online-llvm -G"Ninja Multi-Config" -DPERFECTHASH_BUILD_PROFILE=online-llvm-jit

CUDA build (Ninja Multi-Config):

cmake -S . -B build-cuda -G"Ninja Multi-Config" \
    -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
cmake --build build-cuda --config Release

CUDA builds require CUDAToolkit on PATH. Set CMAKE_CUDA_ARCHITECTURES to your GPU (e.g., 86, 89, 90). For normal Makefile support:

cd perfecthash
cmake -S . -B build -G"Unix Makefiles"
cmake --build build

Tests (Linux/macOS)

Unit tests and CLI integration tests are available via CTest. They require a build with tests enabled and use the keys/HologramWorld-31016.keys fixture. The codegen tests require cargo (from the Rust toolchain) to be available.

cmake -S . -B build-tests -G Ninja -DPERFECTHASH_ENABLE_TESTS=ON -DBUILD_TESTING=ON
cmake --build build-tests
ctest --test-dir build-tests --output-on-failure

Mac

Prerequisites: X

View on GitHub
GitHub Stars91
CategoryDevelopment
Updated1d ago
Forks15

Languages

C

Security Score

100/100

Audited on Apr 4, 2026

No findings