SkillAgentSearch skills...

Hamt

A hash array-mapped trie implementation in C

Install / Use

/learn @mkirchner/Hamt
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

libhamt

A hash array-mapped trie (HAMT) implementation in C99. A HAMT is a data structure that can be used to efficiently implement [persistent][wiki_persistent_data_structure] associative arrays (aka maps, dicts) and sets, see the Introduction. The implementation here loosely follows Bagwell's 2000 paper[[1]][bagwell_00_ideal], with a focus on code clarity.

What prompted the somewhat detailed writeup was the realization that there is not a lot of in-depth documentation for HAMTs beyond the original Bagwell paper[[1][bagwell_00_ideal]]. Some of the more helpful posts are [Karl Krukow's intro to Clojure's PersistentHashMap][krukov_09_understanding], [C. S. Lim's C++ template implementation][chaelim_hamt], [Adrian Coyler's morning paper post][coyler_15_champ] and the original [Steindoerfer/Vinju compressed HAMT article it summarizes][steindoerfer_15_optimizing]. The rest mostly seems to be all bits and pieces and this document is an attempt to (partially) improve that situation.

FIXME: Complete docs (removal, persistence, path copying)

Quickstart

To build the library and run the tests:

$ git clone git@github.com:mkirchner/hamt.git
$ cd hamt
$ make
$ make test
$ make runtest

In order to use libhamt in your own projects, copy the required sources to your own source tree.

Benchmarks

Benchmark results

The current HAMT implementation consistently outperforms the libavl AVL-tree and red-black tree implementations by 5x for querying, 1.5x-4x for node insert, and 1.5x-5x for node removal. Persistent insert and remove implementations scale roughly similar to the classic trees, with more favorable scaling behavior for HAMT. Where table caching is an option, the persistent HAMT implementation reaches better insert performance than red-black trees and better removal performance than red-black and AVL trees at appoximately 10e5 elements.

Compared to hash tables, HAMT query times start at 2x vs. GLib's HashTable and 20x vs. hsearch(3) (the latter still being investigated) and then get progressively worse. This makes sense, given the O(1) vs. O(log N) expectations between the different approaches.

Note that benchmarking and optimization is an ongoing effort and please take all numbers with a pinch of salt. All measurements have so far been collected on a single system (Apple MBP M2 Max under Ventura 13.4.1).

For detailed performance comparison with AVL and red-black trees (from libavl) and the HashTree from GLib, see [the benchmarking repo][hamt_bench_github].

Introduction

A hash array mapped trie (HAMT) is a data structure that can be used to implement [associative arrays][wiki_associative_array] (aka maps) and [sets][wiki_set_adt].

Structurally, HAMTs are [hash trees][wiki_hash_tree] that combine favorable characteristics of [hash tables][wiki_hash_table] and array mapped [tries][wiki_trie], namely almost hash table-like time complexity guarantees[[1]][bagwell_00_ideal] (O(log<sub>32</sub>n)) and economic use of memory.

An additional benefit, and a key motivation for the work presented here, is that augmentation of HAMTs with path copying and garbage collection allows for a straightforward and efficient implementation of [persistent][wiki_persistent] versions of maps and sets.

The remaining documentation starts with a description of the libhamt API and two examples that demonstrate the use of a HAMT as an ephemeral and persistent data structure, respectively. It then details the implementation: starting from the foundational data structures and the helper code required for hash exhaustion and table management, we cover search, insertion, removal, and iterators. The final implementation section introduces path copying and explains the changes required to support persistent insert and remove operations. It closes with an outlook and an appendix.

API

HAMT lifecycle

The core data type exported in the libhamt interface is struct hamt. In order to create a struct hamt instance, one must call hamt_create(), which requires a hash function of type hamt_key_hash_fn to hash keys, a comparison function of type hamt_cmp_fn to compare keys, and a pointer to a hamt_allocator instance. hamt_delete() deletes struct hamt instances that were created with hamt_create().

/* The libhamt core data structure is a handle to a hash array-mapped trie */

/* Function signature definitions for key comparison and hashing */
typedef int (*hamt_cmp_fn)(const void *lhs, const void *rhs);
typedef uint32_t (*hamt_key_hash_fn)(const void *key, const size_t gen);

/* API functions for lifecycle management */
struct hamt *hamt_create(hamt_key_hash_fn key_hash, hamt_cmp_fn key_cmp, struct hamt_allocator *ator);
void hamt_delete(struct hamt *);

The hamt_key_hash_fn takes a key and a generation gen. The expectation is that the supplied hash function returns different hashes for the same key but different generations. Depending on the choice of hash function this can be implemented using gen as a seed or modifying a copy of key on the fly. See the examples section for a murmur3-based implementation and the hashing section for more information on suitable hash functions.

Memory management

libhamt exports its internal memory management API through the hamt_allocator struct. The struct specifies the functions that the HAMT implementation uses to allocate, re-allocate and deallocate system memory. The API provides a default hamt_allocator_default which refers to the standard malloc(), realloc() and free() functions.

struct hamt_allocator {
    void *(*malloc)(const size_t size);
    void *(*realloc)(void *chunk, const size_t size);
    void (*free)(void *chunk);
};

extern struct hamt_allocator hamt_allocator_default;

Exporting the libhamt memory management API enables library clients to make use of alternate memory management solutions, most notably of garbage collection solutions (e.g. the [Boehm-Demers-Weiser GC][boehm_gc]) which are required when using the HAMT as a persistent data structure (see the structural sharing example).

Query

size_t hamt_size(const struct hamt *trie);
const void *hamt_get(const struct hamt *trie, void *key);

The hamt_size() function returns the size of the HAMT in O(1). Querying the HAMT (i.e. searching a key) is done with hamt_get() which takes a pointer to a key and returns a result in O(log<sub>32</sub> n) - or NULL if the key does not exist in the HAMT.

Iterators

The API also provides key/value pair access through the hamt_iterator struct.

size_t hamt_size(const struct hamt *trie);
const void *hamt_get(const struct hamt *trie, void *key);

Iterators are tied to a specific HAMT and are created using the hamt_it_create() function, passing the HAMT instance the iterator should refer to. Iterators can be advanced with the hamt_it_next() function and as long as hamt_it_valid() returns true, the hamt_it_get_key() and hamt_it_get_value() functions will return the pointers to the current key/value pair. In order to delete an existing and/or exhausted iterator, call hamt_it_delete().

typedef struct hamt_iterator_impl *hamt_iterator;

hamt_iterator hamt_it_create(const struct hamt *trie);
void hamt_it_delete(hamt_iterator it);
bool hamt_it_valid(hamt_iterator it);
hamt_iterator hamt_it_next(hamt_iterator it);
const void *hamt_it_get_key(hamt_iterator it);
const void *hamt_it_get_value(hamt_iterator it);

Iterators maintain state about their traversal path and changes to the HAMT that an iterator refers to implicitly invalidate the iteration (i.e. undefined behavior).

The order in which iterators return the key value pairs is fully defined by the structure of the trie, which, in turn, is completely defined by the choice of hash function and (where applicable) seed.

Insert & Remove

libhamt supports ephemeral and [persistent][wiki_persistent] (aka not ephemeral) HAMTs through two different interfaces: hamt_set() and hamt_remove() for ephemeral use, and their p-versions hamt_pset() and hamt_premove() for persistent use.

Ephemeral modification

const void *hamt_set(struct hamt *trie, void *key, void *value);
void *hamt_remove(struct hamt *trie, void *key);

hamt_set() takes a pair of key and value pointers and adds the pair to the HAMT, returning a pointer to the value. If the key already exists, hamt_set() updates the pointer to the value.

hamt_remove() takes a key and removes the key/value pair with the respective key from the HAMT, returning a pointer to the value that was just removed. If the key does not exist, hamt_remove() returns NULL.

Persistent HAMTs

The semantics of persistent HAMTs are different from their ephemeral counterparts: since every modification creates a new version of a HAMT, the modificiation functions return a new HAMT. Modification of a persistent HAMT therefore requires a reassignment idiom if the goal is modification only:

const struct hamt *h = hamt_create(...)
...
/* Set a value and drop the reference to the old HAMT; the GC
 * will take care of cleaning up remaining unreachable allocations.
 */
h = hamt_pset(h, some_key, some_value);
...

This seems wasteful at first glance but the respective functions implement structural sharing such that the overhead is limited to ~log<sub>32</sub>(N) nodes (where N is the number of nodes in the graph).

const struct hamt *hamt_pset(const struct hamt *trie, void *key, void *value);
const struct hamt *hamt_premove(const struct hamt *trie, void *key);

hamt_pset() inserts or updates the key with value and returns an opaque handle to the new HAMT. The new HAMT is guaranteed to contain the new key/value pair.

hamt_premove() attempts to remove the value with the key `k

View on GitHub
GitHub Stars330
CategoryEducation
Updated18d ago
Forks8

Languages

C

Security Score

100/100

Audited on Mar 10, 2026

No findings