SharedHashFile: Share Hash Tables With Stable Key Hints Stored In Memory Mapped Files Between Arbitrary Processes

SharedHashFile is a lightweight, embeddable NoSQL key value store / hash table with stable key hints, a zero-copy IPC queue, & a multiplexed IPC logging library written in C for Linux. Data accessed directly in shared memory; no sockets are used between SharedHashFile and the application program; no server process. APIs for C & C++.

Nailed It

Project Goals

Faster speed.
Simpler & smaller footprint.
Lower memory usage.
Concurrent, in memory access.

Technology

Data Storage

Data is kept in shared memory by default, making all the data accessible to any processes. Up to 4 billion keys can be stored in a single SharedHashFile hash table which is limited in size only by available RAM.

For example, let's say you have a box with 128GB RAM of which 96GB is used by SharedHashFile hash tables. The box has 24 cores and there are 24 processes (e.g. nginx forked slave processes or whatever) concurrently accessing the 96GB of hash tables. Each process shares exactly the same 96GB of shared memory.

Direct Memory Access

Because a key,value pair is held in shared memory across n processes, that shared memory can change and/or move at any time. Therefore, getting a value always results in receiving a thread local copy of the value.

Data Encoding

Keys and values are currently binary strings, with a combined maximum length of 2^32 bytes.

Allocation & Garbage Collection

Conventional malloc() does not function in shared memory, since we have to use offsets instead of conventional pointers. Hence SharedHashFile uses its own implementation of malloc() for shared memory.

To avoid memory holes then garbage collection happens from time to time upon key,value insertion. The number of key,value pairs effected during garbage collection is intentionally limited by the algorithm to a maximum of 8,192 pairs no matter how many keys have been inserted in the hash table. This means the hash table always feels very responsive.

Hash Table Expansion

SharedHashFile is designed to expand gracefully as more key,value pairs are inserted. There are no sudden memory increases or memory doubling events. And there are no big pauses due to rehashing keys en masse.

Locking

To reduce contention there is no single global hash table lock by design. Instead keys are sharded across 256 locks to reduce lock contention.

Optional Fixed Length Keys and Values

For use cases with high levels of writing then performance can suffer due to too many system mmap() calls due to recycling / shrinking memory mapped areas when removing memory holes due to deleted keys.

To improve performance for write heavy use cases, keys and values can be fixed in size across the entire hash table, which means deleted keys can be easily re-used without creating memory holes, and no expensive system mmap() calls are necessary.

Using fixed length keys and values also reduces the amount of RAM used because the key and value sizes are no longer stored, e.g. 100 million keys and values would save 100 million * 8 bytes = 800 million bytes.

Persistent Storage

Hash tables are stored in memory mapped files in /dev/shm which means the data persists even when no processes are using hash tables. However, the hash tables will not survive rebooting.

Unique Identifers AKA Stable Key Hints

Unlike other hash tables, every key stored in SharedHashFile gets assigned its own UID, e.g. shf_make_hash("key", 3); uint32_t uid = shf_put_key_val(shf, "val", 3). To get the same key in the future, choose between accessing the key via its key, or via its UID, e.g. shf_make_hash("key", 3); shf_get_key_val_copy(shf) or shf_get_uid_val_copy(shf, uid).

What are UIDs useful for? UIDs don't take up any extra resources and can be thought of as resource 'free'. Accessing a key by its UID is faster than accessing the key via its key. Because a UID is only 32bits in size then it can be easily stored as a reference to a key in your program, or embedded in the values of of key,value pairs, or even embedded within other keys.

Example usage: If uid1 points to key "user-id-<xyz>", and uid2 points to key "facebook.com", then another 'mash up' key might be "<uid1><uid2>". Want to find out if "user-id-<xyz>" has "facebook.com" in their personal URL whitelist? Just see if key "<uid1><uid2>" exists.

What does the 'stable' in 'stable key hint' mean? It means that the UID stays the same even if the key and/or value bytes move around in memory.

Zero-Copy IPC Queues

How does it work? Create X fixed-sized queue elements, and Y queues to push & pull those queue elements to/from.

Example: Imagine two processes Process A & Process B. Process A creates 100,000 queue elements and 3 queues; queue-free, queue-a2b, and queue-b2a. Intitally, all queue elements are pushed onto queue-free. Process A then spawns Process B which attaches to the SharedHashFile in order to pull from queue-a2b. To perform zero-copy IPC then Process A can pull queue elements from queue-free, manipulate the fixed size, shared memory queue elements, and push the queue elements into queue-a2b. Process B does the opposite; pulls queue elements from queue-a2b, manipulates the fixed size, shared memory queue queue elements, and pushes the queue elements into queue-b2a. Process A can also pull queue items from queue-b2a in order to digest the results from Process B.

So how many queue elements per second can be moved back and forth by Processes A & Process B? On a Lenovo W530 laptop then about 90 million per second if both Process A & Process B are written in C.

Note: When a queue element is moved from one queue to another then it is not copied, only a reference is updated.

Multiplexed IPC Logging

How does it work? Process A calls shf_log_thread_new() which creates a shared memory log buffer and a log output thread which periodically monitors for new log lines. Process B calls shf_log_attach_existing() to start logging to the same shared log. Log using C macros SHF_PLAIN() and SHF_DEBUG(). If shf_log_thread_new() has not been called then output goes automatically to stdout, else the logging is multiplexed by the log output thread.

Example output:

sharedhashfile$ cat debug/test.q.shf.t.tout
1..10
=0.000000 23056 pid 23056 started; mode is 'c2c'
=0.000013 23056 - SHF_SNPRINTF() // 'test-23056-ipc-queue'
ok 1 -    c2*: shf_attach()          works for non-existing file as expected
=0.000002 23060 shf.monitor: monitoring pid 23056 to delete /dev/shm/test-23056-ipc-queue.shf
1..7
=0.000001 23064 pid 23064 started; mode is '4c'
=0.000010 23064 - SHF_SNPRINTF() // 'test-23064-ipc-queue'
ok 1 -     4c: shf_attach_existing() works for existing file as expected
#0.003948 1 --> auto mapped to thread id 23061
#0.003948 1 shf_log_thread(shf=?){}
ok 2 -    c2*: put lock in value as expected
ok 3 -    c2*: shf_q_new() returned as expected
ok 4 -    c2*: moved   expected number of new queue items // estimate 51,044,225 q items per second without contention
#0.131327 2 --> auto mapped to thread id 23064
#0.131327 2 '4c' mode; behaving as client
ok 2 -     4c: shf_q_get_name('qid-free') returned qid as expected
ok 3 -     4c: shf_q_get_name('qid-a2b' ) returned qid as expected
ok 4 -     4c: shf_q_get_name('qid-b2a' ) returned qid as expected
#0.158467 2 shf_race_start() // 2 horses started after 0.000001 seconds
#0.158474 2 testing process b IPC queue a2b --> b2a speed
#0.158467 3 --> auto mapped to thread id 23056
#0.158467 3 shf_race_start() // 2 horses started after 0.027820 seconds
#0.158484 3 testing process a IPC queue b2a --> a2b speed
ok 5 -     4c: moved   expected number of new queue items // estimate 53,106,512 q items per second with contention
#0.179207 2 testing process b IPC lock speed
ok 6 -     4c: got lock value address as expected
ok 5 -    c2*: moved   expected number of new queue items // estimate 52,951,698 q items per second with contention
#0.180467 3 testing process a IPC lock speed
ok 6 -    c2*: got lock value address as expected
#0.180475 3 shf_race_start() // 2 horses started after 0.000000 seconds
#0.180475 2 shf_race_start() // 2 horses started after 0.001165 seconds
ok 7 -    c2*: rw lock expected number of times           // estimate 4,422,109 locks per second; with contention
ok 7 -     4c: rw lock expected number of times           // estimate 4,411,319 locks per second; with contention
#0.633875 2 ending child
ok 8 -    c2*: rw lock expected number of times           // estimate 51,144,435 locks per second; without contention
ok 9 -    c2*: rw lock expected number of times           // esti

Sharedhashfile

Install / Use

README