dht — Distributed Hash Table for Erlang

The dht application implements a Distributed Hash Table for Erlang. It is excised from the etorrent application in order to make it possible to use it without using the rest of etorrent.

The code is highly rewritten by now. There are few traces left of the original code base. The reason for the partial rewrite was to support a full QuickCheck model, so code was changed in order to make certain parts easier to handle from a formal point of view.

State of the code

The code is currently early alpha state, and things may not work yet. In particular, many parts of the system has not been thoroughly tested and as such they may contain grave bugs.

The QuickCheck model currently exercises every part of the system, but avoids the search and refresh code for the time being.

Check the issues at github. They may contain current problems.

Flag days

Since we are at an early alpha, changes will be made which are not backwards compatible. This section describes the so-called “flag days” at which we change stuff in ways that are not backwards compatible to earlier versions. Once things are stable and we are looking at a real release, a versioning scheme will be in place to handle this.

2015-08-15: Increased the size of the token from 4 to 8 bytes. Protocol format has changed as a result. The old clients will fail to handle this correctly. Also, started using the Versions more correctly in the protocol.

Building

The code relies on no additional modules but the standard Erlang/OTP implementation in order to ease its portability to other systems and injection into existing code bases. The code does require at least Erlang/OTP release 18 as it makes heavy use of the new time API. To handle older Erlang releases, some work is needed in the module dht_time to provide backporting capabilities.

To build the code:

make

This uses erlang.mk currently, but you can also execute:

rebar compile

rebar3 compile

which will also work on the project.

What is a DHT?

A distributed hash table is a mapping from identities which are 256 bit numbers to a list of pairs, [{IP, Port}] on which the identity has been registered. The table is distributed by having a large number of nodes store small parts of the mapping each. Also, each node keeps a partial routing table in order to be able to find values.

Note the DHT, much like DNS doesn't interpret what the identity is. It just maps that identity onto the IP/Port endpoints. That is, you have to write your own protocol once you know where an identity is registered.

Also, the protocol makes no security guarantees. Anyone can claim to store a given ID. It is recommended clients have some way to verify that indeed, the other end stores the correct value for the ID. For instance, one can run ID = crypto:sha(256, Val) on a claimed value to verify that it is correct, as long as the value does not take too long to transfer. Another option is to store a Public key as the identity and then challenging the target end to verify the correct key, for instance by use of a protocol like CurveCP.

The implemented DHT has the following properties:

Multiple peers can store the same identity. In that case, each peer is reflected in the table.
Protocol agnostic, you are encouraged to do your own protocol, security checking and blacklisting of peers.
Insertion is fast, but deletion can take up to one hour because it is based on timeouts. If a node disappears, it can take up to one hour before its entries are removed as well.
Scalability is in millions of nodes.
Storage is probabilistic. An entry is stored at 8 nodes, and if all 8 nodes disappear, that entry will be lost for up to 45 minutes, before it is reinserted in a refresh.

Using the implementation:

The implementation supports 3 high-level commands, which together with two low-level commands is enough to run the DHT from scratch. Say you have just started a new DHT node:

application:ensure_all_started(dht)

Then, in order to make the DHT part of swarm, you must know at least one node in the swarm. How to obtain the nodes it outside of the scope of the DHT application currently. One you know about a couple of nodes, you can ping them to insert them into the DHT routing table:

[dht:ping({IP, Port}) || {IP, Port} <- Peers]

Then, in order to populate the routing table quickly, execute a find node search on your own randomly generated ID:

Self = dht:node_id(),
dht_search:run(find_node, Self).

At this point, we are part of the swarm, and can start using it. Say we want to say that ID 456 are present on our node, port 3000. We then execute:

dht:enter(456, 3000)

and now the DHT tracks that association. If we later want to rid ourselves of the association, we use

dht:delete(456)

Finally, other nodes can find the association by executing a lookup routine:

[{IP, 3000}] = dht:lookup(456)

where IP is the IP address the node is using to send UDP packets.

The internal state of the system

I believe a lot in the concepts of transparency and discoverability in which the system will be try to be transparent in what is happening inside it, so you can verify the internal state for correctness. You can obtain information about the internal state by the command:

dht:info().

which will tell you what the internal state of everything is currently. This is useful if you suspect something is wrong as it gives a top-level glance of the internal state as a snapshot.

Code layout and how the DHT works

In general, we follow the Kademlia hash table implementation. But the implementation details of this code is down below. The key take-away is that in order to implement Kademlia in Erlang, you have to come up with a working process model.

The system supports 4 low-level commands:

PING({IP, Port})—Ask a node if it is alive
FIND_NODE({IP, Port}, NodeID)—Ask a peer routing table about information on NodeID.
FIND_VALUE({IP, Port}, ID)—Ask a peer about knowledge about a stored value ID. May return either information about that value, or a set of nodes who are likely to know more about it. Will also return a temporary token to be used in subsequent STORE operations.
STORE({IP, Port}, Token, ID, Port)—If the current node is on IP-address NodeIP, then call a peer at {IP, Port} to associate ID => {NodeIP, Port}. The Token provides proof we recently executed a FIND_VALUE call to the peer.

These are supported by a number of process groups, which we describe in the following:

dht_metric.erl

The dht_metric module is a library, not a process. It implements the metric space in which the DHT operates:

The DHT operates on an identity-space which are 256 bit integers. If X and Y are IDs, then X xor Y forms a metric, or distance-function, over the space. Each node in the DHT swarm has an ID, and nodes are close to each other if the distance is small.

Every ID you wish to store is a 256 bit integer. Likewise, they are mapped into the space with the nodes. Nodes which are “close” to a stored ID tracks who is currently providing service for that ID.

dht_net.erl, dht_proto.erl

The DHT communicates via a simple binary protocol that has been built loosely on the BitTorrent protocol. However, the protocol has been changed completely from BitTorrent and has no resemblance to the original protocol whatsoever. The protocol encoding/decoding scheme is implemented in dht_proto and can be changed later on if necessary by adding a specific 8 byte random header to messages.

All of the network stack is handled by dht_net. This gen_server runs two tasks: incoming requests and outgoing requests.

For incoming requests, a handler process is spawned to handle that particular query. Once it knows about an answer, it invokes dht_net again to send out the response. For outgoing requests, the network gen_server uses noreply to block the caller until either timeout or until a response comes back for a request. It then unblocks the caller. In short, dht_net multiplexes the network.

The network protocol uses UDP. Currently, we don't believe packets will be large enough to mess with the MTU size of the IP stack, but we may be wrong.

Storing data at another node is predicated on a random token. Technically, the random token is hashed with the IP address of the peer in order to produce a unique token for that peer. If the peer wants to store data it has to present that token to prove it has recently executed a search and hit the node. It provides some security against forged stores of data at random peers. The random token is cycled once every minute to make sure there is some progress and a peer can't reuse a token forever.

The network protocol provides no additional security. In particular, there is currently no provision for blacklisting of talkative peers, no storage limit per node, no security handshake, etc.

dht_state.erl, dht_routing_meta.erl, dht_routing_table.erl

The code in dht_state runs the routing table state of the system. To do so, it uses the dht_routing_table code to maintain the routing table, and uses the dht_routing_meta code to handle meta-data about nodes in the table.

Imagine a binary tree based on the digital bit-pattern of a 256 bit number. A 0 means “follow the left child” whereas a 1 means “follow the right child” in the tree. Since each node has a 256 bit ID, the NodeID, it sits somewhere in this tree. The full tree is the full routing table for the system, but in general this is a very large tree containing all nodes currently present in the system. Since there can be millions of nodes, it is not feasible to store the full routing table in each node. Hence, they opt for partial storage.

If you follow the path from the node's NodeID to the root of the tree, yo

Dht

Install / Use

README