DKit
C++ Library of Atomic and Lockless Data Structures
Install / Use
/learn @drbobbeaty/DKitREADME
DKit
<p align="center"> <img src="doc/img/dexter.jpeg" width="477" height="250" border="0" /> </p>Introduction
In all the work we've done over the years, nothing has been quite as liberating as thread-safe containers. They are an amazing experience for a developer because they allow the multi-threaded programmer to not have to worry about locks, lock contention, deadlocks, dirty reads, and a whole host of other issues that for years have made writing multi-threaded code difficult. These containers and objects free the user from having to worry about locking because they use the compare-and-swap (CAS) operations in GCC to ensure that the data accessed is done properly, and securely.
Library Dependencies
The focus of this library is not to be a general threading library. Nor is it to attempt to write code that's already written. As such, we're going to be using boost for several components. This will include things only used in the test applications - for example, threading code to test the multi-threaded queues. Since boost is available on virtually all platforms, and even included in the next C++ standard, this should not be a significant burden to the user.
There are even instructions to build it on Mac OS X, if you don't wish to use the Homebrew version that's freely available.
Base Container Template Classes
In order to make the conversion of the code as simple as possible, we have made a base class for each type of container in this library. These base classes are most easily used as instance variables, or method arguments, and then the specific type of container can be instantiated and used as if it were just the base class.
At the current time, we have the following base classes:
dkit::FIFOa simple first-in, first-out queue
Async I/O Package
As we get away from the low-level classes in DKit, we start to see how these components can be put together to form more significant components. One such group is the I/O package based on the boost's ASIO package. The problem with the boost package is that it still takes a non-trivial amount of work to put together some of the most fundamental things you'd need in a socket I/O library. Certainly, this is for flexibility, but when you're trying to handle exchange feeds, it makes it a lot easier if you don't start from that point, but maybe a little higher up the food chain.
This is the purpose of the DKit I/O package - to bring a set of completed, high-performance socket components that makes buidling these types of systems much easier.
Atomic Integers
The first element of DKit is the atomic integers. The sized integers in
stdint.h are very useful for mixed-mode code, and yet there's nothing that
allow simple atomic access to these values. So the library has the following
classes defined in atomic.h :
aboola simple atomic booleanauint8_ta simple atomic unsigned 8-bit integeraint8_ta simple atomic signed 8-bit integerauint16_ta simple atomic unsigned 16-bit integeraint16_ta simple atomic signed 16-bit integerauint32_ta simple atomic unsigned 32-bit integeraint32_ta simple atomic signed 32-bit integerauint64_ta simple atomic unsigned 64-bit integeraint64_ta simple atomic signed 64-bit integer
These all are simply an a on the front of the types defined in stdint.h,
and all have atomic access and updating. They all also have casting operators
that make it very easy to get the values out, when it's necessary.
Single-Producer, Single-Consumer Containers
Separated into a different namespace, the single-producer, single-consumer, containers have taken advantage of the fact that they are only filled by one and only one thread, and that they are emptied by one and only one thread. This means that their safe use in the code is the responsibility of the developer as they must be sure of the threads that are using these containers. However, when those conditions are met, these queues and other containers are exceptionally efficient, and perform their task very fast.
dkit::spsc::CircularFIFO
The simplest SPSC (single-producer, single-consumer) queue is a simple circular FIFO (first-in, first-out) queue. The implementation of this is really not all that different from any simple circular FIFO queue, with a few important exceptions. These exceptions, and the careful use by the user of this class make this a very nice candidate for simple pools.
One such excample, might be a std::string pool where a single thread is
responsible for getting data from some source, placing it into std::string
instances, and then when they are processed by another thread, these
std::string instances are recycled back to the pool. There's one "filling"
thread that reads from the queue, and another that "recycles" the spent
instances back to the pool.
Multiple-Producer, Single-Consumer Containers
Separated into a different namespace, the multiple-producer, single-consumer, containers have taken advantage of the fact that they are filled by many threads, but they are emptied by one and only one thread. This means that their safe use in the code is the responsibility of the developer as they must be sure of the threads that are using these containers. However, when those conditions are met, these queues and other containers are very efficient, and perform their task admirably.
dkit::mpsc::LinkedFIFO
The simplest MPSC (multiple-producer, single-consumer) queue is a simple linked FIFO (first-in, first-out) queue. In contrast to the CircularFIFO in this namespace, the LinkedFIFO allows for an unlimited size, constrained only by the available memory of the device. The trade-off is, of course, that the elements placed in this queue have their storage allocated on the heap. In contrast to the CircularFIFO, whose size is predetermined, and whose locations are contiguous, the LinkedFIFO can have it's elements scattered around the memory space, and therefore not as efficient to access as the CircularFIFO.
Yet, there will be times when this is not a significant downside, and the fact that the implementation is simple, and efficient makes up for the slight inefficiency in the accessing of elements in the queue.
dkit::mpsc::CircularFIFO
The fastest MPSC queue is a simple circular FIFO queue. The implementation
of this is very similar to that of the SPSC circular FIFO queue, with the
significant differences coming into play in the data structures as well as
the implementation of the push() and pop() methods.
The big difference in this implementation is that in order to allow for
multiple producers, we had to ensure that the push() method did ONE and
only one atomic operation to find a new, open, spot for the incoming data.
This means that the "increment and test" scheme in the SPSC CircularFIFO will
NOT work, and we had to come up with something a little more interesting.
SImilarly, the pop() method needed to take into account that we have but a
single thread hitting this method, and yet we needed to make sure that we
integrated nicely with the data structures that were necessary for the
multiple producers.
Single-Producer, Multiple-Consumer Containers
Separated into a different namespace, the single-producer, multiple-consumer, containers have taken advantage of the fact that they are filled by one thread, but they are emptied by many threads. This means that their safe use in the code is the responsibility of the developer as they must be sure of the threads that are using these containers. However, when those conditions are met, these queues and other containers are very efficient, and perform their task admirably.
dkit::spmc::LinkedFIFO
The simplest SPMC (single-producer, multiple-consumer) queue is a simple linked FIFO (first-in, first-out) queue. Like the LinkedIFO on the MPSC namespace, this LinkedFIFO allows for an unlimited size, constrained only by the available memory of the device. The trade-off is, of course, that the elements placed in this queue have their storage allocated on the heap.
Yet, there will be times when this is not a significant downside, and the fact that the implementation is simple, and efficient makes up for the slight inefficiency in the accessing of elements in the queue.
dkit::spmc::CircularFIFO
In order to implement a SPMC circular FIFO queue, we had to abandon the use
of _head and _tail in the calculation of the size(). This was because
it was possible to have a lockless queue if we allow that the _head and
_tail are at times only potential indexes into the queue and not a
definite location at the time. With this, we can back up certain
operations so that it's possible to keep things lockless, and the only real
cost is that the size() method can't depend on these values.
To remedy this problem, we added a simple _size instance variable and used
the CAS increment and decrement operators when we are certain that the
push() and pop() methods have succeeded. The downside of this is that
all the operations are going to take longer. How much longer? On my
development machine, here's a run of the MPSC CircularFIFO:
=== Testing speed and correctness of CircularFIFO ===
Passed - pushed on 500 integers
Passed - popped all 500 integers
Passed - unable to pop from an empty queue
Passed - did 50000000 push/pop pairs in 1127.99ms = 22.5598ns/op
and the overhead of the _size maintenance is seen in the SPMC queue:
=== Testing speed and correctness of Circul
