async
License: Boost Software License - Version 1.0
Welcome
async is a tiny C++ header-only high-performance library for async calls handled by a thread pool, which is built on top of an unbounded MPMC lock-free queue. It's written in pure C++14 (C++11 supported via preprocessor macros), with no dependencies on third-party libraries.
Note: This library was originally designed for 64-bit systems. It has been tested on x86-64, ARMv8 (64-bit), and ARMv7 (32-bit).
change log
- Jun. 2018:
- Added support for ARMV7 & V8
- Tested on Raspberry Pi 3 B+ with Gentoo ARMV8 64bit (Linux Pi64 4.14.44-V8 AArch64)
- Tested on Raspberry Pi 3 B+ with Raspbian ARMV7 32bit (Linux 4.14.34-v7 armv7l)
- Added Benchmark Results for Raspberry Pi 3 B+ ARMV8 (Linux Pi64 4.14.44-V8 AArch64)
- Added Benchmark Results for Raspberry Pi 3 B+ ARMV7 32bit (Linux 4.14.34-v7 armv7l)
- Sept. 2017:
- Significantly improved the performance of async::queue without bulk operations.
- async::threadpool also benefits from this change.
- A bounded MPMC queue, async::bounded_queue, was added to the library. It is useful for memory-constrained systems or fixed-size message-pipeline designs. The overall performance of the buffer-based async::bounded_queue is comparable to the bulk operations of the node-based async::queue. async::bounded_queue shares an almost identical interface with async::queue, except for bulk operations; a size parameter has to be passed to bounded_queue's constructor, and blocking methods (blocking_enqueue & blocking_dequeue) were added. The TRAIT::NOEXCEPT_CHECK setting works the same way as in async::queue, to help handle exceptions that may be thrown by an element's constructor. bounded_queue is basically a C++ implementation of the PTLQueue design (please read Dave Dice's article for details and references).
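The PTLQueue design that bounded_queue follows can be pictured as a fixed-size ring in which every slot carries a "turn" ticket telling producers and consumers when the slot is theirs. Below is a minimal, self-contained sketch of that idea (an illustration only, not the library's actual implementation; the class and method names are made up):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal PTLQueue-style bounded MPMC ring (illustration only).
// A producer may fill slot (t & mask) when its turn equals ticket t;
// a consumer may drain it when the turn equals t + 1.
template <typename T>
class ring_queue {
public:
    explicit ring_queue(std::size_t capacity)  // capacity must be a power of two
        : slots_(capacity), mask_(capacity - 1) {
        for (std::size_t i = 0; i < capacity; ++i)
            slots_[i].turn.store(i, std::memory_order_relaxed);
    }

    bool try_enqueue(T v) {
        std::size_t ticket = tail_.load(std::memory_order_relaxed);
        for (;;) {
            slot& s = slots_[ticket & mask_];
            std::intptr_t diff =
                static_cast<std::intptr_t>(s.turn.load(std::memory_order_acquire)) -
                static_cast<std::intptr_t>(ticket);
            if (diff == 0) {                       // slot is ours to fill
                if (tail_.compare_exchange_weak(ticket, ticket + 1,
                                                std::memory_order_relaxed)) {
                    s.value = std::move(v);
                    s.turn.store(ticket + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;                      // queue is full
            } else {
                ticket = tail_.load(std::memory_order_relaxed);
            }
        }
    }

    bool try_dequeue(T& out) {
        std::size_t ticket = head_.load(std::memory_order_relaxed);
        for (;;) {
            slot& s = slots_[ticket & mask_];
            std::intptr_t diff =
                static_cast<std::intptr_t>(s.turn.load(std::memory_order_acquire)) -
                static_cast<std::intptr_t>(ticket + 1);
            if (diff == 0) {                       // slot holds data for us
                if (head_.compare_exchange_weak(ticket, ticket + 1,
                                                std::memory_order_relaxed)) {
                    out = std::move(s.value);
                    // free the slot for the producer one lap later
                    s.turn.store(ticket + mask_ + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;                      // queue is empty
            } else {
                ticket = head_.load(std::memory_order_relaxed);
            }
        }
    }

private:
    struct slot {
        std::atomic<std::size_t> turn{0};
        T value{};
    };
    std::vector<slot> slots_;
    std::size_t mask_;
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

The per-slot ticket is what makes the design wait-light: full/empty detection is a local comparison against one slot's counter rather than a comparison of the global head and tail.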
Features
- interchangeable with std::async, accepts all kinds of callable instances, like static functions, member functions, functors, lambdas
- dynamically changeable thread-pool size at run-time
- tasks are managed in a lock-free queue
- the provided lock-free queue doesn't impose the element-type restrictions of boost::lockfree::queue (which requires, e.g., a trivial assignment operator and trivial destructor)
- low-latency for the task execution thanks to underlying lock-free queue
Tested Platforms & Compilers
(old versions of OSs or compilers may work, but not tested)
- Windows 10 Visual Studio 2015+
- Linux Ubuntu 16.04 gcc4.9.2+/clang 3.8+
- MacOS Sierra 10.12.5 clang-802.0.42
Getting Started
Building the test & benchmark
C++11 compilers
If your compiler only supports C++11, please edit CMakeLists.txt with the following change:
set(CMAKE_CXX_STANDARD 14)
#change to
set(CMAKE_CXX_STANDARD 11)
Build & test with Microsoft C++ REST SDK
If your OS is Windows, or you have cpprestsdk installed & configured on Linux or Mac, please edit CMakeLists.txt to enable the PPL test:
option(WITH_CPPRESTSDK "Build Cpprestsdk Test" OFF)
#to
option(WITH_CPPRESTSDK "Build Cpprestsdk Test" ON)
Build for Linux or Mac (x86-64 & ARMv7 & v8)
#to use clang (Linux), export the following first:
#export CC=clang-3.8
#export CXX=clang++-3.8
#run the following to set up a release build (for macOS Xcode, you can remove -DCMAKE_BUILD_TYPE for now and choose the build type at build time)
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=RELEASE
#now build the release
cmake --build build --config Release
#or debug
cmake --build build --config Debug
#or other builds
cmake --build build --config RelWithDebInfo
cmake --build build --config MinSizeRel
Build for Windows (X86-64)
#for VS 2015
cmake -H. -Bbuild -G "Visual Studio 14 2015 Win64"
#or VS 2017
cmake -H. -Bbuild -G "Visual Studio 15 2017 Win64"
#build the release from the command line, or open the project file in Visual Studio and build from there
cmake --build build --config Release
How to use it in your project/application
Simply copy all headers in the async sub-folder into your project, and include those headers in your source code.
Thread Pool Introduction
Thread pool initialization
async::threadpool tp; //by default, the pool size is the number of hardware CPU cores/threads
async::threadpool tp(8); //create a thread pool with 8 threads
async::threadpool tp(0); //create a thread pool with no threads available; it's in pause mode
resize the thread pool
async::threadpool tp(32);
...//some operations
tp.configurepool(16); //can be called at any time (as long as tp is still valid) to reset the pool size
// running tasks are not interrupted
submit the task
*static functions, member functions, functors, and lambdas are all supported
int foo(int i) { return ++i; }
auto pkg = tp.post(foo, i); //returns a std::future
pkg.get(); //will block until the result is ready
multi-producer multi-consumer unbounded lock-free queue introduction
The design: a simple and classic implementation. It's a link-based, 3-level-deep nested container with a local array at each level for storage and simulated tagged pointers for linking. The size of each level and the tag bits can be configured through TRAITS (please see the source for details). With the default trait settings the queue can store up to 1 trillion elements/nodes (at least 1 terabyte of memory space).
element type requirements
- nothrow destructible (required)
- nothrow constructible (optional, better to be true)
- nothrow move-assignable (optional, better to be true)
NOTE: an exception thrown by the constructor is acceptable, although it's better to keep the ctor noexcept if possible.
noexcept detection is turned off by default; it can be turned on by setting TRAIT::NOEXCEPT_CHECK to true.
With TRAIT::NOEXCEPT_CHECK on (true), the queue enables exception handling if the ctor or move assignment may throw.
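In principle, such a compile-time check only needs the standard type traits: if an element's move operations can throw, the queue must take an exception-safe path. A hedged illustration (the trait name below is made up, not the library's actual code):

```cpp
#include <type_traits>

// Illustration of a NOEXCEPT_CHECK-style compile-time test (hypothetical
// trait, not the library's code): an exception-handling path is only
// needed when a move construction or move assignment of T may throw.
template <typename T>
struct needs_exception_handling
    : std::integral_constant<bool,
          !std::is_nothrow_move_constructible<T>::value ||
          !std::is_nothrow_move_assignable<T>::value> {};

struct plain { int x; };   // trivial type: all moves are noexcept

struct risky {             // user-declared moves without noexcept may throw
    risky(risky&&);
    risky& operator=(risky&&);
};

static_assert(!needs_exception_handling<plain>::value, "no handling needed");
static_assert(needs_exception_handling<risky>::value, "handling needed");
```

Because the decision is made at compile time, types with noexcept operations pay nothing for the safety of types that might throw.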
queue initialization
async::queue<T> q; //default constructor, it's unbounded
async::queue<T> q(1000); //pre-allocates 1000 storage nodes; the capacity grows automatically after 1000 nodes are used
usage
// enqueues a T constructed from args, supports the following constructions:
// move, if args is a T rvalue
// copy, if args is a T lvalue, or
// emplacement if args is an initializer list that can be passed to a T constructor
async::queue<T>::enqueue(Args... args)
async::queue<T>::dequeue(T& data) //type T should have a move assignment operator
//e.g.
async::queue<int> q;
q.enqueue(11);
int i(0);
q.dequeue(i);
bulk operations
Bulk operations are convenient for bulk data and can also boost throughput.
Exception handling is not available in bulk operations, even with TRAIT::NOEXCEPT_CHECK set to true.
Bulk operations are best suited for plain data types, such as network/event messages.
int a[] = {1,2,3,4,5};
int b[5];
q.bulk_enqueue(std::begin(a), 5);
auto popcount = q.bulk_dequeue(std::begin(b), 5); //popcount is the number of elements successfully pulled from the queue
//or like the following code:
std::vector<int> v;
auto it = std::inserter(v, std::begin(v));
popcount = q.bulk_dequeue(it, 5);
Unit Test
The unit test code provides most samples for usage.
Benchmark
NOTE: the results may vary on different OS platforms and hardware.
thread pool benchmark
The benchmark is a simple demonstration. NOTE: it may require extra configuration; please see CMakeLists.txt for detailed settings. The test benchmarks the following task/job-based async implementations:
- async::threadpool (this library)
- std::async
- boost::async
- AsioThreadPool (another implementation of mine based on boost::asio; it has very stable and good performance, especially on Windows with IOCP)
- Microsoft::PPL (pplx from cpprestsdk on Linux & macOS, or PPL on Windows)
e.g. Windows 10 64bit Intel i7-6700K 16GB RAM 480GB SSD Visual Studio 2017 (cl 19.11.25507.1 x64)
Benchmark Test Run: 1 Producers 7(* not applied) Consumers with 21000 tasks and run 100 batches
async::threapool (time/task) avg: 1130 ns max: 1227 ns min: 1066 ns avg_task_post: 1032 ns
*std::async (time/task) avg: 1469 ns max: 1549 ns min: 1423 ns avg_task_post: 1250 ns
*Microsoft::PPL (time/task) avg: 1148 ns max: 1216 ns min: 1114 ns avg_task_post: 1088 ns
AsioThreadPool (time/task) avg: 1166 ns max: 1319 ns min: 1013 ns avg_task_post: 1073 ns
*boost::async (time/task) avg: 29153 ns max: 30028 ns min: 27990 ns avg_task_post: 23343 ns
...
Benchmark Test Run: 4 Producers 4(* not applied) Consumers with 21000 tasks and run 100 batches
async::threapool (time/task) avg: 439 ns max: 557 ns min: 398 ns avg_task_post: 356 ns
*std::async (time/task) avg: 800 ns max: 890 ns min: 759 ns avg_task_post: 629 ns
*Microsoft::PPL (time/task) avg: 666 ns max: 701 ns min: 640 ns avg_task_post: 605 ns
AsioThreadPool (time/task) avg: 448 ns max: 541 ns min: 389 ns avg_task_post: 365 ns
*boost::async (time/task) avg: 32419 ns max: 33296 ns min: 30105 ns avg_task_post: 25561 ns
...
Benchmark Test Run: 7 Producers 1(* not applied) Consumers with 21000 tasks and run 100 batches
async::threapool (time/task) avg: 262 ns max: 300 ns min: 252 ns avg_task_post: 176 ns
*std::async (time/task) avg: 873 ns max: 961 ns min: 821 ns avg_task_post: 701 ns
*Microsoft::PPL (time/task) avg: 727 ns max: 755 ns min: 637 ns avg_task_post: 662 ns
AsioThreadPool (time/task) avg: 607 ns max: 645 ns min: 567 ns avg_task_post: 210 ns
*boost::async (time/task) avg: 33158 ns max: 150331 ns min: 28560 ns avg_task_post: 28655 ns
e.g. Ubuntu 17.04 Intel i7-6700K 16GB RAM 100GB HDD gcc 6.3.0
Benchmark Test Run: 1 Producers 7(* not applied) Consumers with 21000 tasks and run 100 batches
async::threapool (time/task) avg: 1320 ns max: 1357 ns min: 1301 ns avg_task_post: 1266 ns
*std::async (time/task) avg: 11817 ns max: 12469 ns min: 11533 ns avg_task_post: 9580 ns
*Microsoft::PPL (ti
