
Async

async is a tiny C++ header-only high-performance library for async calls handled by a thread-pool, which is built on top of an unbounded MPMC lock-free queue.


async

[License(Boost Software License - Version 1.0)]

Welcome

async is a tiny, header-only, high-performance C++ library for async calls handled by a thread pool, which is built on top of an unbounded MPMC lock-free queue. It is written in pure C++14 (with C++11 support through preprocessor macros) and has no dependencies on third-party libraries.

Note: This library was originally designed for 64-bit systems. It has been tested on x86-64, ARMv8 (64-bit), and ARMv7 (32-bit).

change logs

  • Jun. 2018:
    • Added support for ARMV7 & V8
    • Tested on Raspberry Pi 3 B+ with Gentoo ARMV8 64bit (Linux Pi64 4.14.44-V8 AArch64)
    • Tested on Raspberry Pi 3 B+ with Raspbian ARMV7 32bit (Linux 4.14.34-v7 armv7l)
    • Added Benchmark Results for Raspberry Pi 3 B+ ARMV8 (Linux Pi64 4.14.44-V8 AArch64)
    • Added Benchmark Results for Raspberry Pi 3 B+ ARMV7 32bit (Linux 4.14.34-v7 armv7l)
  • Sept. 2017:
    • Significantly improved the performance of async::queue without bulk operations.
    • async::threadpool also benefits from this change.
    • Added a bounded MPMC queue, async::bounded_queue, which is useful for memory-constrained systems or fixed-size message pipeline designs. The overall performance of this buffer-based async::bounded_queue is comparable to the bulk operations of the node-based async::queue. async::bounded_queue has almost the same interface as async::queue, with three differences: it has no bulk operations, a size parameter must be passed to its constructor, and it adds blocking methods (blocking_enqueue & blocking_dequeue). The TRAIT::NOEXCEPT_CHECK setting works as in async::queue to help handle exceptions that may be thrown in an element's ctor. bounded_queue is basically a C++ implementation of the PTLQueue design (please read Dave Dice's article for details and references).
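To give a feel for how a buffer-based bounded MPMC queue with blocking methods can work, here is a minimal ticket-based sketch. It is not the code of async::bounded_queue: the class name `ticket_queue`, the `blocking_dequeue(T&)` signature, the spin-and-yield waiting, and the requirement that `T` be default constructible are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Illustrative only: a hypothetical ticket-based bounded MPMC queue.
template <class T>
class ticket_queue {
    struct slot {
        std::atomic<std::size_t> turn{0};  // even: slot is free, odd: slot is full
        T value{};
    };
    std::size_t capacity_;
    std::unique_ptr<slot[]> slots_;
    std::atomic<std::size_t> head_{0};  // next dequeue ticket
    std::atomic<std::size_t> tail_{0};  // next enqueue ticket

public:
    explicit ticket_queue(std::size_t capacity)
        : capacity_(capacity), slots_(new slot[capacity]) {}

    // Takes a ticket, then waits until the ticket's slot is free for this round.
    void blocking_enqueue(T v) {
        const std::size_t ticket = tail_.fetch_add(1, std::memory_order_relaxed);
        slot& s = slots_[ticket % capacity_];
        const std::size_t round = ticket / capacity_;
        while (s.turn.load(std::memory_order_acquire) != 2 * round)
            std::this_thread::yield();
        s.value = std::move(v);
        s.turn.store(2 * round + 1, std::memory_order_release);
    }

    // Takes a ticket, then waits until the ticket's slot has been filled.
    void blocking_dequeue(T& out) {
        const std::size_t ticket = head_.fetch_add(1, std::memory_order_relaxed);
        slot& s = slots_[ticket % capacity_];
        const std::size_t round = ticket / capacity_;
        while (s.turn.load(std::memory_order_acquire) != 2 * round + 1)
            std::this_thread::yield();
        out = std::move(s.value);
        s.turn.store(2 * round + 2, std::memory_order_release);
    }
};

// Two producers and two consumers share a 4-slot queue; returns the sum consumed.
long long demo_ticket_queue() {
    ticket_queue<int> q(4);
    const int per_producer = 1000;
    std::atomic<long long> sum{0};
    std::vector<std::thread> threads;
    for (int p = 0; p < 2; ++p)
        threads.emplace_back([&] {
            for (int i = 1; i <= per_producer; ++i) q.blocking_enqueue(i);
        });
    for (int c = 0; c < 2; ++c)
        threads.emplace_back([&] {
            for (int i = 0; i < per_producer; ++i) {
                int v = 0;
                q.blocking_dequeue(v);
                sum += v;
            }
        });
    for (auto& t : threads) t.join();
    return sum.load();
}
```

Because every operation owns a unique ticket, producers and consumers contend only on the two counters and then on their own slot, which is one reason a fixed-size buffer suits memory-constrained pipelines.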

Features

  • interchangeable with std::async, accepts all kinds of callable instances, like static functions, member functions, functors, lambdas
  • dynamically changeable thread-pool size at run-time
  • tasks are managed in a lock-free queue
  • the provided lock-free queue does not share the restrictions of boost::lockfree::queue
  • low-latency for the task execution thanks to underlying lock-free queue
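The callable kinds listed above can be illustrated with std::async itself, which this library is meant to be interchangeable with. The sketch below uses only the standard library; `add_one`, `Counter`, and `Doubler` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <future>

// Hypothetical callables used only for this illustration.
static int add_one(int i) { return i + 1; }  // static function

struct Counter {
    int base;
    int add(int i) const { return base + i; }  // member function
};

struct Doubler {
    int operator()(int i) const { return 2 * i; }  // functor
};

// Launches one task per callable kind and returns the sum of the results.
int run_all_callable_kinds() {
    Counter c{10};
    auto f1 = std::async(std::launch::async, add_one, 1);            // 2
    auto f2 = std::async(std::launch::async, &Counter::add, &c, 5);  // 15
    auto f3 = std::async(std::launch::async, Doubler{}, 4);          // 8
    auto f4 = std::async(std::launch::async,
                         [](int i) { return i * i; }, 3);            // 9
    return f1.get() + f2.get() + f3.get() + f4.get();
}
```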

Tested Platforms & Compilers

(older versions of these OSs or compilers may work, but have not been tested)

  • Windows 10 Visual Studio 2015+
  • Linux Ubuntu 16.04 gcc4.9.2+/clang 3.8+
  • MacOS Sierra 10.12.5 clang-802.0.42

Getting Started

Building the test & benchmark

C++11 compilers

If your compiler only supports C++11, please edit CMakeLists.txt with the following change:

set(CMAKE_CXX_STANDARD 14)
#change to 
set(CMAKE_CXX_STANDARD 11)

Build & test with Microsoft C++ REST SDK

If your OS is Windows, or if cpprestsdk is installed and configured on Linux or Mac, edit CMakeLists.txt to enable the PPL test:

option(WITH_CPPRESTSDK "Build Cpprestsdk Test" OFF)
#to
option(WITH_CPPRESTSDK "Build Cpprestsdk Test" ON)

Build for Linux or Mac (x86-64 & ARMV7&V8)

#to use clang (Linux), run the following export commands first
#export CC=clang-3.8
#export CXX=clang++-3.8
#run the following to set up a release build (for macOS Xcode, you can omit -DCMAKE_BUILD_TYPE and choose the build type at build time)
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=RELEASE
#now build the release
cmake --build build --config Release
#or debug
cmake --build build --config Debug
#or other builds
cmake --build build --config RelWithDebInfo
cmake --build build --config MinSizeRel

Build for Windows (X86-64)

#for VS 2015
cmake -H. -Bbuild -G "Visual Studio 14 2015 Win64"
#or VS 2017
cmake -H. -Bbuild -G "Visual Studio 15 2017 Win64"
#build the release from command line or you can open the project file in Visual Studio, and build from there
cmake --build build --config Release

How to use it in your project/application

Simply copy all headers in the async sub-folder to your project and include them in your source code.

Thread Pool Introduction

Thread pool initialization

// three alternative ways to construct a pool (pick one)
async::threadpool tp; // by default, the pool size equals the number of hardware CPU cores/threads
async::threadpool tp(8); // create a thread pool with 8 threads
async::threadpool tp(0); // create a thread pool with no threads available; it starts in pause mode

Resize the thread pool

async::threadpool tp(32);
// ... some operations
tp.configurepool(16); // can be called at any time (as long as tp is still valid) to reset the pool size
                      // running tasks are not interrupted

Submit a task

Static functions, member functions, functors, and lambdas are all supported.

int foo(int i) { return ++i; }
int i = 1;
auto pkg = tp.post(foo, i); // returns a std::future
pkg.get(); // blocks until the task has finished
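To show the general idea behind a `post` call that returns a std::future, here is a minimal standard-library sketch. It is not the library's implementation: `mini_pool` is a hypothetical class that uses a mutex-protected std::queue instead of a lock-free queue, it cannot be resized, and its `post` only handles callables that can be invoked as `f(args...)`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative only: a hypothetical mutex-based pool, not async::threadpool.
class mini_pool {
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;

public:
    explicit mini_pool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(m_);
                        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                        if (tasks_.empty()) return;  // stopping and nothing left to run
                        job = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    job();  // run outside the lock
                }
            });
    }

    ~mini_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Wraps the call in a packaged_task so the caller gets a future for the result.
    template <class F, class... Args>
    auto post(F&& f, Args&&... args) -> std::future<decltype(f(args...))> {
        using R = decltype(f(args...));
        auto task = std::make_shared<std::packaged_task<R()>>(
            std::bind(std::forward<F>(f), std::forward<Args>(args)...));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }
};

int foo(int i) { return ++i; }

// Posts a free function and a lambda, then blocks on both futures.
int demo_post() {
    mini_pool tp(4);
    auto pkg = tp.post(foo, 41);                        // future holding 42
    auto sq = tp.post([](int x) { return x * x; }, 7);  // future holding 49
    return pkg.get() + sq.get();
}
```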

Multi-producer multi-consumer unbounded lock-free queue introduction

The design: a simple and classic implementation. It is a link-based, 3-level nested container with a local array for each level's storage and a simulated tagged pointer for linking. The size of each level and the number of tag bits can be configured through TRAITS (please see the source for details). With the default traits settings, the queue can store up to 1 trillion elements/nodes (at least 1 terabyte of memory).
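A simulated tagged pointer usually means packing a node index and a version tag into one machine word, so that a compare-and-swap can tell a recycled node apart from the original. The sketch below shows that idea only. The 40-bit index / 24-bit tag split is an assumption (picked because 2^40 is roughly one trillion), and `pack`, `index_of`, `tag_of`, and `try_relink` are hypothetical names; the library's real layout is set by its traits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: low 40 bits hold a node index, high 24 bits hold a tag.
constexpr unsigned kIndexBits = 40;
constexpr std::uint64_t kIndexMask = (std::uint64_t{1} << kIndexBits) - 1;

constexpr std::uint64_t pack(std::uint64_t index, std::uint64_t tag) {
    return (tag << kIndexBits) | (index & kIndexMask);
}
constexpr std::uint64_t index_of(std::uint64_t link) { return link & kIndexMask; }
constexpr std::uint64_t tag_of(std::uint64_t link) { return link >> kIndexBits; }

// Points `link` at `new_index` and bumps the tag. Fails if `link` no longer
// equals the value the caller observed earlier (`expected`).
bool try_relink(std::atomic<std::uint64_t>& link, std::uint64_t expected,
                std::uint64_t new_index) {
    const std::uint64_t desired = pack(new_index, tag_of(expected) + 1);
    return link.compare_exchange_strong(expected, desired);
}

// Simulates the ABA pattern in one thread: the index returns to its old
// value, but the tag has moved on, so the stale CAS is rejected.
bool demo_aba_detected() {
    std::atomic<std::uint64_t> head{pack(7, 0)};
    const std::uint64_t stale = head.load();  // "thread A" reads (index 7, tag 0)
    try_relink(head, head.load(), 9);         // "thread B": 7 -> 9, tag becomes 1
    try_relink(head, head.load(), 7);         // "thread B": 9 -> 7, tag becomes 2
    return index_of(head.load()) == 7 && !try_relink(head, stale, 3);
}
```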

element type requirements

  • nothrow destructible
  • optional (better to be true)
    • nothrow constructible
    • nothrow move-assignable

NOTE: a constructor that throws is acceptable, although it is better to keep the ctor noexcept if possible. noexcept detection is turned off by default; it can be turned on by setting TRAIT::NOEXCEPT_CHECK to true. With TRAIT::NOEXCEPT_CHECK set to true, the queue enables exception handling if the ctor or move assignment may throw.
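These requirements can be checked at compile time with standard type traits. The helpers below are a sketch and not part of the library: `meets_required` and `meets_optional` are hypothetical names, and reading "nothrow constructible" as "nothrow move constructible" is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Required by the queue: the destructor must not throw.
template <class T>
constexpr bool meets_required() {
    return std::is_nothrow_destructible<T>::value;
}

// Optional but preferred: construction and move assignment do not throw.
// Assumption: "nothrow constructible" is checked here via the move constructor.
template <class T>
constexpr bool meets_optional() {
    return std::is_nothrow_move_constructible<T>::value &&
           std::is_nothrow_move_assignable<T>::value;
}

// Hypothetical element type whose move operations are allowed to throw.
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(ThrowingMove&&) noexcept(false) {}
    ThrowingMove& operator=(ThrowingMove&&) noexcept(false) { return *this; }
};

static_assert(meets_required<int>() && meets_optional<int>(),
              "int satisfies both sets of requirements");
static_assert(meets_required<std::unique_ptr<int>>() &&
                  meets_optional<std::unique_ptr<int>>(),
              "unique_ptr satisfies both sets of requirements");
static_assert(meets_required<ThrowingMove>() && !meets_optional<ThrowingMove>(),
              "ThrowingMove is usable but misses the optional guarantees");
```

A type like `ThrowingMove` is the kind of element for which turning on TRAIT::NOEXCEPT_CHECK would matter.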

queue initialization

async::queue<T> q; //default constructor, it's unbounded

async::queue<T> q(1000); // pre-allocates 1000 storage nodes; the capacity increases automatically after 1000 nodes are used

usage

// enqueues a T constructed from args, supports the following constructions:
// move, if args is a T rvalue
// copy, if args is a T lvalue, or
// emplacement if args is an initializer list that can be passed to a T constructor
async::queue<T>::enqueue(Args... args)

async::queue<T>::dequeue(T& data) //type T should have move assignment operator,
//e.g.
async::queue<int> q;
q.enqueue(11);
int i(0);
q.dequeue(i);
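The three enqueue forms described above (move, copy, emplacement) fall out of perfect forwarding. The sketch below demonstrates that mechanism with a mutex-protected std::deque standing in for the lock-free queue. `demo_queue` is a hypothetical class, and the `bool` return value of its `dequeue` is an assumption, not a statement about async::queue.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Illustrative stand-in: same call shapes, but backed by a mutex and a deque.
template <class T>
class demo_queue {
    std::deque<T> items_;
    std::mutex m_;

public:
    // Forwards the arguments to T's constructor: copy, move, or emplacement.
    template <class... Args>
    void enqueue(Args&&... args) {
        std::lock_guard<std::mutex> lock(m_);
        items_.emplace_back(std::forward<Args>(args)...);
    }

    // Move-assigns the front element into `data`; returns false when empty.
    bool dequeue(T& data) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        data = std::move(items_.front());
        items_.pop_front();
        return true;
    }
};

std::string demo_enqueue_forms() {
    demo_queue<std::string> q;
    std::string lvalue = "copy";
    q.enqueue(lvalue);               // copy: argument is a T lvalue
    q.enqueue(std::string("move"));  // move: argument is a T rvalue
    q.enqueue(3, '!');               // emplacement: builds std::string(3, '!')
    std::string out, all;
    while (q.dequeue(out)) all += out + " ";
    return all;
}
```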

bulk operations

Bulk operations are convenient for bulk data and can also boost throughput. Exception handling is not available in bulk operations, even with TRAIT::NOEXCEPT_CHECK set to true. Bulk operations are suitable for plain data types, like network/event messages.

int a[] = {1,2,3,4,5};
int b[5];
q.bulk_enqueue(std::begin(a), 5);
auto popcount = q.bulk_dequeue(std::begin(b), 5); //popcount is the number of elements successfully pulled from the queue.
//or like the following code:
std::vector<int> v;
auto it = std::inserter(v, std::begin(v));
popcount = q.bulk_dequeue(it, 5);
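The reason `bulk_dequeue` returns a count is that the queue may hold fewer elements than requested. The stand-in below mirrors the call shapes used above with a mutex-protected std::deque; `demo_bulk_queue` is a hypothetical class and says nothing about how the library implements its bulk path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative stand-in for the bulk interface, backed by a mutex and a deque.
template <class T>
class demo_bulk_queue {
    std::deque<T> items_;
    std::mutex m_;

public:
    // Copies `count` elements starting at `first` into the queue.
    template <class InputIt>
    void bulk_enqueue(InputIt first, std::size_t count) {
        std::lock_guard<std::mutex> lock(m_);
        for (std::size_t i = 0; i < count; ++i) items_.push_back(*first++);
    }

    // Writes up to `max_count` elements to `out`; returns how many were written.
    template <class OutputIt>
    std::size_t bulk_dequeue(OutputIt out, std::size_t max_count) {
        std::lock_guard<std::mutex> lock(m_);
        std::size_t n = 0;
        for (; n < max_count && !items_.empty(); ++n) {
            *out++ = std::move(items_.front());
            items_.pop_front();
        }
        return n;
    }
};

std::size_t demo_bulk() {
    demo_bulk_queue<int> q;
    int a[] = {1, 2, 3, 4, 5};
    q.bulk_enqueue(std::begin(a), 5);
    std::vector<int> v;
    std::size_t first = q.bulk_dequeue(std::back_inserter(v), 3);   // pulls 3
    std::size_t second = q.bulk_dequeue(std::back_inserter(v), 5);  // only 2 remain
    assert(v.size() == 5 && v.front() == 1 && v.back() == 5);
    return first * 10 + second;
}
```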

Unit Test

The unit test code contains most of the usage samples.

Benchmark

NOTE: the results may vary on different OS platforms and hardware.

thread pool benchmark

The benchmark is a simple demonstration. NOTE: it may require extra configuration; please see CMakeLists.txt for detailed settings. The test benchmarks the following task/job-based async implementations:

  • async::threadpool (this library)
  • std::async
  • boost::async
  • AsioThreadPool (another implementation of mine, based on boost::asio; it has very stable and good performance, especially on Windows with IOCP)
  • Microsoft::PPL (pplx from cpprestsdk on Linux & macOS, or PPL on Windows)

e.g. Windows 10 64bit Intel i7-6700K 16GB RAM 480GB SSD Visual Studio 2017 (cl 19.11.25507.1 x64)

Benchmark Test Run: 1 Producers 7(* not applied) Consumers  with 21000 tasks and run 100 batches
  async::threapool (time/task) avg: 1130 ns  max: 1227 ns  min: 1066 ns avg_task_post: 1032 ns
       *std::async (time/task) avg: 1469 ns  max: 1549 ns  min: 1423 ns avg_task_post: 1250 ns
   *Microsoft::PPL (time/task) avg: 1148 ns  max: 1216 ns  min: 1114 ns avg_task_post: 1088 ns
    AsioThreadPool (time/task) avg: 1166 ns  max: 1319 ns  min: 1013 ns avg_task_post: 1073 ns
     *boost::async (time/task) avg: 29153 ns  max: 30028 ns  min: 27990 ns avg_task_post: 23343 ns
...
Benchmark Test Run: 4 Producers 4(* not applied) Consumers  with 21000 tasks and run 100 batches
  async::threapool (time/task) avg: 439 ns  max: 557 ns  min: 398 ns avg_task_post: 356 ns
       *std::async (time/task) avg: 800 ns  max: 890 ns  min: 759 ns avg_task_post: 629 ns
   *Microsoft::PPL (time/task) avg: 666 ns  max: 701 ns  min: 640 ns avg_task_post: 605 ns
    AsioThreadPool (time/task) avg: 448 ns  max: 541 ns  min: 389 ns avg_task_post: 365 ns
     *boost::async (time/task) avg: 32419 ns  max: 33296 ns  min: 30105 ns avg_task_post: 25561 ns
...
Benchmark Test Run: 7 Producers 1(* not applied) Consumers  with 21000 tasks and run 100 batches
  async::threapool (time/task) avg: 262 ns  max: 300 ns  min: 252 ns avg_task_post: 176 ns
       *std::async (time/task) avg: 873 ns  max: 961 ns  min: 821 ns avg_task_post: 701 ns
   *Microsoft::PPL (time/task) avg: 727 ns  max: 755 ns  min: 637 ns avg_task_post: 662 ns
    AsioThreadPool (time/task) avg: 607 ns  max: 645 ns  min: 567 ns avg_task_post: 210 ns
     *boost::async (time/task) avg: 33158 ns  max: 150331 ns  min: 28560 ns avg_task_post: 28655 ns

e.g. Ubuntu 17.04 Intel i7-6700K 16GB RAM 100GB HDD gcc 6.3.0

Benchmark Test Run: 1 Producers 7(* not applied) Consumers  with 21000 tasks and run 100 batches
  async::threapool (time/task) avg: 1320 ns  max: 1357 ns  min: 1301 ns avg_task_post: 1266 ns
       *std::async (time/task) avg: 11817 ns  max: 12469 ns  min: 11533 ns avg_task_post: 9580 ns
   *Microsoft::PPL (ti