PoolSTL
Light and self-contained implementation of C++17 parallel algorithms.
Install / Use
/learn @alugowski/PoolSTLREADME
poolSTL
Light, self-contained, thread pool-based implementation of C++17 parallel standard library algorithms.
C++17 introduced parallel overloads of standard library algorithms that accept an Execution Policy as the first argument. Policies specify limits on how the implementation may parallelize the algorithm, enabling methods like threads, vectorization, or even GPU. Policies can be supplied by the compiler or by libraries like this one.
std::sort(std::execution::par, vec.begin(), vec.end());
// ^^^^^^^^^^^^^^^^^^^ native C++17 parallel Execution Policy
Unfortunately compiler support varies. Quick summary of compilers' default standard libraries:
| | Linux | macOS | Windows | |:------------------|:------------:|:------------:|:------------:| | GCC 9+ | TBB Required | TBB Required | TBB Required | | GCC 8- | ❌ | ❌ | ❌ | | Clang (libc++) | ❌ | ❌ | ❌ | | Clang (libstdc++) | TBB Required | TBB Required | TBB Required | | Apple Clang | | ❌ | | | MSVC 15.7+ (2017) | | | ✅ | | Parallel STL | TBB Required | TBB Required | TBB Required | | poolSTL | ✅* | ✅* | ✅* |
PoolSTL is a supplement to fill in the support gaps. It is not a full implementation; only the basics are covered. However, it is small, easy to integrate, and has no external dependencies. A good backup to the other options.
Use poolSTL exclusively, or only on platforms lacking native support, or only if TBB is not present.
Supports C++11 and higher. Algorithms introduced in C++17 require C++17 or higher.
Tested in CI on GCC 7+, Clang/LLVM 5+, Apple Clang, MSVC, MinGW, and Emscripten.
Implemented Algorithms
Algorithms are added on an as-needed basis. If you need one open an issue or contribute a PR.
Limitations: All iterators must be random access. No nested parallel calls.
<algorithm>
all_of,any_of,none_ofcopy,copy_ncount,count_iffill,fill_nfind,find_if,find_if_notfor_each,for_each_npartitionsort,stable_sorttransform
<numeric>
exclusive_scan(C++17 only)reducetransform_reduce(C++17 only)
All in std:: namespace.
Other
poolstl::iota_iter- Iterate over integers. Same as iterating over output ofstd::iotabut without materializing anything. Iterator version ofstd::ranges::iota_view.poolstl::for_each_chunk- Likestd::for_each, but explicitly splits the input range into chunks then exposes the chunked parallelism. A user-specified chunk constructor is called for each parallel chunk then its output is passed to each loop iteration. Useful for workloads that need an expensive workspace that can be reused between iterations, but not simultaneously by all iterations in parallel.poolstl::pluggable_sort- Likestd::sort, but allows specification of sequential sort method. To parallelize pdqsort:pluggable_sort(par, v.begin(), v.end(), pdqsort).
Usage
PoolSTL provides:
poolstl::par: Substitute forstd::execution::par. Parallelized using a thread pool.poolstl::seq: Substitute forstd::execution::seq. Simply calls the regular (non-policy) overload.poolstl::par_if(): Choose parallel or sequential at runtime. See below.
In short, use poolstl::par to make your code parallel. Complete example:
#include <iostream>
#include <poolstl/poolstl.hpp>
int main() {
std::vector<int> v = {0, 1, 2, 3, 4, 5};
auto sum = std::reduce(poolstl::par, vec.cbegin(), vec.cend());
// ^^^^^^^^^^^^
// Add this to make your code parallel.
std::cout << "Sum=" << sum << std::endl;
return 0;
}
Controlling Thread Pool Size with par.on(pool)
The thread pool used by poolstl::par is managed internally by poolSTL. It is started on first use.
Use your own thread pool
with poolstl::par.on(pool) for control over thread count, startup/shutdown, etc.:
task_thread_pool::task_thread_pool pool{4}; // 4 threads
std::reduce(poolstl::par.on(pool), vec.begin(), vec.end());
Choosing Parallel or Sequential at Runtime with par_if
Sometimes the choice whether to parallelize or not should be made at runtime. For example, small datasets may not amortize the cost of starting threads, while large datasets do and should be parallelized.
Use poolstl::par_if to select between par and seq at runtime:
bool is_parallel = vec.size() > 10000;
std::reduce(poolstl::par_if(is_parallel), vec.begin(), vec.end());
Use poolstl::par_if(is_parallel, pool) to control the thread pool used by par, if selected.
Examples
Parallel for (auto& value : vec)
std::vector<int> vec = {0, 1, 2, 3, 4, 5};
// Parallel for-each
std::for_each(poolstl::par, vec.begin(), vec.end(), [](auto& value) {
std::cout << value; // loop body
});
Parallel for (int i = 0; i < 100; ++i)
using poolstl::iota_iter;
// parallel for loop
std::for_each(poolstl::par, iota_iter<int>(0), iota_iter<int>(100), [](auto i) {
std::cout << i; // loop body
});
Parallel Sort
std::vector<int> vec = {5, 2, 1, 3, 0, 4};
std::sort(poolstl::par, vec.begin(), vec.end());
Installation
Single File
Each release publishes a single-file amalgamated poolstl.hpp. Simply copy this into your project.
Build requirements:
- Clang and GCC 8 or older: require
-lpthreadto use C++11 threads. - Emscripten: compile and link with
-pthreadto use C++11 threads. See docs.
CMake
include(FetchContent)
FetchContent_Declare(
poolSTL
GIT_REPOSITORY https://github.com/alugowski/poolSTL
GIT_TAG main
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(poolSTL)
target_link_libraries(YOUR_TARGET poolSTL::poolSTL)
Alternatively copy or checkout the repo into your project and:
add_subdirectory(poolSTL)
Benchmark
See benchmark/ to compare poolSTL against the standard sequential implementation, and (if available) the
native std::execution::par implementation.
Results on an M1 Pro (6 power, 2 efficiency cores), with GCC 13:
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------
all_of()/real_time 19.9 ms 19.9 ms 35
all_of(poolstl::par)/real_time 3.47 ms 0.119 ms 198
all_of(std::execution::par)/real_time 3.45 ms 3.25 ms 213
find_if()/needle_percentile:5/real_time 0.988 ms 0.987 ms 712
find_if()/needle_percentile:50/real_time 9.87 ms 9.86 ms 71
find_if()/needle_percentile:100/real_time 19.7 ms 19.7 ms 36
find_if(poolstl::par)/needle_percentile:5/real_time 0.405 ms 0.050 ms 1730
find_if(poolstl::par)/needle_percentile:50/real_time 1.85 ms 0.096 ms 393
find_if(poolstl::par)/needle_percentile:100/real_time 3.64 ms 0.102 ms 193
find_if(std::execution::par)/needle_percentile:5/real_time 0
