Pstld
Experimental implementation of ParallelSTL on top of GCD aka libdispatch
Install / Use
/learn @mikekazakov/PstldREADME
pstld
Experimental implementation of ParallelSTL on top of Grand Central Dispatch aka libdispatch
The purpose of this library is to provide a drop-in implementation of C++ parallel algorithms for the Apple platforms. Xcode comes with no parallel algorithms in libc++, so this library aims to fill the gap. pstld uses the native scheduler (libdispatch) and does not depend on any 3rd-party libraries.
Usage
The simplest way to use the library is to consume it as header-only and to request exposing the parallel algorithms in the namespace std:
#include <cstdlib>
#include <iostream>
#define PSTLD_HEADER_ONLY // no prebuilt library, only the header
#define PSTLD_HACK_INTO_STD // export into namespace std
#include "pstld.h"
int main()
{
std::vector<unsigned> v(100'000'000);
std::generate(v.begin(), v.end(), [] { return std::rand(); });
std::cout << std::reduce(std::execution::par, v.begin(), v.end()) << std::endl;
}
The default Xcode toolchain can now build it:
% clang++ -std=c++17 main.cpp -o test && ./test
64818392
The same functions can be used directly from the pstld namespace without specifying the execution policy:
#include <cstdlib>
#include <iostream>
#define PSTLD_HEADER_ONLY // no prebuilt library, only the header
#include "pstld.h"
int main()
{
std::vector<unsigned> v(100'000'000);
std::generate(v.begin(), v.end(), [] { return std::rand(); });
std::cout << pstld::reduce(v.begin(), v.end()) << std::endl;
}
pstld can be precompiled via CMake or manually by adding pstld.cpp into the build sources:
#include <cstdlib>
#include <iostream>
#include "pstld.h"
int main()
{
std::vector<unsigned> v(100'000'000);
std::generate(v.begin(), v.end(), [] { return std::rand(); });
std::cout << pstld::reduce(v.begin(), v.end()) << std::endl;
}
% clang++ -std=c++17 main.cpp pstld.cpp -o test && ./test
64818392
Completeness
The library is not complete, this table shows which algorithms are currently available:
§ | Function | Exposed | Parallel :--- |:--- |:---:|:---: 25.6.1 | std::all_of | ✅ | ✅ 25.6.2 | std::any_of | ✅ | ✅ 25.6.3 | std::none_of | ✅ | ✅ 25.6.4 | std::for_each | ✅ | ✅ | | std::for_each_n | ✅ | ✅ 25.6.5 | std::find | ✅ | ✅ | | std::find_if | ✅ | ✅ | | std::find_if_not | ✅ | ✅ 25.6.6 | std::find_end | ✅ | ✅ 25.6.7 | std::find_first_of | ✅ | ✅ 25.6.8 | std::adjacent_find | ✅ | ✅ 25.6.9 | std::count | ✅ | ✅ | | std::count_if | ✅ | ✅ 25.6.10 | std::mismatch | ✅ | ✅ 25.6.11 | std::equal | ✅ | ✅ 25.6.13 | std::search | ✅ | ✅ | | std::search_n | ✅ | ✅ 25.7.1 | std::copy | ✅ | ✅ | | std::copy_n | ✅ | ✅ | | std::copy_if | ✅ | ❌ 25.7.2 | std::move | ✅ | ✅ 25.7.3 | std::swap_ranges | ✅ | ✅ 25.7.4 | std::transform | ✅ | ✅ 25.7.5 | std::replace | ✅ | ✅ | | std::replace_if | ✅ | ✅ | | std::replace_copy | ✅ | ❌ | | std::replace_copy_if | ✅ | ❌ 25.7.6 | std::fill | ✅ | ✅ | | std::fill_n | ✅ | ✅ 25.7.7 | std::generate | ✅ | ✅ | | std::generate_n | ✅ | ✅ 25.7.8 | std::remove | ❌ | ❌ | | std::remove_if | ❌ | ❌ | | std::remove_copy | ❌ | ❌ | | std::remove_copy_if | ❌ | ❌ 25.7.9 | std::unique | ❌ | ❌ | | std::unique_copy | ❌ | ❌ 25.7.10 | std::reverse | ✅ | ✅ | | std::reverse_copy | ❌ | ❌ 25.7.11 | std::rotate | ❌ | ❌ 25.7.14 | std::shift_left | ❌ | ❌ | | std::shift_right | ❌ | ❌ 25.8.2.1 | std::sort | ✅ | ✅ 25.8.2.2 | std::stable_sort | ✅ | ✅ 25.8.2.3 | std::partial_sort | ❌ | ❌ 25.8.2.4 | std::partial_sort_copy | ❌ | ❌ 25.8.2.5 | std::is_sorted | ✅ | ✅ | | std::is_sorted_until | ✅ | ✅ 25.8.3 | std::nth_element | ❌ | ❌ 25.8.5 | std::is_partitioned | ✅ | ✅ | | std::partition | ❌ | ❌ | | std::stable_partition | ❌ | ❌ | | std::partition_copy | ❌ | ❌ 25.8.6 | std::merge | ✅ | ✅ | | std::inplace_merge | ❌ | ❌ 25.8.7.2 | std::includes | ❌ | ❌ 25.8.7.3 | std::set_union | ❌ | ❌ 25.8.7.4 | std::set_intersection | ❌ | ❌ 25.8.7.5 | std::set_difference | ❌ | ❌ 25.8.7.6 | std::set_symmetric_difference | ❌ | ❌ 25.8.8.6 | std::is_heap | ❌ | ❌ | | std::is_heap_until | ❌ | ❌ 25.8.9 | std::min_element | ✅ | ✅ | | std::max_element | ✅ | ✅ | | std::minmax_element | ✅ | ✅ 25.8.11 | std::lexicographical_compare | ✅ | ✅ 25.10.4 | std::reduce | ✅ | ✅ 25.10.6 | std::transform_reduce | ✅ | ✅ 25.10.8 | std::exclusive_scan | ✅ | ✅ 25.10.9 | std::inclusive_scan | ✅ | ✅ 25.10.10 | std::transform_exclusive_scan | ✅ | ✅ 25.10.11 | std::transform_inclusive_scan | ✅ | ✅ 25.10.12 | std::adjacent_difference | ✅ | ✅ 25.11.3 | std::uninitialized_default_construct | ✅ | ✅ | | std::uninitialized_default_construct_n | ✅ | ✅ 25.11.4 | std::uninitialized_value_construct | ✅ | ✅ | | std::uninitialized_value_construct_n | ✅ | ✅ 25.11.5 | std::uninitialized_copy | ✅ | ✅ | | std::uninitialized_copy_n | ✅ | ✅ 25.11.6 | std::uninitialized_move | ✅ | ✅ | | std::uninitialized_move_n | ✅ | ✅ 25.11.7 | std::uninitialized_fill | ✅ | ✅ | | std::uninitialized_fill_n | ✅ | ✅ 25.11.9 | std::destroy | ✅ | ✅ | | std::destroy_n | ✅ | ✅
Parallel speedup
benchmark.cpp contains a set of primitive synthetic performance tests.
Each row shows how the parallel implementation compares to its serial counterpart depending on the number of elements in a working set.
The serial variant executes the default algorithms from libc++ and the parallel one runs the pstld implementation.
Per-element operations are mostly trivial in these benchmarks, so the speed-up numbers represent a somewhat worst-case scenario.
1K 10K 100K 1M 10M 100M
all_of 0.36 0.41 1.41 2.67 2.16 3.46
any_of 0.52 0.41 0.97 1.83 2.24 3.50
none_of 0.40 0.43 0.86 1.84 2.10 3.11
for_each 0.14 0.75 0.55 0.82 0.99 1.74
find 0.21 0.71 0.80 1.75 2.11 3.16
find_end 0.12 0.14 0.23 0.59 0.67 1.02
find_first_of 0.46 0.97 1.71 4.33 4.90 4.95
adjacent_find 0.46 0.50 0.64 1.59 2.03 2.23
count 0.16 0.64 0.93 0.78 0.93 2.24
mismatch 0.29 0.67 0.67 1.12 1.13 1.17
equal 0.48 0.78 0.75 1.21 1.24 1.29
search 0.54 0.69 1.39 3.25 3.79 3.98
copy 0.28 0.65 0.50 0.82 0.97 1.80
move 0.53 0.43 1.12 1.10 1.72 2.18
swap_ranges 0.32 0.66 0.57 0.64 0.97 1.51
transform 0.29 0.85 0.40 0.72 0.83 1.68
replace 0.16 0.31 0.32 1.02 0.98 1.73
fill 0.09 0.78 0.70 0.76 0.72 1.29
reverse 0.29 0.40 0.58 1.31 1.02 0.98
sort_Rnd 0.14 1.64 2.95 3.72 4.32 4.31
sort_Eq 0.07 0.42 0.59 0.94 0.98 0.94
sort_Asc 0.08 0.10 0.33 0.39 0.43 0.39
sort_Des 0.07 0.49 0.62 0.94 0.98 0.96
stable_sort 0.71 1.77 3.08 3.77 4.00 4.09
is_sorted 0.28 0.31 0.96 1.87 2.10 3.19
is_partitioned 0.24 0.68 0.95 1.84 2.10 3.05
merge 1.00 0.52 0.76 1.24 1.96 2.26
minmax_element 1.17 1.04 2.67 5.68 7.22 6.84
lexicographical_compare 0.65 1.02 1.30 2.07 2.14 2.22
reduce 1.13 2.01 1.56 4.28 5.57 5.25
transform_reduce 0.99 1.33 1.28 4.39 5.56 5.40
exclusive_scan 0.60 0.47 0.91 2.34 2.24 2.47
inclusive_scan 0.49 0.63 0.83 2.24 2.24 2.45
transform_exclusive_scan 0.45 0.48 1.06 2.30 2.29 2.33
transform_inclusive_scan 0.37 0.49 1.70 2.26 2.32 2.36
adjacent_difference 0.19 0.79 0.49 0.70 0.79 0.78
uninitialized_value_construct 0.26 0.24 0.40 0.97 1.86 1.89
uninitialized_copy 0.30 0.59 1.86 1.37 2.05 2.54
uninitialized_move 0.56 0.48 1.13 1.25 1.06 2.22
uninitialized_fill 0.76 0.93 3.21 4.94 5.21 5.11
destroy 0.33 0.81 1.68 1.90 1.93 4.13
</details>
<details>
<summary>2018 Mac Mini (Intel 8500B CPU, 6 cores), macOS 13, Xcode 14</summary>
1K 10K 100K 1M 10M 100M
all_of 0.26 0.64 3.28 3.00 3.24 4.10
any_of 0.50 0.67 0.93 2.66 3.07 3.60
none_of 0.36 0.63 0.61 2.06 2.30 2.85
for_each 0.17 0.47 0.47 0.97 0.99 1.22
find 0.25 0.48 0.89 2.34 2.43 3.10
find_end 0.09 0.24 0.42 1.01 1.16 1.36
find_first_of 0.58 1.26 2.80 3.99 4.82 5.09
adjacent_find 0.41 0.61 1.16 2.70 2.39 2.58
count 0.12 0.40 0.47 1.41 1.64 2.32
mismatch 0.36 0.27 0.66 1.55 1.60 1.72
equal 0.22 0.90 3.28 2.29 2.29 2.33
search 0.36 1.18 1.70 3.48 3.88 3.93
copy 0.12 0.33 0.67 0.98 1.13 1.68
move 0.20 0.43 1.29 0.89 1.11 4.15
swap_ranges 0.27 0.23 0.52 0.80 0.89 1.09
transform 0.22 0.41 0.62 0.88 1.03 1.49
replace 0.40 0.71 0.81 1.56 1.44 1.66
fill 0.14 0.49 0.87 1.18 0.97 1.13
reverse 0.31 0.49 0.90 1.88 1.00 0.96
sort_Rnd 0.28 1.92 3.16 3.44 3.76 3.66
sort_Eq
Related Skills
node-connect
338.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
338.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.4kCommit, push, and open a PR
