SkillAgentSearch skills...

Pstld

Experimental implementation of ParallelSTL on top of GCD aka libdispatch

Install / Use

/learn @mikekazakov/Pstld
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Build Status

pstld

Experimental implementation of ParallelSTL on top of Grand Central Dispatch aka libdispatch

The purpose of this library is to provide a drop-in implementation of C++ parallel algorithms for the Apple platforms. Xcode comes with no parallel algorithms in libc++, so this library aims to fill the gap. pstld uses the native scheduler (libdispatch) and does not depend on any 3rd-party libraries.

Usage

The simplest way to use the library is to consume it as header-only and to request exposing the parallel algorithms in the namespace std:

#include <cstdlib>
#include <iostream>
#define PSTLD_HEADER_ONLY   // no prebuilt library, only the header
#define PSTLD_HACK_INTO_STD // export into namespace std
#include "pstld.h"
int main()
{
    std::vector<unsigned> v(100'000'000);
    std::generate(v.begin(), v.end(), [] { return std::rand(); });
    std::cout << std::reduce(std::execution::par, v.begin(), v.end()) << std::endl;
}

The default Xcode toolchain can now build it:

% clang++ -std=c++17 main.cpp -o test && ./test
64818392

The same functions can be used directly from the pstld namespace without specifying the execution policy:

#include <cstdlib>
#include <iostream>
#define PSTLD_HEADER_ONLY   // no prebuilt library, only the header
#include "pstld.h"
int main()
{
    std::vector<unsigned> v(100'000'000);
    std::generate(v.begin(), v.end(), [] { return std::rand(); });
    std::cout << pstld::reduce(v.begin(), v.end()) << std::endl;
}

pstld can be precompiled via CMake or manually by adding pstld.cpp into the build sources:

#include <cstdlib>
#include <iostream>
#include "pstld.h"
int main()
{
    std::vector<unsigned> v(100'000'000);
    std::generate(v.begin(), v.end(), [] { return std::rand(); });
    std::cout << pstld::reduce(v.begin(), v.end()) << std::endl;
}
% clang++ -std=c++17 main.cpp pstld.cpp -o test && ./test
64818392

Completeness

The library is not complete, this table shows which algorithms are currently available:

§ | Function | Exposed | Parallel :--- |:--- |:---:|:---: 25.6.1 | std::all_of | ✅ | ✅ 25.6.2 | std::any_of | ✅ | ✅ 25.6.3 | std::none_of | ✅ | ✅ 25.6.4 | std::for_each | ✅ | ✅ | | std::for_each_n | ✅ | ✅ 25.6.5 | std::find | ✅ | ✅ | | std::find_if | ✅ | ✅ | | std::find_if_not | ✅ | ✅ 25.6.6 | std::find_end | ✅ | ✅ 25.6.7 | std::find_first_of | ✅ | ✅ 25.6.8 | std::adjacent_find | ✅ | ✅ 25.6.9 | std::count | ✅ | ✅ | | std::count_if | ✅ | ✅ 25.6.10 | std::mismatch | ✅ | ✅ 25.6.11 | std::equal | ✅ | ✅ 25.6.13 | std::search | ✅ | ✅ | | std::search_n | ✅ | ✅ 25.7.1 | std::copy | ✅ | ✅ | | std::copy_n | ✅ | ✅ | | std::copy_if | ✅ | ❌ 25.7.2 | std::move | ✅ | ✅ 25.7.3 | std::swap_ranges | ✅ | ✅ 25.7.4 | std::transform | ✅ | ✅ 25.7.5 | std::replace | ✅ | ✅ | | std::replace_if | ✅ | ✅ | | std::replace_copy | ✅ | ❌ | | std::replace_copy_if | ✅ | ❌ 25.7.6 | std::fill | ✅ | ✅ | | std::fill_n | ✅ | ✅ 25.7.7 | std::generate | ✅ | ✅ | | std::generate_n | ✅ | ✅ 25.7.8 | std::remove | ❌ | ❌ | | std::remove_if | ❌ | ❌ | | std::remove_copy | ❌ | ❌ | | std::remove_copy_if | ❌ | ❌ 25.7.9 | std::unique | ❌ | ❌ | | std::unique_copy | ❌ | ❌ 25.7.10 | std::reverse | ✅ | ✅ | | std::reverse_copy | ❌ | ❌ 25.7.11 | std::rotate | ❌ | ❌ 25.7.14 | std::shift_left | ❌ | ❌ | | std::shift_right | ❌ | ❌ 25.8.2.1 | std::sort | ✅ | ✅ 25.8.2.2 | std::stable_sort | ✅ | ✅ 25.8.2.3 | std::partial_sort | ❌ | ❌ 25.8.2.4 | std::partial_sort_copy | ❌ | ❌ 25.8.2.5 | std::is_sorted | ✅ | ✅ | | std::is_sorted_until | ✅ | ✅ 25.8.3 | std::nth_element | ❌ | ❌ 25.8.5 | std::is_partitioned | ✅ | ✅ | | std::partition | ❌ | ❌ | | std::stable_partition | ❌ | ❌ | | std::partition_copy | ❌ | ❌ 25.8.6 | std::merge | ✅ | ✅ | | std::inplace_merge | ❌ | ❌ 25.8.7.2 | std::includes | ❌ | ❌ 25.8.7.3 | std::set_union | ❌ | ❌ 25.8.7.4 | std::set_intersection | ❌ | ❌ 25.8.7.5 | std::set_difference | ❌ | ❌ 25.8.7.6 | std::set_symmetric_difference | ❌ | ❌ 25.8.8.6 | std::is_heap | ❌ | ❌ | | std::is_heap_until | ❌ | ❌ 25.8.9 | std::min_element | ✅ | ✅ | | std::max_element | ✅ | ✅ | | std::minmax_element | ✅ | ✅ 25.8.11 | std::lexicographical_compare | ✅ | ✅ 25.10.4 | std::reduce | ✅ | ✅ 25.10.6 | std::transform_reduce | ✅ | ✅ 25.10.8 | std::exclusive_scan | ✅ | ✅ 25.10.9 | std::inclusive_scan | ✅ | ✅ 25.10.10 | std::transform_exclusive_scan | ✅ | ✅ 25.10.11 | std::transform_inclusive_scan | ✅ | ✅ 25.10.12 | std::adjacent_difference | ✅ | ✅ 25.11.3 | std::uninitialized_default_construct | ✅ | ✅ | | std::uninitialized_default_construct_n | ✅ | ✅ 25.11.4 | std::uninitialized_value_construct | ✅ | ✅ | | std::uninitialized_value_construct_n | ✅ | ✅ 25.11.5 | std::uninitialized_copy | ✅ | ✅ | | std::uninitialized_copy_n | ✅ | ✅ 25.11.6 | std::uninitialized_move | ✅ | ✅ | | std::uninitialized_move_n | ✅ | ✅ 25.11.7 | std::uninitialized_fill | ✅ | ✅ | | std::uninitialized_fill_n | ✅ | ✅ 25.11.9 | std::destroy | ✅ | ✅ | | std::destroy_n | ✅ | ✅

Parallel speedup

benchmark.cpp contains a set of primitive synthetic performance tests. Each row shows how the parallel implementation compares to its serial counterpart depending on the number of elements in a working set. The serial variant executes the default algorithms from libc++ and the parallel one runs the pstld implementation. Per-element operations are mostly trivial in these benchmarks, so the speed-up numbers represent a somewhat worst-case scenario.

<details> <summary>2020 MacBook Pro 13" (Apple M1 CPU, 4P+4E cores), macOS 13, Xcode 14</summary>
                                 1K   10K  100K    1M   10M  100M 
all_of                         0.36  0.41  1.41  2.67  2.16  3.46 
any_of                         0.52  0.41  0.97  1.83  2.24  3.50 
none_of                        0.40  0.43  0.86  1.84  2.10  3.11 
for_each                       0.14  0.75  0.55  0.82  0.99  1.74 
find                           0.21  0.71  0.80  1.75  2.11  3.16 
find_end                       0.12  0.14  0.23  0.59  0.67  1.02 
find_first_of                  0.46  0.97  1.71  4.33  4.90  4.95 
adjacent_find                  0.46  0.50  0.64  1.59  2.03  2.23 
count                          0.16  0.64  0.93  0.78  0.93  2.24 
mismatch                       0.29  0.67  0.67  1.12  1.13  1.17 
equal                          0.48  0.78  0.75  1.21  1.24  1.29 
search                         0.54  0.69  1.39  3.25  3.79  3.98 
copy                           0.28  0.65  0.50  0.82  0.97  1.80 
move                           0.53  0.43  1.12  1.10  1.72  2.18 
swap_ranges                    0.32  0.66  0.57  0.64  0.97  1.51 
transform                      0.29  0.85  0.40  0.72  0.83  1.68 
replace                        0.16  0.31  0.32  1.02  0.98  1.73 
fill                           0.09  0.78  0.70  0.76  0.72  1.29 
reverse                        0.29  0.40  0.58  1.31  1.02  0.98 
sort_Rnd                       0.14  1.64  2.95  3.72  4.32  4.31 
sort_Eq                        0.07  0.42  0.59  0.94  0.98  0.94 
sort_Asc                       0.08  0.10  0.33  0.39  0.43  0.39 
sort_Des                       0.07  0.49  0.62  0.94  0.98  0.96 
stable_sort                    0.71  1.77  3.08  3.77  4.00  4.09 
is_sorted                      0.28  0.31  0.96  1.87  2.10  3.19 
is_partitioned                 0.24  0.68  0.95  1.84  2.10  3.05 
merge                          1.00  0.52  0.76  1.24  1.96  2.26 
minmax_element                 1.17  1.04  2.67  5.68  7.22  6.84 
lexicographical_compare        0.65  1.02  1.30  2.07  2.14  2.22 
reduce                         1.13  2.01  1.56  4.28  5.57  5.25 
transform_reduce               0.99  1.33  1.28  4.39  5.56  5.40 
exclusive_scan                 0.60  0.47  0.91  2.34  2.24  2.47 
inclusive_scan                 0.49  0.63  0.83  2.24  2.24  2.45 
transform_exclusive_scan       0.45  0.48  1.06  2.30  2.29  2.33 
transform_inclusive_scan       0.37  0.49  1.70  2.26  2.32  2.36 
adjacent_difference            0.19  0.79  0.49  0.70  0.79  0.78 
uninitialized_value_construct  0.26  0.24  0.40  0.97  1.86  1.89 
uninitialized_copy             0.30  0.59  1.86  1.37  2.05  2.54 
uninitialized_move             0.56  0.48  1.13  1.25  1.06  2.22 
uninitialized_fill             0.76  0.93  3.21  4.94  5.21  5.11 
destroy                        0.33  0.81  1.68  1.90  1.93  4.13 
</details> <details> <summary>2018 Mac Mini (Intel 8500B CPU, 6 cores), macOS 13, Xcode 14</summary>
                                 1K   10K  100K    1M   10M  100M 
all_of                         0.26  0.64  3.28  3.00  3.24  4.10 
any_of                         0.50  0.67  0.93  2.66  3.07  3.60 
none_of                        0.36  0.63  0.61  2.06  2.30  2.85 
for_each                       0.17  0.47  0.47  0.97  0.99  1.22 
find                           0.25  0.48  0.89  2.34  2.43  3.10 
find_end                       0.09  0.24  0.42  1.01  1.16  1.36 
find_first_of                  0.58  1.26  2.80  3.99  4.82  5.09 
adjacent_find                  0.41  0.61  1.16  2.70  2.39  2.58 
count                          0.12  0.40  0.47  1.41  1.64  2.32 
mismatch                       0.36  0.27  0.66  1.55  1.60  1.72 
equal                          0.22  0.90  3.28  2.29  2.29  2.33 
search                         0.36  1.18  1.70  3.48  3.88  3.93 
copy                           0.12  0.33  0.67  0.98  1.13  1.68 
move                           0.20  0.43  1.29  0.89  1.11  4.15 
swap_ranges                    0.27  0.23  0.52  0.80  0.89  1.09 
transform                      0.22  0.41  0.62  0.88  1.03  1.49 
replace                        0.40  0.71  0.81  1.56  1.44  1.66 
fill                           0.14  0.49  0.87  1.18  0.97  1.13 
reverse                        0.31  0.49  0.90  1.88  1.00  0.96 
sort_Rnd                       0.28  1.92  3.16  3.44  3.76  3.66 
sort_Eq       

Related Skills

View on GitHub
GitHub Stars23
CategoryDevelopment
Updated5d ago
Forks3

Languages

C++

Security Score

90/100

Audited on Mar 21, 2026

No findings