Pmergesort

Parallel in-place/out-of-place merge sort algorithm implementations written in C language (mirror)

Generate Convert Improve

Install / Use

/learn @mcyril/Pmergesort

About this skill

Quality Score

0/100

README

BRIEF

Parallel in-place/out-of-place merge sort algorithm implementations written in C language

GOAL & SOLUTION

The aim was to implement stable and performant parallel sort algorithm with the minimal memory footprint. The merge sort is used as the base since it’s highly adaptive to a parallel processing. In-place mergesort based on SymMerge algorithm from Kim Pok-Son and Arne Kutzner, "Stable Minimum Storage Merging by Symmetric Comparisons", in Susanne Albers and Tomasz Radzik, editors, Algorithms - ESA 2004, volume 3221 of Lecture Notes in Computer Science, pages 714-723. Springer, 2004.

Current implementation is macOS-centric, thus it uses Apple GCD for multi-threading (if available), though it’s possible to use pthreads based multi-threading mechanism (basic Thread Pool implementation is included as well).

INTERFACE (defined in pmergesort.h)

symmergesort / symmergesort_r

In-place mergesort based on optimized SymMerge algorithm, might work as single threaded or parallel depending on configuration.

The prototype of regular symmergesort has function declaration similar to the standard library qsort function, so seamless replacement possible:

void symmergesort(void * base, size_t n, size_t sz,
                   int (*cmp)(const void *, const void *));

The prototype of reentrant symmergesort_r has function declaration similar to the standard library qsort_r function, so seamless replacement possible:

void symmergesort_r(void * base, size_t n, size_t sz, void * thunk,
                     int (*cmp)(void *, const void *, const void *));

pmergesort / pmergesort_r

Out-of-place merge sort, optimized naïve implementation, might work as single threaded or parallel depending on configuration. Implemented just out of curiosity.

The prototype of regular pmergesort has function declaration similar to the standard library mergesort function, so seamless replacement possible:

int pmergesort(void * base, size_t n, size_t sz,
                int (*cmp)(const void *, const void *));

The prototype of reentrant pmergesort_r has function declaration similar to the standard library qsort_r function with result, so seamless replacement possible:

int pmergesort_r(void * base, size_t n, size_t sz, void * thunk,
                  int (*cmp)(void *, const void *, const void *));

wrapmergesort / wrapmergesort_r

The parallel compound wrapper for generic sort, meaningless if works as single threaded. Implemented just out of curiosity.

The prototype of regular wrapmergesort has function declaration similar to the standard library qsort function with annex argument of wrapped sort function:

int wrapmergesort(void * base, size_t n, size_t sz,
                   int (*cmp)(const void *, const void *),
                    int (*sort)(void *, size_t, size_t,
                                 int (*)(const void *, const void *)));

The prototype of reentrant wrapmergesort_r has function declaration similar to the standard library qsort_r function with annex argument of wrapped sort function:

int wrapmergesort_r(void * base, size_t n, size_t sz, void * thunk,
                     int (*cmp)(void *, const void *, const void *),
                      int (*sort_r)(void *, size_t, size_t, void *,
                                     int (*)(void *, const void *, const void *)));

CONFIGURATION (see in pmergesort.c)

Configure algorithm parameters/settings using pre-processor directives (0 is ‘off’, 1 is ‘on’):

PMR_PARALLEL_USE_GCD
- enable use of GCD for multi-threading
- default is on
PMR_PARALLEL_USE_PTHREADS
- enable use of pthreads based pool for multi-threading
- default is off
CFG_PARALLEL_USE_OMP
- enable use of OpenMP® for multi-threading (experimental)
- default is off
PMR_RAW_ACCESS
- enable raw memory access
- off - implies the using of memmove and memcpy
- on - use raw memory access
- default is on
PMR_RAW_ACCESS_ALIGNED
- enable aligned raw memory access
- default is off

To enable the build of parallel merge sort algorithms set one of PMR_PARALLEL_USE_GCD, PMR_PARALLEL_USE_PTHREADS, or CFG_PARALLEL_USE_OMP on.

Configure intrinsic memory management and access overrides

PMR_MEMCPY
- memcpy override for non raw memory access
- default is memcpy
PMR_MEMMOVE
- memmove override for non raw memory access
- default is memmove
PMR_MALLOC
- malloc override
- default is malloc
PMR_REALLOC
- realloc override
- default is realloc
PMR_FREE
- free override
- default is free

Algorithms fine tuning and tweaks using pre-processor directives (see source code comments):

_PMR_QUEUE_OVERCOMMIT
_PMR_GCD_OVERCOMMIT
_PMR_PARALLEL_MAY_SPAWN
_PMR_PRESORT
_PMR_USE_4_MEM
_PMR_USE_8_MEM
_PMR_USE_16_MEM
_PMR_RAW_ACCESS_ALIGNED
_PMR_TMP_ROT
_PMR_MIN_SUBMERGELEN1
_PMR_MIN_SUBMERGELEN2
_PMR_BLOCKLEN_MTHRESHOLD0
_PMR_BLOCKLEN_MTHRESHOLD
_PMR_BLOCKLEN_SYMMERGE
_PMR_BLOCKLEN_MERGE

SUPPORTED PLATFORMS

Source code has been built and tested on macOS (10.5 - 10.15) only.

Future potential:

Can be quite easily made POSIX compatible, but not tested that yet
Byte order independent and should be compatible with any CPU architecture, but not tested with big-endian yet

BUILD

clang -mmacosx-version-min=10.6 -c pmergesort.c -Ofast -o pmergesort.o


pgcc -fast -O4 -Mipa=fast,inline,force -DMAC_OS_X_VERSION_MIN_REQUIRED=1060 -DTARGET_CPU_X86_64=1 -DTARGET_OS_MAC=1 -c pmergesort.c -o pmergesort.o


icc -O3 -std=c99 -c pmergesort.c -o pmergesort.o

TODO: makefile

PERFORMANCE

Depends on CPU power/number of cores

Ironically, on practice the single threaded pmergesort is faster and with twice lesser memory footprint than standard macOS mergesort

Time complexity:

symmergesort performance (as per single threaded algorithm) is better than O(N*log(N)^2)
pmergesort performance (as per single threaded algorithm) is about standard O(N*log(N))

Memory footprint:

symmergesort is O(1)
pmergesort is O(N/2) in worst case

Related Skills

node-connect

352.5k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

111.3k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

352.5k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

352.5k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。