Pmergesort
Parallel in-place/out-of-place merge sort algorithm implementations written in C language (mirror)
Install / Use
/learn @mcyril/PmergesortREADME
BRIEF
Parallel in-place/out-of-place merge sort algorithm implementations written in C language
GOAL & SOLUTION
The aim was to implement stable and performant parallel sort algorithm with the minimal memory footprint. The merge sort is used as the base since it’s highly adaptive to a parallel processing. In-place mergesort based on SymMerge algorithm from Kim Pok-Son and Arne Kutzner, "Stable Minimum Storage Merging by Symmetric Comparisons", in Susanne Albers and Tomasz Radzik, editors, Algorithms - ESA 2004, volume 3221 of Lecture Notes in Computer Science, pages 714-723. Springer, 2004.
Current implementation is macOS-centric, thus it uses Apple GCD for multi-threading (if available), though it’s possible to use pthreads based multi-threading mechanism (basic Thread Pool implementation is included as well).
INTERFACE (defined in pmergesort.h)
symmergesort / symmergesort_r
In-place mergesort based on optimized SymMerge algorithm, might work as single threaded or parallel depending on configuration.
The prototype of regular symmergesort has function declaration similar to the standard library qsort function, so seamless replacement possible:
void symmergesort(void * base, size_t n, size_t sz,
int (*cmp)(const void *, const void *));
The prototype of reentrant symmergesort_r has function declaration similar to the standard library qsort_r function, so seamless replacement possible:
void symmergesort_r(void * base, size_t n, size_t sz, void * thunk,
int (*cmp)(void *, const void *, const void *));
pmergesort / pmergesort_r
Out-of-place merge sort, optimized naïve implementation, might work as single threaded or parallel depending on configuration. Implemented just out of curiosity.
The prototype of regular pmergesort has function declaration similar to the standard library mergesort function, so seamless replacement possible:
int pmergesort(void * base, size_t n, size_t sz,
int (*cmp)(const void *, const void *));
The prototype of reentrant pmergesort_r has function declaration similar to the standard library qsort_r function with result, so seamless replacement possible:
int pmergesort_r(void * base, size_t n, size_t sz, void * thunk,
int (*cmp)(void *, const void *, const void *));
wrapmergesort / wrapmergesort_r
The parallel compound wrapper for generic sort, meaningless if works as single threaded. Implemented just out of curiosity.
The prototype of regular wrapmergesort has function declaration similar to the standard library qsort function with annex argument of wrapped sort function:
int wrapmergesort(void * base, size_t n, size_t sz,
int (*cmp)(const void *, const void *),
int (*sort)(void *, size_t, size_t,
int (*)(const void *, const void *)));
The prototype of reentrant wrapmergesort_r has function declaration similar to the standard library qsort_r function with annex argument of wrapped sort function:
int wrapmergesort_r(void * base, size_t n, size_t sz, void * thunk,
int (*cmp)(void *, const void *, const void *),
int (*sort_r)(void *, size_t, size_t, void *,
int (*)(void *, const void *, const void *)));
CONFIGURATION (see in pmergesort.c)
Configure algorithm parameters/settings using pre-processor directives (0 is ‘off’, 1 is ‘on’):
- PMR_PARALLEL_USE_GCD
- enable use of GCD for multi-threading
- default is on
- PMR_PARALLEL_USE_PTHREADS
- enable use of pthreads based pool for multi-threading
- default is off
- CFG_PARALLEL_USE_OMP
- enable use of OpenMP® for multi-threading (experimental)
- default is off
- PMR_RAW_ACCESS
- enable raw memory access
- off - implies the using of memmove and memcpy
- on - use raw memory access
- default is on
- PMR_RAW_ACCESS_ALIGNED
- enable aligned raw memory access
- default is off
To enable the build of parallel merge sort algorithms set one of PMR_PARALLEL_USE_GCD, PMR_PARALLEL_USE_PTHREADS, or CFG_PARALLEL_USE_OMP on.
Configure intrinsic memory management and access overrides
- PMR_MEMCPY
- memcpy override for non raw memory access
- default is memcpy
- PMR_MEMMOVE
- memmove override for non raw memory access
- default is memmove
- PMR_MALLOC
- malloc override
- default is malloc
- PMR_REALLOC
- realloc override
- default is realloc
- PMR_FREE
- free override
- default is free
Algorithms fine tuning and tweaks using pre-processor directives (see source code comments):
- _PMR_QUEUE_OVERCOMMIT
- _PMR_GCD_OVERCOMMIT
- _PMR_PARALLEL_MAY_SPAWN
- _PMR_PRESORT
- _PMR_USE_4_MEM
- _PMR_USE_8_MEM
- _PMR_USE_16_MEM
- _PMR_RAW_ACCESS_ALIGNED
- _PMR_TMP_ROT
- _PMR_MIN_SUBMERGELEN1
- _PMR_MIN_SUBMERGELEN2
- _PMR_BLOCKLEN_MTHRESHOLD0
- _PMR_BLOCKLEN_MTHRESHOLD
- _PMR_BLOCKLEN_SYMMERGE
- _PMR_BLOCKLEN_MERGE
SUPPORTED PLATFORMS
Source code has been built and tested on macOS (10.5 - 10.15) only.
Future potential:
- Can be quite easily made POSIX compatible, but not tested that yet
- Byte order independent and should be compatible with any CPU architecture, but not tested with big-endian yet
BUILD
clang -mmacosx-version-min=10.6 -c pmergesort.c -Ofast -o pmergesort.o
pgcc -fast -O4 -Mipa=fast,inline,force -DMAC_OS_X_VERSION_MIN_REQUIRED=1060 -DTARGET_CPU_X86_64=1 -DTARGET_OS_MAC=1 -c pmergesort.c -o pmergesort.o
icc -O3 -std=c99 -c pmergesort.c -o pmergesort.o
TODO: makefile
PERFORMANCE
Depends on CPU power/number of cores
Ironically, on practice the single threaded pmergesort is faster and with twice lesser memory footprint than standard macOS mergesort
Time complexity:
- symmergesort performance (as per single threaded algorithm) is better than O(N*log(N)^2)
- pmergesort performance (as per single threaded algorithm) is about standard O(N*log(N))
Memory footprint:
- symmergesort is O(1)
- pmergesort is O(N/2) in worst case
Related Skills
node-connect
352.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.3kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.5kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
