SkillAgentSearch skills...

SimdQSort

Quicksort Algorithm implemented using the AVX2 instruction set

Install / Use

/learn @mikerabat/SimdQSort
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

SIMD Quicksort

This project implements a SIMD Quicksort algorithm that is AVX2 based. It expects input arrays that consists either of int32, int64, single (FP32) and double (FP64) elements.

The project is working with Delphi (2010 and up) and FPC in both intel flavors - x64 and x86. It is based on the work of "WojciechMula" who implemented various approaches on the SIMD Quicksort approach for int32 arrays. Check out https://github.com/WojciechMula/simd-sort for the C - version.

The algorithm here uses a left, right bound and within that bound it has 4 read write pointers.

wh--rh------------------rt--wt-

wh = write head rh = read head rt = read tail wt = write tail

The larger distance (rh - wh) (rt - wt) is used to read the next vector - this one is compared against the pivot and permuted according to the shuffle masks - then written back if a full vector has been found.

Reading from the shuffle mask is a crucial step here so the list is packed (bytes) and read via the vpmovzxbd command.

I left out the AVX512 implementations since I do not have a cpu that implements these opcodes. Newer (mobile) cpu's like the Intel ULtra 7 155H focus more on AI extensions than AVX512.

Usage

To use the code one needs to add the AVXQuickSortxy.pas to the project and add source search path to the library path so the dependencies are loaded too. This enables AVX encoding/decoding in case the AVX2 instruction set is avaliabe.

The test directory holds two projects:

  • TestSort.dpr - A small unit test tool to check if the sorting is correct
  • Benchmark.dpr - Benchmarking the AVX implementation against the standard implementation.

The projects work for Delphi versions 2010 up to Delphi 12.2 (most likely 12.3 too) as well as FPC.

Stats

The files were tested against the standard Delphi implementation which can be found in Quicksort.pas I created a small benchmark app that shows a speedup for random data of about 1.5 to 3.5 than the standard implementation.

Design considerations

Unfortunately Delphi does not really support the fancy intrinsics so the code needed to be pure asm and replicated.

The project could use global constants but the disassembler could not handle that well so there is another variable passed to the en/decoding routines that hold a pointer to the constants.

I basically used the disassembly routines from https://github.com/mikerabat/mrmath to get the equivalent "db instructions" for the AVX set that are missing up until Delphi 11.3.

Related Skills

View on GitHub
GitHub Stars10
CategoryDevelopment
Updated11mo ago
Forks0

Languages

Pascal

Security Score

82/100

Audited on May 7, 2025

No findings