Timsort
Port of Java's TimSort to C
Install / Use
/learn @patperry/TimsortREADME
This is a fairly straightforward C99 port of the Java/Android implementation of Timsort, which itself is a port of Tim Peters's list sort for Python. Most of the comments are from the earlier two implementations, except those annotated by the initials POP.
The prototype for the sorting function is
int timsort(void *base, size_t nel, size_t width,
int (*compar) (const void *, const void *));
This resembles the standard library qsort function. The function
returns 0 on success and -1 on failure. Failure only happens when
the routine can't allocate the necessary temporary array or if the
comparator is invalid.
The source code for the (Apache 2.0 Licensed) Java version is included in the JSR-166 proposal.
This version would not be possiple without all the hard work on the Java version. Kudos to google for open-sourcing their code. This implementation has same license, and maintains the "Android Open Source Project" copyright on the code. An even bigger "thank you" goes to Tim Peters, for devising and documenting the algorithm in the first place.
Differences from the Java version
C doesn't have exceptions, so we use errno and return codes instead. Another
difference is that C has pointers, so there is no need to pass around a lo
parameter to specify a range of values in an array.
One major change is that we use 'size_t' instead of 'int' as the index type; this
required some extra care in checking for overflow and ensuring that
the values are always non-negative. Moreover, some modifications needed to
be made to allow for differences on 64-bit architectures. These changes
are explained in more detail below.
Use return codes instead of exceptions
In the C implementation, if malloc or realloc fails to allocate the
necessary temporary array, the sorting routine returns -1.
Likewise, if mergeLo or mergeHi discovers that the comparison
method violates its contract, then it sets errno to EINVAL and
the sorting routine returns -1.
Use pointers instead of arrays with offsets
C has pointers, so there is no need to pass around "lo" parameters. Instead, we store the pointer to the base of the array range.
Switch from int to size_t
The code takes special care to ensure that integer values never become negative; in particular:
-
In the Java version of
gallopLeft, thelastOfsvariable is allowed to become-1. To prevent this, in theelsebranch we replacelastOfs = hint - ofs;with
lastOfs = hint + 1 - ofs;for symmetry, in the
ifbranch we replacelastOfs += hintwithlastOfs += hint + 1. Finally, we replaceassert(-1 <= lastOfs && lastOfs < ofs && ofs <= len);with
assert(0 <= lastOfs && lastOfs <= ofs && ofs <= len);and remove the statement
lastOfs++;which happens after the conditional. Similar changes appear ingallopRight. -
In
mergeLo, we replaceminGallop--;with
if (minGallop > 0) minGallop--;Then, we remove following lines, which appear after the loop:
if (minGallop < 0) minGallop = 0;Similar changes appear in
mergeHi.
The rest of the changes allow for size_t to be up to 64-bits long:
-
In the TimSort constructor (
timsort_init), the linestackLen = (len < 120 ? 5 : len < 1542 ? 10 : len < 119151 ? 19 : 40);gets replaced with with
stackLen = (len < 359 ? 5 : len < 4220 ? 10 : len < 76210 ? 16 : len < 4885703256 ? 39 : 85);Note that the bounds on
lenare slightly different. The justification for these values is given in a comment above the assignment. -
In
ensureCapacity, we add an additional operation:if (sizeof(newSize) > 4) newSize |= newSize >> 32;
Related projects
Christopher Swenson has a macro-based C implementation of timsort at https://github.com/swenson/sort/.
Performance
In a crude benchmark, sorting small, medium, and large arrays of
uniformly chosen 64-bit integers, performance was comparable to the
stdlib qsort implementation; Swenson's macro-based timsort
implementation was about 1.4 times faster than both. Swenson's
implementation has the edge here because it avoids the overhead
of a function call when comparing two elements.
Reentrant version
A reentrant variation of the timsort function - timsort_r - has been
defined which can pass an additional argument to the comparison function
purely for the calling application's own use. This is similar to the GNU
and C11 variations of the standard qsort function (not Windows
qsort_s or BSD qsort_r).
The prototype for this sorting function is
int timsort_r(void *base, size_t nel, size_t width,
int (*compar) (const void *, const void *, void *),
void *context);
The additional context parameter is passed to the comparison callback as the last argument.
Related Skills
node-connect
339.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.8kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
339.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.8kCommit, push, and open a PR
