SkillAgentSearch skills...

Miniselect

Selection and partial sorting algorithms

Install / Use

/learn @danlark1/Miniselect
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Build Status License

miniselect: Generic selection and partial ordering algorithms

miniselect is a C++ header-only library that contains various generic selection and partial sorting algorithms with the ease of use, testing, advice on usage and benchmarking.

Sorting is everywhere and there are many outstanding sorting algorithms that compete in speed, comparison count and cache friendliness. However, selection algorithms are always a bit outside of the competition scope, they are pretty important, for example, in databases ORDER BY LIMIT N is used extremely often which can benefit from more optimal selection and partial sorting algorithms. This library tries to solve this problem with Modern C++.

  • Easy: First-class, easy to use dependency and carefully documented APIs and algorithm properties.
  • Fast: We do care about speed of the algorithms and provide reasonable implementations.
  • Standard compliant: We provide C++11 compatible APIs that are compliant to the standard std::nth_element and std::partial_sort functions including custom comparators and order guarantees. Just replace the names of the functions in your project and it should work!
  • Well tested: We test all algorithms with a unified framework, under sanitizers and fuzzing.
  • Benchmarked: We gather benchmarks for all implementations to better understand good and bad spots.

Table of Contents

Quick Start

You can either include this project as a cmake dependency and then use the headers that are provided in the include folder or just pass the include folder to your compiler.

#include <iostream>
#include <vector>

#include "miniselect/median_of_ninthers.h"

int main() {
  std::vector<int> v = {1, 8, 4, 3, 2, 9, 0, 7, 6, 5};
  miniselect::median_of_ninthers_select(v.begin(), v.begin() + 5, v.end());
  for (const int i : v) {
    std::cout << i << ' ';
  }
  return 0;
}
// Compile it `clang++/g++ -I$DIRECTORY/miniselect/include/ example.cpp -std=c++11 -O3 -o example
// Possible output: 0 1 4 3 2 5 8 7 6 9
//                            ^ on the right place

Examples can be found in examples.

We support all compilers starting from GCC 7 and Clang 6. We are also planning to support Windows, for now it is best effort but no issues are known so far.

More on which algorithms are available, see documentation. For overview of this work you can read the article in the author's blog.

Testing

To test and benchmark, we use Google benchmark library. Simply do in the root directory:

# Check out the libraries.
$ git clone https://github.com/google/benchmark.git
$ git clone https://github.com/google/googletest.git
$ mkdir build && cd build
$ cmake -DMINISELECT_TESTING=on -DBENCHMARK_ENABLE_GTEST_TESTS=off -DBENCHMARK_ENABLE_TESTING=off ..
$ make -j
$ ctest -j4 --output-on-failure

It will create two tests and two benchmarks test_sort, test_select, benchmark_sort, benchmark_select. Use them to validate or contribute. You can also use ctest.

Documentation

There are several selection algorithms available, further $n$ is the number of elements in the array, $k$ is the selection element that is needed to be found (all algorithms are deterministic and not stable unless otherwise is specified):

| Name | Average | Best Case | Worst Case | Comparisons | Memory | |------------------------- |--------------------------------------------------------------------------------------------------------- |--------------------------------------------------------------------------------------------------------- |----------------------------------------------------------------------------------------------------------------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--------------------------------------------------------------------------------------------------------------------------------- | | pdqselect | $O(n)$ | $O(n)$ | $O(n\log n)$ | At least $2n$. Random data $2.5n$ | $O(1)$ | | Floyd-Rivest | $O(n)$ | $O(n)$ | $O(n^2)$ | Avg: $n + \min(k, n - k) + O(\sqrt{n \log n})$ | $O(\log\log n)$ | | Median Of Medians | $O(n)$ | $O(n)$ | $O(n)$ | Between $2n$ and $22n$. Random data $2.5n$ | $O(\log n)$ | | Median Of Ninthers | $O(n)$ | $O(n)$ | $O(n)$ | Between $2n$ and $21n$. Random data $2n$ | $O(\log n)$ | | Median Of 3 Random | $O(n)$ | $O(n)$ | $O(n^2)$ | At least $2n$. Random data $3n$ | $O(\log n)$ | | HeapSelect | $O(n\log k)$ | $O(n)$ | $O(n\log k)$ | $n\log k$ on average, for some data patterns might be better | $O(1)$ | | libstdc++ (introselect) | $O(n)$ | $O(n)$ | $O(n\log n)$ | At least $2n$. Random data $3n$ | $O(1)$ | | libc++ (median of 3) | $O(n)$ | $O(n)$ | $O(n^2)$ | At least $2n$. Random data $3n$ | $O(1)$ |

For sorting the situation is similar except every line adds $O(k\log k)$ comparisons and pdqselect is using $O(\log n)$ memory.

API

All functions end either in select, either in partial_sort and their behavior is exactly the same as for std::nth_element and std::partial_sort respectively, i.e. they accept 3 arguments as first, middle, end iterators and an optional comparator. Several notes:

  • You should not throw exceptions from Compare function. Standard library also does not specify the behavior in that matter.
  • We don't support ParallelSTL for now.
  • C++20 constexpr specifiers might be added but currently we don't have them because of some floating point math in several algorithms.
  • All functions are in the miniselect namespace. See the example for that.
  • pdqselect
    • This algorithm is based on pdqsort which is acknowledged as one of the fastest generic sort algorithms.
    • Location: miniselect/pdqselect.h.
    • Functions: pdqselect, pdqselect_branchless, pdqpartial_sort, pdqpartial_sort_branchless. Branchless version uses branchless partition algorithm provided by pdqsort. Use it if your comparison function is branchless, it might give performance for very big ranges.
    • Performance advice: Use it when you need to sort a big chunk so that $k$ is close to $n$.
<p align="center"><img src="https://media.giphy.com/media/TXIm9rTmbmox5ceSyP/giphy.gif" /></p>
View on GitHub
GitHub Stars140
CategoryDevelopment
Updated2mo ago
Forks8

Languages

C++

Security Score

100/100

Audited on Jan 22, 2026

No findings