Bitmasksorter
BitMask Sorter
Install / Use
/learn @aldo-gutierrez/BitmasksorterREADME
Bit Mask Sorters. Up to 18x times faster Sorters
This project explores various sorting algorithms employing a BitMask optimization approach.
The Java implementation can be found in this repository.
[Java Version] (https://github.com/aldo-gutierrez/bitmasksorter)
For implementations in other languages, please refer to the following repositories:
[C# Version] (https://github.com/aldo-gutierrez/bitmasksorterCSharp)
[C++ Version] (https://github.com/aldo-gutierrez/bitmasksorterCpp)
[Javascript Version] (https://github.com/aldo-gutierrez/bitmasksorterJS)
[Python Version] (https://github.com/aldo-gutierrez/bitmasksorterPython)
BitMask
A BitMask can be utilized to reduce the computational overhead required for sorting.
The following code demonstrates the calculation of the BitMask:
public static int calculateBitMask(int[] array, int start, int endP1) {
int mask = 0x00000000;
int inv_mask = 0x00000000;
for (int i = start; i < endP1; i++) {
int ei = array[i];
mask = mask | ei;
inv_mask = inv_mask | ~ei;
}
return mask & inv_mask;
}
Case 1 Simple BitMask
Some properties of this BitMask are:
- This BitMask has only the bits that change
- If the BitMask is 0 it means that all the numbers are equal
First, let's consider an example: suppose we have a list of integers containing numbers from 0 to 127. In this case, the bitmask would be 1111111.
| | Number | Bits | |-----------------|-------:|-----------------:| | Min: | 0 | 0000000000000000 | | Max: | 127 | 0000000001111111 | | Average/Median: | 64 | 0000000001000000 | | Bits Used (b): | 7 | 0000000001111111 |
this BitMask can be used to optimize sorting algorithms.
- QuickSort:
- Now is easy to know which pivot to use, just use the average, in this case 64 (1000000)
- Each quick sort call is using a bit of information
- Radix sort:
- A typical LSD Radix sorts for integers (32 bits) uses 4 count sort passes, it could just use one pass for example in the case that the range is 0-127 and the mask is 1111111
- Count sort/ Pingeon count sort
- We have the Min/Max values in the BitMask
Case 2 Complex BitMask
Second, let's consider a more complex example: suppose we have a list of integers containing numbers 85, 84, 21 and 5. In this case, the bitmask would be 1010001. As only the bits that change are part of the mask, 3 bits in this case.
| Number | Bits | |--------------:|-----------------:| | 85 | 0000000001010101 | | 84 | 0000000001010100 | | 21 | 0000000000010101 | | 5 | 0000000000000101 | | MASK | 0000000001010001 |
| | Number | Bits | |----------------------|-------:|-----------------------:| | Mask Min: | 4 | 000000000(0)0(0)010(0) | | Mask Max: | 85 | 000000000(1)0(1)010(1) | | Mask Average/Median: | 64 | 000000000(1)0(0)000(0) | | Mask Bits Used (d): | 3 | 000000000(1)0(1)000(1) |
- QuickSort:
- Easy to know which pivot use just use the average
- Each quick sort call is using a bit of information
- Radix sort:
- We will need a modified Radix Sort that uses only the 3 bits in one pass
- Count sort/ Pingeon count sort
- We have already de Min/Max values in the BitMask
- However, we could improve the efficiency by only considering the ones in the mask, 2^3 values, 000(0) to 111(7), 8 values in total
Case 3 Bad BitMasks
Third, let's consider some problems with the BitMask
Example 3A: If a list contains only two numbers, values: 1111111110000000(65408) and 0000000111111111(511). For this case the mask is 1111111001111111, instead of just 1 bit we have many, so it doesn't help too much at reducing computation. After some step in the sort algorithm it could be a good idea to recalculate the BitMask to see if it is 0 to stop
Example 3B: If a list contains positive(+1) and negative numbers(-1) then the mask will have 1 bits starting at -1 1: 0000000000000001 -1: 1111111111111111
MASK: 1111111111111110
So the mask won't help to reduce computations in this case if we put the positives to one side, negatives to the other we could recalculate the BitMask and get a more useful BitMask.
Example 3C: If the BitMask is 0xFFFFFFFF (0b1111111111111111111111111111111) because the numbers in the list are using the full range of Integer, then the BitMask doesn't help to optimize a sort algorithm.
Hybrid Approach
There are many algorithms that use a hybrid approach for example TimSort.
TimSort is a hybrid sorting algorithm that employs merge sort for large arrays of objects and insertion sort for small arrays, adjusting its strategy based on the array size (N).
I think we can use the BitMask to create a hybrid algorithm that takes into account not only (number of elements) N but also r (the range). We could use the BitMask for calculating the range, the range could be reduced if we only use the bits set to one as described in previous paragraphs.
AGSelectorSorterInt (Algorithm Selector) is an hybrid algorithm that choose between different algorithms that uses BitMasks
QuickBitSorter
Is similar to QuickSort but for choosing the pivot it uses the BitMask. The first bit set to one of the BitMask is used as pivot In the next recursion the next bit of the bitmask is used. This is different to other QuickSort algorithms that normally use the last element as pivot or choose the average of three pivots.
This is somehow similar to Binary MSD radix sort, but instead of choosing a digit(word) or byte a bit is used More comparisons and study is need to understand the differences in performance and implementation.
This algorithm is slower than TimSort but faster than a Stack Overflow QuickSort samples in general. However, when the number of bits is low is faster than Java TimSort.
If the bitmask is 11_1111_1111_1111_1111 then the first partition uses the number 10_0000_0000_0000_0000 as pivot, that process is repeated recursively in the left and right partitions, after some iterations and when there is less than a number of bits in the remaining BitMask we switch to a Java Sort or to a Count Sort Algorithm (Pigeonhole Sort or two types of Radix Sorter with bits)
Fo the Bad case 3A: If a list contains only two numbers, values: 1111111110000000(65408) and 0000000111111111(511), in the second iteration we detect that all elements go to just one partition, so we recalculate the mask, and in this case it will produce a bitMask with value 0 so the sort is complete.
Optimizations:
- The partition is only done by the bits set to one in the BitMask as described above
- When the remaining bits in the BitMask are small it switches to a Java Sort or to a Count Sort Algorithm (Pigeonhole Sort or two types of Radix Sorter with bits)
- Multithreading support
See an analysis of the performance in the section Speed and Performance
Usage:
//Sorting int numbers
SorterInt sorter = new QuickBitSorterInt();
int[] list = ....
sorter.sort(list);
//Parallel sorting int numbers
SorterInt sorter = new QuickBitSorterMTInt();
int[] list = ....
sorter.sort(list);
//Sorting unsinged int numbers
SorterInt sorter = new QuickBitSorterInt();
FieldOptions options = new FieldOptions() {
@Override
public FieldType getFieldType() {
return FieldType.UNSIGNED_INTEGER;
}
};
int[] list = ....
sorter.sort(list, options);
RadixBitSorter:
RadixBitSorter is a sorter that uses the BitMask to sort with just the bits that need to be sorted. The BitMask is used to reduce the number of count sort iterations.
Some implementations of Radix Sort are very unoptimized stackoverflow, baeldung, hackerearth have a lot of this. They use sometimes base 10 and sometimes bucket sort, compared to those RadixBitSorter is way faster.
Some optimized implementations use base 256 using bit operations. A typical optimized Radix Sorter for integers (32 bits) could use 8bits (words/digits) and needs 4 steps of Count Sort.
After a lot of testing I found that in my machine (AMD Ryzen 7 4800H) 11bits is the best word size and I also found that in some older computers dual core 8bits is the best size. That was found many years ago as I later discovered. So RadixBiSorter change the size of the words according to the number of cores that are somehow correlated with the Cache Sizes.
Some other optimizations like swapping the aux buffer with the array buffer are also implemented but with mixed results.
A typical optimized Radix Sorter for integers (32 bits) could use 8bits (words/digits) and needs 4 iterations of Count Sort, with the BitMask we can reduce the iterations to 1 in some cases.
Ska Sort as I understand is an optimized C++ radix sort, It uses American Flag sort but one Byte a time is an In place sort but not stable.
RadixBitSorter is not in place as it uses and aux buffer, and is a stable sorter
RadixBitSorter is typically faster than SkaSort, but when the bitmask is of size 30-32 bits then it can have similar or a little lower performance than SkaSort.
Radix Bit Sorter Optimizations:
- It does not need to sort by all the bits just the ones in the bitmask
- It sorts by 11 bits at a time as maximum and 4 as minimum (Instead of 8 as SkaSort and RadixByteSorterInt do)
RadixByteSorter:
Is a RadixBitSorter but that always uses 8bits with byte boundaries, it can or not use the BitMask to optimize the iterations of count sort
RadixBitSorter is faster than RadixByteSorter and SkaSort in most of the cases and machines i tested.
Radix Byte Sorter Optimizations:
- An option to use the bitmask and detect which bytes do not use has been added to the algorithm (calculateBitMaskOptimization) and enable by default.
Usage:
//Sorting unsigned int numbers
IntSorter sort
