Parsort
Parallel Sorting for Go
Install / Use
/learn @rah-0/ParsortREADME
(par)allel (sort)ing
A minimal, dependency-free Go library for parallel sorting of slices. Designed to scale across CPU cores and stay close to idiomatic Go.
Installation
go get github.com/rah-0/parsort
Algorithm
The algorithm built is a parallel merge sort, more specifically a multiway parallel merge sort with pairwise merging.
Breakdown:
- Divide phase: The input slice is split into
Nchunks (N = NumCPU). - Parallel sort phase: Each chunk is sorted independently using
sort.Slicein parallel goroutines. - Merge phase: All sorted chunks are merged in parallel pairwise steps (
log₂N steps total).
This approach isn't a classic recursive merge sort — instead, it's:
- Iterative, not recursive.
- Uses parallelism for both sorting and merging.
- Merge is pairwise, not
N-way(N-way can be faster for large chunk sets but more complex).
Why not use generics?
- Generics introduce performance overhead
- Generic
sort.Slicerequires a comparison function (e.g.,func(i, j int) bool). - This introduces:
- Function call overhead for every comparison.
- Indirect branching, which harms CPU branch prediction and cache locality.
- Native sorts like
sort.Ints,sort.Float64s, etc. are heavily optimized and compiled with inlined, tight loops and direct comparisons.
- Generic
- Avoiding heap allocations
- Generic code may cause additional heap allocations (captured closures, interfaces).
- Currently, allocate only when merging, minimizing GC pressure.
- Use case is performance-focused
- Generic versions are
1.5x–2xslower than concrete ones. - Each type (
int,float64,string, etc.) gets the best possible performance. - Since this a sorting library, it's worth specializing.
- Generic versions are
Performance
Overall, parsort can reduce ns/op by up to 90% but at the expense of around 3 times more memory.
Int
| Size | Order | ns/op (%) | B/op (%) | |------------|-------|-----------|----------| | 10000 | Asc | -7.09 | +301.81 | | 10000 | Desc | -59.20 | +301.63 | | 100000 | Asc | -35.48 | +302.22 | | 100000 | Desc | -70.23 | +302.17 | | 1000000 | Asc | -72.71 | +300.43 | | 1000000 | Desc | -87.76 | +300.43 | | 10000000 | Asc | -74.58 | +300.02 | | 10000000 | Desc | -88.51 | +300.02 |
Float
| Size | Order | ns/op (%) | B/op (%) | |------------|-------|-----------|----------| | 10000 | Asc | -22.90 | +301.82 | | 10000 | Desc | -59.62 | +301.63 | | 100000 | Asc | -52.34 | +302.23 | | 100000 | Desc | -75.61 | +302.20 | | 1000000 | Asc | -73.67 | +300.43 | | 1000000 | Desc | -86.61 | +300.43 | | 10000000 | Asc | -76.69 | +300.02 | | 10000000 | Desc | -88.17 | +300.02 |
String
| Size | Order | ns/op (%) | B/op (%) | |------------|-------|-----------|----------| | 10000 | Asc | -21.91 | +300.92 | | 10000 | Desc | -44.02 | +300.83 | | 100000 | Asc | -48.78 | +300.09 | | 100000 | Desc | -63.57 | +300.08 | | 1000000 | Asc | -52.03 | +300.11 | | 1000000 | Desc | -66.70 | +300.08 | | 10000000 | Asc | -57.81 | +300.00 | | 10000000 | Desc | -72.57 | +300.00 |
time.Time
| Size | Order | ns/op (%) | B/op (%) | |------------|----------|-----------|----------| | 10000 | Asc/Desc | -35.86 | +307.43 | | 100000 | Asc/Desc | -63.49 | +301.43 | | 1000000 | Asc/Desc | -70.35 | +300.08 | | 10000000 | Asc/Desc | -72.07 | +300.01 |
Supported Types
Parsort provides specialized sorting functions for each of these types:
int,int8,int16,int32,int64uint,uint8,uint16,uint32,uint64float32,float64stringtime.Timestruct(via generics)
Each type has both ascending (TypeAsc) and descending (TypeDesc) sorting functions.
Performance Tuning
Parsort automatically determines if a slice is large enough to benefit from parallel sorting. The default thresholds work well for most systems, but you can optimize them for your specific hardware:
// Optimize thresholds for your hardware
parsort.Tune()
// Or tune with specific parameters
parsort.TuneSpecific(5, 1000, 1000, -10.0, true)
Thresholds can also be manually adjusted:
// Manually set threshold for int sorting
parsort.IntMinParallelSize = 5000
📌 Additional Resources
- Struct sorting details
- Comparison to other libraries
- Raw benchmark data
- Changelog
- Tuner documentation
💚 Support
Parsort was built out of love for clean and fast code. If it saved you time or brought value to your project, feel free to show some support. Every bit is appreciated 🙂
Related Skills
node-connect
352.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.0kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。

