1brc
C11 implementation of the 1 Billion Rows Challenge. 1๏ธโฃ๐๐๏ธ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM.
Install / Use
/learn @dannyvankooten/1brcREADME
1๏ธโฃ๐๐๏ธ The One Billion Row Challenge
- Challenge blog post: https://www.morling.dev/blog/one-billion-row-challenge/
- Challenge repository: https://github.com/gunnarmorling/1brc
The challenge: compute simple floating-point math over 1 billion rows. As fast as possible, without dependencies.
Implemented in standard C11 with POSIX threads (however, no SIMD). analyze.c contains the fastest implementation, while {1..7}.c contain slower versions of the same program.
I wrote up some implmentation details on my blog here: https://www.dannyvankooten.com/blog/2024/1brc/
Running the challenge
First, compile the two programs using any capable C-compiler.
make
To compile in debug mode:
DEBUG=1 make
By default, Make will attempt to find the number of threads to use from nproc
or sysctl. To compile while specifying the number of threads to use explicitly:
NTHREADS=8 make
Create the measurements file with 1B rows
bin/create-sample 1000000000
This will create a 12 GB file with 1B rows named measurements.txt in your current working directory. The program to create this sample file will take a minute or two, but you only need to run it once.
Run the challenge:
time bin/analyze measurements.txt >/dev/null
real 0m1.392s
user 0m0.000s
sys 0m0.010sys
Note: the performance difference between a warm and a hot pagecache is quite extreme. Run echo 3 > /proc/sys/vm/drop_caches to drop your pagecache, then run the program twice in a row. It's not uncommon for the second run to be well over twice as fast.
Benchmarks
Since I don't have access to a Hetzner CCX33 box, here are the reference times for the currently leading Java implementations from the official challenge when I run them on my machine.
| # | Result (m:s.ms) | Implementation | Language | Submitter | |---|-----------------|--------------------|-----|---------------| | ? | 00:01.590 | link| C | Danny van Kooten| | 1.| 00:06.131 | link| 21.0.1-graalce| Sam Pullara| | 2.| 00:06.421 | link| 21.0.1-graalce | Roy van Rijn|
Progressions
You can find the average runtime (across 5 consecutive runs) for the various states of the program below, from baseline to the final and fully optimized version. Because I have no patience, this was run on a measurements file with only 100M rows.
1.c runtime=[ 55.86 59.09 64.28 63.63 56.08 ] average=59.79s linear-search by city name (baseline)
2.c runtime=[ 9.14 9.31 9.35 9.05 9.30 ] average=9.23s hashmap with linear probing
3.c runtime=[ 4.27 4.51 4.47 4.28 4.25 ] average=4.36s custom temperature float parser instead of strod
4.c runtime=[ 2.38 2.41 2.46 2.40 2.39 ] average=2.41s fread with 64MB chunks instead of line-by-line
5.c runtime=[ 2.13 1.99 1.99 2.00 2.05 ] average=2.03s unroll parsing of city name and generating hash
6.c runtime=[ 0.49 0.49 0.49 0.50 0.50 ] average=0.49s parallelize across 16 threads
7.c runtime=[ 0.30 0.25 0.23 0.24 0.24 ] average=0.25s mmap entire file instead of fread in chunks
You can run the benchmark script for all progressions by executing ./run-progressions.sh (needs bash, make, time and awk).
Related Skills
node-connect
343.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
90.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
343.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
343.1kQQBot ๅฏๅชไฝๆถๅ่ฝๅใไฝฟ็จ <qqmedia> ๆ ็ญพ๏ผ็ณป็ปๆ นๆฎๆไปถๆฉๅฑๅ่ชๅจ่ฏๅซ็ฑปๅ๏ผๅพ็/่ฏญ้ณ/่ง้ข/ๆไปถ๏ผใ
