1brc

C11 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM.

Generate Convert Improve

Install / Use

/learn @dannyvankooten/1brc

About this skill

Quality Score

0/100

README

1️⃣🐝🏎️ The One Billion Row Challenge

Challenge blog post: https://www.morling.dev/blog/one-billion-row-challenge/
Challenge repository: https://github.com/gunnarmorling/1brc

The challenge: compute simple floating-point math over 1 billion rows. As fast as possible, without dependencies.

Implemented in standard C11 with POSIX threads (however, no SIMD). analyze.c contains the fastest implementation, while {1..7}.c contain slower versions of the same program.

I wrote up some implmentation details on my blog here: https://www.dannyvankooten.com/blog/2024/1brc/

Running the challenge

First, compile the two programs using any capable C-compiler.

make

To compile in debug mode:

DEBUG=1 make

By default, Make will attempt to find the number of threads to use from nproc or sysctl. To compile while specifying the number of threads to use explicitly:

NTHREADS=8 make

Create the measurements file with 1B rows

bin/create-sample 1000000000

This will create a 12 GB file with 1B rows named measurements.txt in your current working directory. The program to create this sample file will take a minute or two, but you only need to run it once.

Run the challenge:

time bin/analyze measurements.txt >/dev/null

real	0m1.392s
user	0m0.000s
sys	    0m0.010sys

Note: the performance difference between a warm and a hot pagecache is quite extreme. Run echo 3 > /proc/sys/vm/drop_caches to drop your pagecache, then run the program twice in a row. It's not uncommon for the second run to be well over twice as fast.

Benchmarks

Since I don't have access to a Hetzner CCX33 box, here are the reference times for the currently leading Java implementations from the official challenge when I run them on my machine.

| # | Result (m:s.ms) | Implementation | Language | Submitter | |---|-----------------|--------------------|-----|---------------| | ? | 00:01.590 | link| C | Danny van Kooten| | 1.| 00:06.131 | link| 21.0.1-graalce| Sam Pullara| | 2.| 00:06.421 | link| 21.0.1-graalce | Roy van Rijn|

Progressions

You can find the average runtime (across 5 consecutive runs) for the various states of the program below, from baseline to the final and fully optimized version. Because I have no patience, this was run on a measurements file with only 100M rows.

1.c runtime=[ 55.86 59.09 64.28 63.63 56.08 ] average=59.79s   linear-search by city name (baseline)
2.c runtime=[ 9.14 9.31 9.35 9.05 9.30 ] average=9.23s hashmap with linear probing
3.c runtime=[ 4.27 4.51 4.47 4.28 4.25 ] average=4.36s custom temperature float parser instead of strod
4.c runtime=[ 2.38 2.41 2.46 2.40 2.39 ] average=2.41s fread with 64MB chunks instead of line-by-line
5.c runtime=[ 2.13 1.99 1.99 2.00 2.05 ] average=2.03s unroll parsing of city name and generating hash
6.c runtime=[ 0.49 0.49 0.49 0.50 0.50 ] average=0.49s parallelize across 16 threads
7.c runtime=[ 0.30 0.25 0.23 0.24 0.24 ] average=0.25s mmap entire file instead of fread in chunks

You can run the benchmark script for all progressions by executing ./run-progressions.sh (needs bash, make, time and awk).

Related Skills

node-connect

343.1k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

90.0k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

343.1k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

343.1k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。