SkillAgentSearch skills...

1brc

Gunnar's 1 Billion Row Challenge (Python)

Install / Use

/learn @ifnesi/1brc
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

1BRC: One Billion Row Challenge in Python

Python implementation of Gunnar's 1 billion row challenge:

  • https://www.morling.dev/blog/one-billion-row-challenge
  • https://github.com/gunnarmorling/1brc

Creating the measurements file with 1B rows

First install the Python requirements:

python3 -m pip install -r requirements.txt

The script createMeasurements.py will create the measurement file:

usage: createMeasurements.py [-h] [-o OUTPUT] [-r RECORDS]

Create measurement file

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Measurement file name (default is "measurements.txt")
  -r RECORDS, --records RECORDS
                        Number of records to create (default is 1_000_000_000)

Example:

% python3 createMeasurements.py
Creating measurement file 'measurements.txt' with 1,000,000,000 measurements...
100%|█████████████████████████████████████████| 100/100 [01:15<00:00,  1.32it/s]
Created file 'measurements.txt' with 1,000,000,000 measurements in 75.86 seconds

Be patient as it can take more than a minute to have the file generated.

Maybe as another challenge is to speed up the generation of the measurements file :slightly_smiling_face:

Performance (on a MacBook Pro M1 32GB)

| Interpreter | Script | user | system | cpu | total | | ----------- | ------ | ---- | ------ | --- | ----- | | python3 | calculateAveragePolars.py | 77.84 | 3.64 | 703% | 11.585 | | pypy3 | calculateAveragePypy.py | ~~139.15~~<br>135.25 | ~~3.02s~~<br>2.92 | ~~699%~~<br>735% | ~~20.323~~<br>18.782 | | python3 | calculateAverageDuckDB.py | 186.78 | 4.21 | 806% | 23.673 | | pypy3 | calculateAverage.py | ~~284.90~~<br>242.89 | ~~9.12~~<br>6.28 | ~~749%~~<br>780% | ~~39.236~~<br>31.926 | | python3 | calculateAverage.py | ~~378.54~~<br>329.20 | ~~6.94~~<br>3.77 | ~~747%~~<br>793% | ~~51.544~~<br>41.941 | | python3 | calculateAveragePypy.py | ~~573.77~~<br>510.93 | ~~2.70~~<br>1.88 | ~~787%~~<br>793% | ~~73.170~~<br>64.660 |

The script calculateAveragePolars.py was suggested by Taufan on this post.

The script calculateAveragePypy.py was created by donalm, a +2x improved version of the initial script (calculateAverage.py) when running in pypy3, even capable of beating the implementation using DuckDB calculateAverageDuckDB.py.

Olivier Scalbert has made a simple but incredible suggestion where performance increased by an average of 15% (table above has been updated), thank you :slightly_smiling_face:

His suggestions were to change from:

if measurement < result[location][0]:
    result[location][0] = measurement
if measurement > result[location][1]:
    result[location][1] = measurement
result[location][2] += measurement
result[location][3] += 1

to:

_result = result[location]
if measurement < _result[0]:
    _result[0] = measurement
if measurement > _result[1]:
    _result[1] = measurement
_result[2] += measurement
_result[3] += 1

Python can be surprising sometimes.

Compare results

Run compare.sh if you want to check that all the scripts produce the same output.

View on GitHub
GitHub Stars89
CategoryDevelopment
Updated8d ago
Forks92

Languages

Python

Security Score

80/100

Audited on Mar 23, 2026

No findings