rebar

A biased barometer for gauging the relative speed of some regex engines on a curated set of tasks.

Links

  • METHODOLOGY describes the motivation, design, benchmark selection and evaluation protocol used by rebar.
  • BUILD describes how to build rebar and the regex engines it measures.
  • TUTORIAL provides a guided exploration of some of the most useful rebar sub-commands.
  • CONTRIBUTING describes how to add new benchmarks and how to add a new regex engine to benchmark.
  • MODELS describes the different types of workloads measured.
  • FORMAT describes the directory hierarchy and TOML format for how benchmarks are defined.
  • KLV describes the format of data given to regex engine runner programs for how to execute a benchmark.
  • BIAS is a work-in-progress document describing the bias of this barometer.
  • WANTED provides some ideas for other regex engines to add to rebar.
  • BYOB discusses how to "bring your own benchmarks." That is, anyone can use rebar with their own engine and benchmark definitions.

Results

This section shows the results of a curated and biased set of benchmarks. These reflect only a small subset of the benchmarks defined in this repository, but were carefully crafted to represent a broad range of use cases and annotated where possible with analysis to aid in the interpretation of results.

The results begin with a summary, followed by a list of links to each benchmark group and finally the results for each group. Results are shown one benchmark group at a time, where a single group combines related regexes or workloads so that it is useful to see how results change across regex engines. Analysis is provided, at minimum, for every group, although it is heavily biased towards Rust's regex crate, as that is what this author knows best. Contributions that discuss other regex engines are very welcome.

Below each group of results are the parameters for each individual benchmark within that group. An individual benchmark may contain some analysis specific to it, but it will at least contain a summary of the benchmark details. Some parameters, such as the haystack, are usually too big to show in this README. One can use rebar to look at the haystack directly. Just take the full name of the benchmark and give it to the rebar haystack command. For example:

$ rebar haystack unicode/compile/fifty-letters
ͱͳͷΐάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώϙϛϝϟϡϸϻͱͳͷΐάέή

Similarly, the full benchmark execution details (including the haystack) can be seen with the rebar klv command:

$ rebar klv unicode/compile/fifty-letters
name:29:unicode/compile/fifty-letters
model:7:compile
pattern:7:\pL{50}
case-insensitive:5:false
unicode:4:true
haystack:106:ͱͳͷΐάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώϙϛϝϟϡϸϻͱͳͷΐάέή
max-iters:1:0
max-warmup-iters:1:0
max-time:1:0
max-warmup-time:1:0
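The KLV document is the authoritative description of this format. As a rough illustration only, the records above appear to follow a `key:length:value` layout, where the length counts the bytes of the value and each record ends with a newline. A minimal Python sketch of such a parser (`parse_klv` is a hypothetical name, not part of rebar):

```python
def parse_klv(data: bytes) -> list[tuple[str, bytes]]:
    """Parse records of the form b'key:length:value\\n', where `length`
    is the byte length of `value`. Returns (key, value) pairs in order."""
    items = []
    i = 0
    while i < len(data):
        key_end = data.index(b":", i)
        len_end = data.index(b":", key_end + 1)
        key = data[i:key_end].decode()
        n = int(data[key_end + 1 : len_end])
        value = data[len_end + 1 : len_end + 1 + n]
        items.append((key, value))
        i = len_end + 1 + n + 1  # skip the value and its trailing newline
    return items

records = parse_klv(b"model:7:compile\nunicode:4:true\n")
```

Counting the value's bytes rather than its characters matters for haystacks like the Greek example above, whose UTF-8 encoding is twice its character count.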

Finally, you can run the benchmark yourself and look at results on the command line:

$ rebar measure -f '^unicode/compile/fifty-letters$' | tee results.csv
$ rebar cmp results.csv
<!-- BEGIN: report -->
<!-- Auto-generated by rebar, do not edit manually! -->
<!-- Generated with command: -->
<!-- rebar report --splice README.md --statistic median --units throughput -f ^curated/ --summary-exclude ^(rust/regexold)$ record/curated/2025-12-19/d-dmd-std-regex.csv record/curated/2025-12-19/d-ldc-std-regex.csv record/curated/2025-12-19/dotnet-compiled.csv record/curated/2025-12-19/dotnet.csv record/curated/2025-12-19/dotnet-nobacktrack.csv record/curated/2025-12-19/go-regexp.csv record/curated/2025-12-19/hyperscan.csv record/curated/2025-12-19/icu.csv record/curated/2025-12-19/java-hotspot.csv record/curated/2025-12-19/javascript-v8.csv record/curated/2025-12-19/pcre2.csv record/curated/2025-12-19/pcre2-jit.csv record/curated/2025-12-19/perl.csv record/curated/2025-12-19/python-re.csv record/curated/2025-12-19/python-regex.csv record/curated/2025-12-19/re2.csv record/curated/2025-12-19/regress.csv record/curated/2025-12-19/rust-aho-corasick-dfa.csv record/curated/2025-12-19/rust-aho-corasick-nfa.csv record/curated/2025-12-19/rust-aho-corasick-teddy.csv record/curated/2025-12-19/rust-memchr-memmem.csv record/curated/2025-12-19/rust-regex-ast.csv record/curated/2025-12-19/rust-regex-backtrack.csv record/curated/2025-12-19/rust-regex.csv record/curated/2025-12-19/rust-regex-dense.csv record/curated/2025-12-19/rust-regex-hir.csv record/curated/2025-12-19/rust-regex-hybrid.csv record/curated/2025-12-19/rust-regex-lite.csv record/curated/2025-12-19/rust-regex-meta.csv record/curated/2025-12-19/rust-regex-nfa.csv record/curated/2025-12-19/rust-regexold.csv record/curated/2025-12-19/rust-regex-onepass.csv record/curated/2025-12-19/rust-regex-pikevm.csv record/curated/2025-12-19/rust-regex-sparse.csv -->

Summary

Below are two tables summarizing the results of regex engines benchmarked. Each regex engine includes its version at the time measurements were captured, a summary score that ranks it relative to other regex engines across all benchmarks and the total number of measurements collected.

The first table ranks regex engines based on search time. The second table ranks regex engines based on compile time.

The summary statistic used is the geometric mean of the speed ratios for each regex engine across all benchmarks that include it. The ratio within each benchmark is computed by taking the median of all timing samples for an engine and dividing it by the best (smallest) median among the regex engines that participated in that benchmark. For example, given two regex engines A and B with results 35 ns and 25 ns on a single benchmark, A has a speed ratio of 1.4 and B has a speed ratio of 1.0. The geometric mean reported here is then the "average" speed ratio for that regex engine across all benchmarks.
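To make the statistic concrete, here is a small worked sketch in Python. The medians are made up for two hypothetical engines A and B across three benchmarks, extending the 35 ns/25 ns example above; this is an illustration of the arithmetic, not rebar's actual implementation:

```python
from math import prod

# Hypothetical median timings (in ns) for two engines across three benchmarks.
medians = {
    "A": [35.0, 100.0, 80.0],
    "B": [25.0, 200.0, 80.0],
}

# The best (smallest) median per benchmark among participating engines.
best = [min(col) for col in zip(*medians.values())]

# Speed ratio per benchmark: an engine's median divided by the best median.
ratios = {e: [m / b for m, b in zip(ms, best)] for e, ms in medians.items()}

# Geometric mean of each engine's ratios summarizes it in one number.
geomean = {e: prod(rs) ** (1 / len(rs)) for e, rs in ratios.items()}
```

Here A's ratios are [1.4, 1.0, 1.0] and B's are [1.0, 2.0, 1.0], so despite B winning the first benchmark outright, its large loss on the second gives it the worse geometric mean.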

If you're looking to compare two regex engines specifically, then it is better to do so based only on the benchmarks in which they both participate. For example, to compare based on the results recorded on 2023-05-04, one can do:

$ rebar rank record/all/2023-05-04/*.csv -f '^curated/' -e '^(rust/regex|hyperscan)$' --intersection -M compile
Engine      Version           Geometric mean of speed ratios  Benchmark count
------      -------           ------------------------------  ---------------
hyperscan   5.4.1 2023-02-22  2.03                            25
rust/regex  1.8.1             2.13                            25

Caution: Using a single number to describe the overall performance of a regex engine is a fraught endeavor, and it is debatable whether it should be included here at all. It is included primarily because the number of benchmarks is quite large and overwhelming. It can be quite difficult to get a general sense of things without a summary statistic. In particular, a summary statistic is also useful to observe how the overall picture itself changes as changes are made to the barometer (whether by adding new regex engines or adding/removing/changing existing benchmarks). One particular word of caution is that while the geometric mean is more robust with respect to outliers than the arithmetic mean, it is not unaffected by them. Therefore, it is still critical to examine individual benchmarks if one wants to better understand the performance profile of any specific regex engine or workload.

Summary of search-time benchmarks

| Engine | Version | Geometric mean of speed ratios | Benchmark count |
| ------ | ------- | ------------------------------ | --------------- |
| hyperscan | 5.4.2 2023-04-22 | 2.37 | 28 |
| rust/regex | 1.12.2 | 3.08 | 38 |
| dotnet/compiled | 10.0.0 | 3.60 | 34 |
| pcre2/jit | 10.47 2025-10-21 | 6.00 | 34 |
| dotnet/nobacktrack | 10.0.0 | 6.36 | 29 |
| re2 | 2025-11-05 | 10.39 | 31 |
| javascript/v8 | 25.2.1 | 11.99 | 32 |
| d/ldc/std-regex | 2.111 | 22.30 | 31 |
| regress | 0.9.1 | 31.89 | 32 |
| perl | 5.42.0 | 41.70 | 33 |
| python/re | 3.13.7 | 42.10 | 33 |
| java/hotspot | 25.0.1+8-LTS-27 | 42.24 | 34 |
| python/regex | 2025.11.3 | 43.26 | 34 |
| icu | 72.1.0 | 48.63 | 34 |
| go/regexp | 1.25.4 | 74.79 | 31 |
| pcre2 | 10.47 2025-10-21 | 114.75 | 33 |
| rust/regex/lite | 0.1.8 | 156.49 | 28 |

Summary of compile-time benchmarks

| Engine | Version | Geometric mean of speed ratios | Benchmark count |
| ------ | ------- | ------------------------------ | --------------- |
| pcre2 | 10.47 2025-10-21 | 1.41 | 10 |
| rust/regex/lite | 0.1.8 | 2.73 | 10 |
| regress | 0.9.1 | 2.87 | 9 |
| icu | 72.1.0 | 3.41 | 11 |
| go/regexp | 1.25.4 | 5.24 | 10 |
| pcre2/jit | 10.47 2025-10-21 | 6.20 | 11 |
| rust/regex | 1.12.2 | 11.85 | 14 |
| re2 | 2025-11-05 | 12.28 | 10 |
| dotnet/compiled | 10.0.0 | 19.79 | 10 |
| python/re | 3.13.7 | 36.09 | 11 |
| dotnet/nobacktrack | 10.0.0 | 98.23 | 6 |
| [python/regex](benchmarks/../engine
