
Coregex

Pure Go, production-grade regex engine with SIMD optimizations. 3x to 3000x+ faster than the stdlib regexp package, depending on the pattern.


coregex


High-performance regex engine for Go. Drop-in replacement for regexp with 3-3000x speedup.*

<sub>* Typical speedup 15-240x on real-world patterns. 1000x+ achieved on specific edge cases where prefilters skip entire input (e.g., IP pattern on text with no digits).</sub>

Why coregex?

Go's stdlib regexp is intentionally simple: it is built around an NFA simulation with only a few optimizations. This guarantees O(n) time but leaves performance on the table.

coregex brings Rust regex-crate architecture to Go:

  • Multi-engine: 17 strategies — Lazy DFA, PikeVM, OnePass, BoundedBacktracker, and more
  • SIMD prefilters: AVX2/SSSE3 for fast candidate rejection
  • Reverse search: Suffix/inner literal patterns run 1000x+ faster
  • O(n) guarantee: No backtracking, no ReDoS vulnerabilities

Installation

go get github.com/coregx/coregex

Requires Go 1.25+. Minimal dependencies (golang.org/x/sys, github.com/coregx/ahocorasick).

Quick Start

package main

import (
    "fmt"
    "github.com/coregx/coregex"
)

func main() {
    re := coregex.MustCompile(`\w+@\w+\.\w+`)

    text := []byte("Contact support@example.com for help")

    // Find first match
    fmt.Printf("Found: %s\n", re.Find(text))

    // Check if matches (zero allocation)
    if re.MatchString("test@email.com") {
        fmt.Println("Valid email format")
    }
}

Performance

Cross-language benchmarks on 6MB input, AMD EPYC (source):

| Pattern | Go stdlib | coregex | Rust regex | vs stdlib | vs Rust |
|---------|-----------|---------|------------|-----------|---------|
| Literal alternation | 554 ms | 4.5 ms | 0.72 ms | 122x | 6.2x slower |
| Multi-literal | 1572 ms | 12.4 ms | 5.5 ms | 126x | 2.2x slower |
| Inner .*keyword.* | 238 ms | 0.27 ms | 0.33 ms | 881x | 1.2x faster |
| Suffix .*\.txt | 239 ms | 1.9 ms | 1.2 ms | 125x | 1.5x slower |
| Multiline (?m)^/.*\.php | 102 ms | 0.34 ms | 0.75 ms | 299x | 2.2x faster |
| Email validation | 257 ms | 0.46 ms | 0.31 ms | 557x | 1.4x slower |
| URL extraction | 256 ms | 0.62 ms | 0.37 ms | 413x | 1.6x slower |
| IP address | 494 ms | 0.72 ms | 13.5 ms | 685x | 18.8x faster |
| Version \d+.\d+.\d+ | 164 ms | 0.62 ms | 0.79 ms | 263x | 1.2x faster |
| Char class [\w]+ | 478 ms | 42.1 ms | 56.4 ms | 11x | 1.3x faster |
| Word repeat (\w{2,8})+ | 690 ms | 180 ms | 54.7 ms | 3x | 3.2x slower |

Where coregex excels:

  • Multiline patterns ((?m)^/.*\.php) — 2.2x faster than Rust, 299x vs stdlib
  • IP/phone patterns (\d+\.\d+\.\d+\.\d+) — SIMD digit prefilter skips non-digit regions
  • Suffix patterns (.*\.log, .*\.txt) — reverse search optimization (1000x+)
  • Inner literals (.*error.*, .*@example\.com) — bidirectional DFA (900x+)
  • Multi-pattern (foo|bar|baz|...) — Slim Teddy (≤32), Fat Teddy (33-64), or Aho-Corasick (>64)
  • Anchored alternations (^(\d+|UUID|hex32)) — O(1) branch dispatch (5-20x)
  • Concatenated char classes ([a-zA-Z]+[0-9]+) — DFA with byte classes (5-7x)
  • Zero-alloc iterators (AllIndex, AppendAllIndex) — 0 heap allocs, up to 30% faster than FindAll. Email pattern faster than Rust with AppendAllIndex.
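The digit-prefilter case can be sketched with the standard library alone. This is an illustrative assumption, not coregex code: `findIP` is a hypothetical helper, and `strings.IndexAny` stands in for a SIMD digit scan. Input without any digit is rejected before the regex engine runs at all.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var ipRe = regexp.MustCompile(`\d+\.\d+\.\d+\.\d+`)

// findIP returns the first IP-like match in s, or "" if there is none.
// Hypothetical helper: a cheap scan for the first digit acts as a prefilter.
func findIP(s string) string {
	i := strings.IndexAny(s, "0123456789")
	if i < 0 {
		return "" // no digit anywhere, so the pattern cannot match
	}
	// Every match starts with a digit and the pattern has no anchors or
	// word-boundary assertions, so skipping the digit-free prefix is safe.
	return ipRe.FindString(s[i:])
}

func main() {
	fmt.Printf("%q\n", findIP("plain text without numbers"))
	fmt.Printf("%q\n", findIP("host 10.0.0.1 is up"))
}
```

The skip is only valid because this pattern has no context-sensitive assertions; a pattern with `^` or `\b` would need the original offsets preserved.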

Features

Engine Selection

coregex automatically selects the optimal engine:

| Strategy | Pattern Type | Speedup |
|----------|--------------|---------|
| AnchoredLiteral | ^prefix.*suffix$ | 32-133x |
| MultilineReverseSuffix | (?m)^/.*\.php | 100-552x ⚡ |
| ReverseInner | .*keyword.* | 100-900x |
| ReverseSuffix | .*\.txt | 100-1100x |
| BranchDispatch | ^(\d+\|UUID\|hex32) | 5-20x |
| CompositeSequenceDFA | [a-zA-Z]+[0-9]+ | 5-7x |
| LazyDFA | IP, complex patterns | 10-150x |
| AhoCorasick | a\|b\|c\|...\|z (>64 patterns) | 75-113x |
| CharClassSearcher | [\w]+, \d+ | 4-25x |
| Slim Teddy | foo\|bar\|baz (2-32 patterns) | 15-240x |
| Fat Teddy | 33-64 patterns | 60-73x |
| OnePass | Anchored captures | 10x |
| BoundedBacktracker | Small patterns | 2-5x |
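For the multi-literal strategies, the documented thresholds can be written down as a small decision function. This is a sketch of the stated cut-offs only: `pickMultiLiteral` is a hypothetical name, and the real selector presumably weighs more than the literal count.

```go
package main

import "fmt"

// pickMultiLiteral maps the number of literal alternatives to the strategy
// named in this README. Hypothetical helper; the thresholds come from the
// table, everything else is an assumption.
func pickMultiLiteral(n int) string {
	switch {
	case n < 2:
		return "not a multi-literal pattern"
	case n <= 32:
		return "Slim Teddy"
	case n <= 64:
		return "Fat Teddy"
	default:
		return "AhoCorasick"
	}
}

func main() {
	for _, n := range []int{3, 32, 33, 64, 65} {
		fmt.Printf("%d literals -> %s\n", n, pickMultiLiteral(n))
	}
}
```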

API Compatibility

Drop-in replacement for regexp.Regexp:

// stdlib
re := regexp.MustCompile(pattern)

// coregex — same API
re := coregex.MustCompile(pattern)

Supported methods:

  • Match, MatchString, MatchReader
  • Find, FindString, FindAll, FindAllString
  • FindIndex, FindStringIndex, FindAllIndex
  • FindSubmatch, FindStringSubmatch, FindAllSubmatch
  • ReplaceAll, ReplaceAllString, ReplaceAllFunc
  • Split, SubexpNames, NumSubexp
  • Longest, Copy, String

Zero-Allocation APIs

// Zero allocations — boolean match
matched := re.IsMatch(text)

// Zero allocations — single match indices
start, end, found := re.FindIndices(text)

// Zero allocations — iterator over all matches (Go 1.23+)
for m := range re.AllIndex(data) {
    fmt.Printf("match at [%d, %d]\n", m[0], m[1])
}

// Zero allocations — match content iterator
for s := range re.AllString(text) {
    fmt.Println(s)
}

// Buffer-reuse — append to caller's slice (strconv.Append* pattern)
var buf [][2]int
for _, chunk := range chunks {
    buf = re.AppendAllIndex(buf[:0], chunk, -1)
    process(buf)
}

Configuration

config := coregex.DefaultConfig()
config.DFAMaxStates = 10000      // Limit DFA cache
config.EnablePrefilter = true    // SIMD acceleration

re, err := coregex.CompileWithConfig(pattern, config)

Thread Safety

A compiled *Regexp is safe for concurrent use by multiple goroutines:

re := coregex.MustCompile(`\d+`)

// Safe: multiple goroutines sharing one compiled pattern
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        re.FindString("test 123 data")  // thread-safe
    }()
}
wg.Wait()

Internally uses sync.Pool (same pattern as Go stdlib regexp) for per-search state management.
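The general shape of that pattern is sketched below: immutable compiled data shared by all goroutines, plus pooled scratch state borrowed per search. The types `matcher` and `searchState` are hypothetical and deliberately trivial (a byte search, not a regex); coregex's actual state layout is not described in this README.

```go
package main

import (
	"fmt"
	"sync"
)

// searchState is hypothetical mutable scratch space used by one search.
type searchState struct {
	hits []int
}

// matcher holds read-only data plus a pool of reusable scratch states.
type matcher struct {
	needle byte
	pool   sync.Pool
}

func newMatcher(needle byte) *matcher {
	m := &matcher{needle: needle}
	m.pool.New = func() any { return &searchState{hits: make([]int, 0, 16)} }
	return m
}

// positions returns every index of the needle in s. Each call borrows a
// private state from the pool, so concurrent calls never share buffers.
func (m *matcher) positions(s string) []int {
	st := m.pool.Get().(*searchState)
	defer m.pool.Put(st)

	st.hits = st.hits[:0]
	for i := 0; i < len(s); i++ {
		if s[i] == m.needle {
			st.hits = append(st.hits, i)
		}
	}
	// Copy out so the result does not alias pooled memory.
	return append([]int(nil), st.hits...)
}

func main() {
	m := newMatcher('a')
	counts := make([]int, 8)
	var wg sync.WaitGroup
	for i := range counts {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			counts[i] = len(m.positions("banana"))
		}(i)
	}
	wg.Wait()
	fmt.Println(counts)
}
```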

Syntax Support

Uses Go's regexp/syntax parser:

| Feature | Support |
|---------|---------|
| Character classes | [a-z], \d, \w, \s |
| Quantifiers | *, +, ?, {n,m} |
| Anchors | ^, $, \b, \B |
| Groups | (...), (?:...), (?P<name>...) |
| Unicode | \p{L}, \P{N} |
| Flags | (?i), (?m), (?s) |
| Backreferences | Not supported (O(n) guarantee) |
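Because parsing is delegated to regexp/syntax, you can check whether a pattern is accepted using the standard library alone. The helper name `supported` below is hypothetical, and the sketch only exercises the parser; it says nothing about which coregex strategy would be chosen afterwards.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// supported reports whether Go's regexp/syntax parser accepts the pattern
// with Perl-style flags, the same flag set regexp.Compile uses.
func supported(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	patterns := []string{
		`(?P<year>\d{4})`, // named group
		`\p{L}+\b`,        // Unicode class and word boundary
		`(?i)error`,       // case-insensitive flag
		`(a)\1`,           // backreference: rejected by the parser
	}
	for _, p := range patterns {
		fmt.Printf("%-18s supported=%v\n", p, supported(p))
	}
}
```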

Architecture

Pattern → Parse → NFA → Literal Extract → Strategy Select
                                               ↓
                  ┌────────────────────────────────────────────┐
                  │ Engines (17 strategies):                   │
                  │  LazyDFA, PikeVM, OnePass,                 │
                  │  BoundedBacktracker, ReverseAnchored,      │
                  │  ReverseInner, ReverseSuffix,              │
                  │  ReverseSuffixSet, MultilineReverseSuffix, │
                  │  AnchoredLiteral, CharClassSearcher,       │
                  │  Teddy, DigitPrefilter, AhoCorasick,       │
                  │  CompositeSearcher, BranchDispatch, Both   │
                  └────────────────────────────────────────────┘
                                               ↓
Input → Prefilter (SIMD) → Engine → Match Result

For detailed architecture documentation, see docs/ARCHITECTURE.md. For optimization details, see docs/OPTIMIZATIONS.md.
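The first stages of that pipeline (parse, NFA compilation, literal extraction) can be approximated with regexp/syntax. This is a sketch under that assumption, not coregex's compiler: `literalPrefix` is a hypothetical helper, and `Prog.Prefix` recovers only a leading literal, whereas the README also describes suffix and inner literal extraction.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// literalPrefix parses the pattern, compiles it to an NFA program, and
// returns the literal every match must start with. complete is true when
// that literal is the entire pattern. insts is the NFA instruction count.
func literalPrefix(pattern string) (prefix string, complete bool, insts int, err error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false, 0, err
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		return "", false, 0, err
	}
	prefix, complete = prog.Prefix()
	return prefix, complete, len(prog.Inst), nil
}

func main() {
	for _, p := range []string{`foo\d+`, `hello`, `\d+\.txt`} {
		prefix, complete, insts, err := literalPrefix(p)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-10s prefix=%q complete=%v insts=%d\n", p, prefix, complete, insts)
	}
}
```

A non-empty prefix is what lets an engine hand the initial scan to a substring search; a pattern such as `\d+\.txt` has no prefix, which is where suffix or inner literals become useful.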

SIMD Primitives (AMD64):

  • memchr — single byte search (AVX2)
  • memmem — substring search (SSSE3)
  • Slim Teddy — multi-pattern search, 2-32 patterns (SSSE3, 9+ GB/s)
  • Fat Teddy — multi-pattern search, 33-64 patterns (AVX2, 9+ GB/s)

Pure Go fallback on other architectures.

Battle-Tested

coregex was tested in GoAWK. This real-world testing uncovered 15+ edge cases that synthetic benchmarks missed.

Powered by coregex: uawk

uawk is a modern AWK interpreter built on coregex:

| Benchmark (10MB) | GoAWK | uawk | Speedup |
|------------------|-------|------|---------|
| Regex alternation | 1.85s | 97ms | 19x |
| IP matching | 290ms | 99ms | 2.9x |
| General regex | 320ms | 100ms | 3.2x |

go install github.com/kolkov/uawk/cmd/uawk@latest
uawk '/error/ { print $0 }' server.log

We need more testers! If you have a project using regexp, try coregex and report issues.
