Regex
JS regexes ➕ future. A template tag for readable, high-performance, native JS regexes with extended syntax, context-aware interpolation, and always-on best practices.
[![npm version][npm-version-src]][npm-version-href] [![npm downloads][npm-downloads-src]][npm-downloads-href] [![bundle][bundle-src]][bundle-href]
The Regex+ library (`regex` on npm) provides a template tag named `regex`. This tag modernizes JavaScript regular expressions with always-on best practices and support for new features that make regexes more powerful and dramatically more readable. The `regex` tag returns native `RegExp` instances that run with native performance and can exceed the performance of regex literals you'd write yourself.
With the Regex+ library, JavaScript steps up as one of the best regex flavors alongside PCRE and Perl, possibly surpassing C++, Java, .NET, Python, and Ruby.
Features that the regex tag adds to native JavaScript regular expressions include insignificant whitespace and comments for readability, atomic groups and possessive quantifiers that can help you avoid ReDoS, subroutines and definition groups that enable grammatical patterns with powerful subpattern composition, and context-aware interpolation of regexes, escaped strings, and partial patterns.
Details:
- Lightweight (7 kB minzip)
- Available as a Babel plugin, for no runtime cost and zero runtime dependencies
- Supports all ES2026 regex features
- JS library with type definitions included
📜 Contents
- Install and use
- Features
- Examples
- Context
- Extended regex syntax
- Flags
- Interpolation
- Options
- Performance
- Compatibility
- FAQ
🕹️ Install and use
First run `pnpm install regex`, or the equivalent with your package manager of choice.

Then it's just:

```js
import {regex} from 'regex';

const str = 'abc';
// Returns a native RegExp instance, so it works with all string/regexp methods
regex`\w`.test(str); // → true
str.match(regex('g')`\w`); // → ['a', 'b', 'c']
```
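Because `regex` is a template tag that returns a plain `RegExp`, the basic mechanics can be illustrated with a few lines of native JavaScript. The sketch below defines a hypothetical toy tag named `toyRegex`; it is not how Regex+ is implemented. It reads the raw template strings (so `\d` needs no double escaping), escapes interpolated strings, and wraps each one in a non-capturing group so that a following quantifier repeats it as a unit. It assumes string interpolations only, uses flag <kbd>u</kbd>, and omits everything else Regex+ does (flags <kbd>v</kbd>, <kbd>x</kbd>, and <kbd>n</kbd>, interpolated regexes, and so on).

```js
// Hypothetical toy tag, NOT Regex+'s implementation. It only demonstrates
// (1) raw template strings and (2) escaping and grouping interpolated strings.
function escapeRegExp(str) {
  // Escape ECMAScript syntax characters plus "/" (all valid escapes under flag u)
  return str.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');
}

function toyRegex(strings, ...values) {
  // strings.raw keeps backslashes as written, e.g. "\d" stays "\d"
  let source = strings.raw[0];
  values.forEach((value, i) => {
    // Wrap in a non-capturing group so a following quantifier repeats the whole value
    source += `(?:${escapeRegExp(String(value))})` + strings.raw[i + 1];
  });
  return new RegExp(source, 'u');
}

const re = toyRegex`^\d+ ${'a|b'}+$`;
console.log(re.source); // ^\d+ (?:a\|b)+$
console.log(re.test('42 a|ba|b')); // true
console.log(re.test('42 a')); // false
```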
<details>
<summary>In browsers</summary>

ESM:

```html
<script type="module">
  import {regex} from 'https://esm.run/regex';
  // …
</script>
```

Using a global name:

```html
<script src="https://cdn.jsdelivr.net/npm/regex/dist/regex.min.js"></script>
<script>
  const {regex} = Regex;
  // …
</script>
```
</details>
💎 Features
- A modern regex baseline so you don't need to continually opt in to best practices.
  - Always-on flag <kbd>v</kbd> gives you the best level of Unicode support and strict errors.
  - New flags:
    - Always-on flag <kbd>x</kbd> allows you to freely add whitespace and comments to your regexes.
    - Always-on flag <kbd>n</kbd> ("named capture only" mode) improves regex readability and efficiency.
  - No unreadable escaped backslashes (`\\\\`), since it's a raw string template tag.
- Extended regex syntax.
  - Atomic groups and possessive quantifiers can dramatically improve performance and prevent ReDoS.
  - Subroutines and definition groups enable powerful composition, improving readability and maintainability.
  - Recursive matching via an official plugin.
  - Custom syntax plugins supported.
- Context-aware and safe interpolation of regexes, strings, and partial patterns.
  - Interpolated strings have their special characters escaped.
  - Interpolated regexes locally preserve the meaning of their own flags (or their absence), and their numbered backreferences are adjusted to work within the overall pattern.
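Backreference adjustment matters because a regex like `/(.)\1/` refers to its group by number, and that number changes once the regex is embedded after other capturing groups. The following is a simplified, hypothetical sketch of that adjustment (`shiftBackrefs` is an illustrative name, not part of Regex+). It adds an offset to each numbered backreference outside character classes, and the offset (the count of capturing groups that precede the interpolation) is supplied by hand. It assumes every `\N` outside a class is a backreference and doesn't handle nested classes (flag <kbd>v</kbd>) or other edge cases a real implementation must cover.

```js
// Hypothetical helper: adds `offset` to every numbered backreference (\1, \2, ...).
// Simplified: assumes each \N outside a character class is a backreference,
// and doesn't support nested character classes.
function shiftBackrefs(source, offset) {
  let out = '';
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      const digits = inClass ? null : /^[1-9]\d*/.exec(source.slice(i + 1));
      if (digits) {
        out += '\\' + (Number(digits[0]) + offset);
        i += digits[0].length;
      } else {
        // Copy any other escape (including \\, \[, and \]) through unchanged
        out += ch + (source[i + 1] ?? '');
        i++;
      }
    } else {
      if (ch === '[') inClass = true;
      else if (ch === ']') inClass = false;
      out += ch;
    }
  }
  return out;
}

const double = /(.)\1/;
// (?<first>.) is group 1, so the first copy's (.) is group 2 and the second's is group 3
const combined = new RegExp(
  `^(?<first>.)(?:${shiftBackrefs(double.source, 1)})(?:${shiftBackrefs(double.source, 2)})$`
);
console.log(combined.source); // ^(?<first>.)(?:(.)\2)(?:(.)\3)$
console.log(combined.test('xaabb')); // true
console.log(combined.test('xabab')); // false
```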
🪧 Examples
```js
import {regex, pattern} from 'regex';

// Subroutines and a subroutine definition group
// Also shows insignificant whitespace for readability
const record = regex`
  ^ Admitted: \g<date> \n
    Released: \g<date> $

  (?(DEFINE)
    (?<date>  \g<year>-\g<month>-\g<day>)
    (?<year>  \d{4})
    (?<month> \d{2})
    (?<day>   \d{2})
  )
`;

// Atomic group: Avoids ReDoS from the nested, overlapping quantifier
const words = regex`^(?>\w+\s?)+$`;

// Context-aware interpolation
// Also shows line comments and insignificant whitespace
const re = regex('m')`
  # RegExp: Only the inner regex is case insensitive (flag i), and the outer
  # regex's flags (including implicit flag x) don't apply to it
  ${/^a.$/i}
  |
  # String: Regex special chars are escaped
  # Note that quantifying an interpolated value repeats it as a complete unit
  ${'a|b'}+
  |
  # Pattern: This string is contextually sandboxed but not escaped
  [${pattern('a-z')}]
`;

// Numbered backreferences in interpolated regexes are adjusted
const double = /(.)\1/;
regex`^ (?<first>.) ${double} ${double} $`;
// → /^(?<first>.)(.)\2(.)\3$/v
```
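Native JavaScript has no atomic group syntax, but the effect can be approximated with a well-known technique: a lookahead (which JavaScript never backtracks into once it has matched) captures the text, and a backreference then consumes it. The sketch below uses only native regexes to illustrate the concept; it isn't a description of the exact pattern Regex+ generates. The first pair shows that an atomic group never gives back characters, and the second is a native approximation of the `words` pattern that fails fast on input that can trigger catastrophic backtracking in the non-atomic `/^(?:\w+\s?)+$/` (which isn't run here for that reason).

```js
// Emulates (?>a+) as (?=(a+))\1: the lookahead captures greedily, and because
// JS doesn't backtrack into a completed lookahead, \1 can't give characters back
const greedy = /^a+ab/;
const atomic = /^(?=(a+))\1ab/;
console.log(greedy.test('aaab')); // true (a+ backtracks to leave "ab")
console.log(atomic.test('aaab')); // false (the captured "aaa" is never shortened)

// Native approximation of regex`^(?>\w+\s?)+$`
const words = /^(?:(?=(\w+\s?))\1)+$/;
console.log(words.test('hello world')); // true
// Fails quickly, whereas the non-atomic /^(?:\w+\s?)+$/ can take exponential time here
console.log(words.test('a'.repeat(40) + '!')); // false
```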
<details>
<summary>Show examples of using subroutine definition groups to refactor previously unmaintainable regexes</summary>
Date/time regex
```js
// Unmaintainable regex from in-the-wild code for validating a date and time
const DATE_FILTER_RE =
  /^(since|until):((?!0{3})\d{4}(?:-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01])(?:T(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d(?:\.\d+)?)?(?:Z|(?!-00:00)[+-](?:[01]\d|2[0-3]):(?:[0-5]\d))?)?)?)?)$/;

// Refactored for Regex+, with identical matches
// Includes a subroutine definition group at the bottom
const DATE_FILTER_RE = regex`
  ^
  (?<prefix> since | until) :
  (?<dateTime>
    \g<year>
    (- \g<month> (- \g<day> (T \g<time>)? )? )?
  )
  $

  (?(DEFINE)
    (?<year>     (?! 000) \d{4})
    (?<month>    0[1-9] | 1[0-2])
    (?<day>      0[1-9] | [12]\d | 3[01])
    (?<time>     \g<hour> : \g<minute> (: \g<second>)? \g<timeZone>?)
    (?<hour>     [01]\d | 2[0-3])
    (?<minute>   [0-5]\d)
    (?<second>   [0-5]\d (\.\d+)?)
    (?<timeZone> Z | (?! -00:00) [+\-] \g<hour> : \g<minute>)
  )
`;
```
If desired, the refactored version can be minified back into a regex literal during a build step, using Regex+'s Babel plugin.
IP address regex
```js
// Unmaintainable regex used by Valibot for validating an IP address
const IP_REGEX =
  /^(?:(?:[1-9]|1\d|2[0-4])?\d|25[0-5])(?:\.(?:(?:[1-9]|1\d|2[0-4])?\d|25[0-5])){3}$|^(?:(?:[\da-f]{1,4}:){7}[\da-f]{1,4}|(?:[\da-f]{1,4}:){1,7}:|(?:[\da-f]{1,4}:){1,6}:[\da-f]{1,4}|(?:[\da-f]{1,4}:){1,5}(?::[\da-f]{1,4}){1,2}|(?:[\da-f]{1,4}:){1,4}(?::[\da-f]{1,4}){1,3}|(?:[\da-f]{1,4}:){1,3}(?::[\da-f]{1,4}){1,4}|(?:[\da-f]{1,4}:){1,2}(?::[\da-f]{1,4}){1,5}|[\da-f]{1,4}:(?::[\da-f]{1,4}){1,6}|:(?:(?::[\da-f]{1,4}){1,7}|:)|fe80:(?::[\da-f]{0,4}){0,4}%[\da-z]+|::(?:f{4}(?::0{1,4})?:)?(?:(?:25[0-5]|(?:2[0-4]|1?\d)?\d)\.){3}(?:25[0-5]|(?:2[0-4]|1?\d)?\d)|(?:[\da-f]{1,4}:){1,4}:(?:(?:25[0-5]|(?:2[0-4]|1?\d)?\d)\.){3}(?:25[0-5]|(?:2[0-4]|1?\d)?\d))$/iu;

// Refactored for Regex+, with identical matches
// All except the first line are subroutine definitions
const IP_REGEX = regex('i')`
  ^ (\g<ipv4> | \g<ipv6>) $

  (?(DEFINE)
    (?<ipv4> \g<byte> (\. \g<byte>){3})
    (?<byte> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d)
    (?<segment> \p{AHex}{1,4})
    (?<part> \g<segment> :)
    (?<ipv6>
        ( \g<part>{7}
        | :: \g<part>{0,6}
        | \g<part> : \g<part>{0,5}
        | \g<part>{2} : \g<part>{0,4}
        | \g<part>{3} : \g<part>{0,3}
        | \g<part>{4} : \g<part>{0,2}
        | \g<part>{5} : \g<part>?
        ) \g<segment>
      | ::
      # With zone identifier
      | fe80: (: \p{AHex}{0,4}){0,4} % [\da-z]+
      # Mixed addresses
      | :: (ffff (: 0{1,4})? :)? \g<ipv4>
      | \g<part>{1,4} : \g<ipv4>
    )
  )
`;
```
The refactored regex intentionally reproduces the original's matches exactly, even though there are some edge cases where it doesn't follow the IPv6 spec. However, since it's written in a maintainable way, the bugs are much more easily spotted and fixed. Good luck if you want to update the original, structureless regex!
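Claims of identical matches can be spot-checked mechanically. As a minimal sketch (the helper name `findMismatches` is hypothetical), the code below compares two regexes over a list of inputs and returns every input on which they disagree. Here it exhaustively compares the original regex's byte pattern with the refactored `byte` subpattern, which was translated by hand into native syntax (whitespace removed and the group made non-capturing), over every string of one to three digits. A deliberately broken variant shows what a reported difference looks like.

```js
// Hypothetical helper: returns the inputs on which two regexes disagree
function findMismatches(reA, reB, inputs) {
  return inputs.filter((input) => reA.test(input) !== reB.test(input));
}

// Every string of 1 to 3 digits, including leading zeros ("0" to "9", "00" to "99", "000" to "999")
const inputs = [];
for (let len = 1; len <= 3; len++) {
  for (let n = 0; n < 10 ** len; n++) {
    inputs.push(String(n).padStart(len, '0'));
  }
}

// Byte pattern from the original IP_REGEX
const originalByte = /^(?:(?:[1-9]|1\d|2[0-4])?\d|25[0-5])$/;
// The refactored `byte` subpattern, written as a native regex
const refactoredByte = /^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/;
// A deliberately broken variant that also accepts "00" to "09"
const leadingZeroVariant = /^(?:25[0-5]|2[0-4]\d|1\d\d|\d\d?)$/;

console.log(inputs.length); // 1110
console.log(findMismatches(originalByte, refactoredByte, inputs)); // []
console.log(findMismatches(originalByte, leadingZeroVariant, inputs).length); // 10
```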
</details>

❓ Context
Due to years of legacy and backward compatibility, regular expression syntax in JavaScript is a bit of a mess. There are four sets of incompatible syntax and behavior rules that might apply to your regexes depending on the flags and features you use. The differences are hard to remember or understand, and can easily create subtle bugs.
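Some of these differences are easy to observe with native regexes. The following sketch runs the same or similar patterns under different flags (the helper `isSyntaxError` is a hypothetical name introduced here). The flag <kbd>v</kbd> lines assume a runtime that supports it (ES2024, for example Node.js 20 or later).

```js
// Hypothetical helper: true if constructing the regex throws a SyntaxError
function isSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

const emoji = '\u{1F600}'; // One code point, stored as two UTF-16 code units

// Legacy mode: the dot matches a single code unit; flag u: a whole code point
console.log(/^.$/.test(emoji)); // false
console.log(/^.$/u.test(emoji)); // true

// Legacy mode: \k is an identity escape until a named group appears
console.log(/\k/.test('k')); // true
console.log(isSyntaxError('\\k(?<a>x)', '')); // true

// Flag u: the unreserved escape \k is an error
console.log(isSyntaxError('\\k', 'u')); // true

// Flag v: "(" must be escaped inside a character class
console.log(isSyntaxError('[(]', 'u')); // false
console.log(isSyntaxError('[(]', 'v')); // true
console.log(isSyntaxError('[\\(]', 'v')); // false
```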
<details>
<summary>See the four parsing modes</summary>

- Unicode-unaware (legacy) mode is the default and can easily and silently create Unicode-related bugs.
- Named capture mode changes the meaning of `\k` when a named capture appears anywhere in a regex.
- Unicode mode with flag <kbd>u</kbd> adds strict errors (for unreserved escapes, octal escapes, quantified lookahead, etc.), switches to code point matching (changing the potential handling of the dot, negated sets like `\W`, character class ranges, and quantifiers), changes flag <kbd>i</kbd> to apply Unicode case-folding, and adds support for new syntax.
- UnicodeSets mode with flag <kbd>v</kbd> (an upgrade to <kbd>u</kbd>) incompatibly changes escaping rules within character classes, fixes case-insensitive matching for `\p` and `\P` within negated `[^…]`, and adds support f
