Regexr
Regular expressions for humans
Instead of writing a regular expression to match a URL:
# needs to be compiled with re.X
regex = r'''
^(?P<protocol>http|https|ftp|mailto|file|data|irc)://
(?P<domain>[A-Za-z0-9-]{0,63}(?:\.[A-Za-z0-9-]{0,63})+)
(?::(?P<port>\d{1,4}))?
(?P<path>/*(?:/*[A-Za-z0-9\-._]+/*)*)
(?:\?(?P<query>.*?))?
(?:\#(?P<fragment>.*))?$
'''
You can write:
regexr = Regexr(
    START,
    ## match the protocol
    Or('http', 'https', 'ftp', 'mailto', 'file', 'data', 'irc', capture="protocol"),
    '://',
    ## match the domain
    Capture(
        Repeat(OneOfChars('A-Z', 'a-z', '0-9', '-'), m=0, n=63),
        OneOrMore(DOT, Repeat(OneOfChars('A-Z', 'a-z', '0-9', '-'), m=0, n=63)),
        name="domain",
    ),
    ## match the port
    Maybe(':', Capture(Repeat(DIGIT, m=1, n=4), name="port")),
    ## match the path
    Capture(
        ZeroOrMore('/'),
        ZeroOrMore(
            ZeroOrMore('/'),
            OneOrMore(OneOfChars('A-Z', 'a-z', '0-9', r'\-._')),
            ZeroOrMore('/'),
        ),
        name="path",
    ),
    ## match the query
    Maybe("?", Capture(Lazy(MAYBE_ANYCHARS), name="query")),
    ## and finally the fragment
    Maybe("#", Capture(MAYBE_ANYCHARS, name="fragment")),
    END,
)
Inspired by rex for R and Regularity for Ruby.
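As a quick sanity check of the raw pattern quoted at the top of this README, the sketch below compiles it with the standard re module in verbose mode and reads back the named groups. The sample URL and the variable names are assumptions made for illustration; only the pattern itself comes from this README.

```python
import re

# The raw URL pattern quoted at the top of this README.
# It spans several lines, so it must be compiled with re.X (re.VERBOSE).
URL_PATTERN = r'''
^(?P<protocol>http|https|ftp|mailto|file|data|irc)://
(?P<domain>[A-Za-z0-9-]{0,63}(?:\.[A-Za-z0-9-]{0,63})+)
(?::(?P<port>\d{1,4}))?
(?P<path>/*(?:/*[A-Za-z0-9\-._]+/*)*)
(?:\?(?P<query>.*?))?
(?:\#(?P<fragment>.*))?$
'''

url_re = re.compile(URL_PATTERN, re.X)

# Hypothetical sample URL, chosen so that every named group is exercised.
sample = "https://example.com:8080/path/to/file.html?x=1&y=2#top"

match = url_re.match(sample)
parts = match.groupdict() if match else {}

for name, value in parts.items():
    print(f"{name}: {value}")
```

Without re.X the newlines in the pattern would be treated as literal characters and the match would fail, which is one of the pitfalls the builder-style API is meant to sidestep.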
Why?
We have re.X to compile a regular expression in verbose mode, but sometimes it is still difficult to read/write and error-prone.
- Easy to read/write regular expressions
  - For example, []] might take a second to understand, but we can write it as OneOfChars("]"), which is easier to read.
- Easy to write regular expressions with autocompletion from IDEs
  - When we write raw regex, we can't get any hints from IDEs.
- Non-capturing groups wherever possible
  - For example, with Maybe(Maybe("a", "b")) we get (?:(?:ab)?)?
- Easy to avoid unintentional errors
  - For example, sometimes it's difficult to debug r"(?P<a>>\d+)\D+\a" when we accidentally put one more > after the capturing name.
- Easy to avoid ambiguity
  - For example, ? could be a quantifier meaning 0 or 1 match. It could also be a non-greedy (lazy) modifier for quantifiers. Maybe(...) and Lazy(...) (or quantifiers with lazy=True) make the two easy to tell apart.
- Easy to avoid unbalanced parentheses/brackets/braces
  - Especially when we want to match them. For example, Capture("(") instead of (\().
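The raw-regex pitfalls listed above can be reproduced with the standard re module alone. This is a small sketch, not part of regexr; the variable names and test strings are assumptions chosen for illustration.

```python
import re

# "[]]" is a character class that contains only the "]" character.
bracket_ok = re.fullmatch(r"[]]", "]") is not None

# Nested non-capturing groups add no capture groups to the pattern.
nested_groups = re.compile(r"(?:(?:ab)?)?").groups

# A stray ">" after the group name still compiles; the pattern just
# silently requires a literal ">" before the digits.
typo = re.compile(r"(?P<a>>\d+)")
typo_plain = typo.match("123")       # no match: the ">" is missing
typo_literal = typo.match(">123")    # matches, and the group includes ">"

# "?" as a quantifier (0 or 1) versus "?" as a lazy modifier.
optional = re.match(r"a?", "aaa").group()   # at most one "a"
greedy = re.match(r"a+", "aaa").group()     # as many "a" as possible
lazy = re.match(r"a+?", "aaa").group()      # as few "a" as possible

print(bracket_ok, nested_groups, typo_plain, typo_literal.group("a"))
print(repr(optional), repr(greedy), repr(lazy))
```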
Usage
More examples
- Matching a phone number like XXX-XXX-XXXX or (XXX) XXX XXXX
  Regexr(
      START,
      # match the first part
      Maybe(Capture('(', name="open_paren")),
      RepeatExact(DIGIT, m=3),
      Conditional("open_paren", yes=")"),
      Maybe(OneOfChars('- ')),
      # match the second part
      RepeatExact(DIGIT, m=3),
      Maybe(OneOfChars('- ')),
      # match the third part
      RepeatExact(DIGIT, m=4),
      END,
  )
  # compiles to
  # ^(?P<open_paren>\()?\d{3}(?(open_paren)\))[- ]?\d{3}[- ]?\d{4}$
- Matching an IP address
  # Define the pattern for one part of xxx.xxx.xxx.xxx
  ip_part = Or(
      # Use Concat instead of NonCapture to avoid brackets
      # 250-255
      Concat("25", OneOfChars('0-5')),
      # 200-249
      Concat("2", OneOfChars('0-4'), DIGIT),
      # 000-199
      Concat(Or("0", "1"), RepeatExact(DIGIT, m=2)),
      # 00-99
      Repeat(DIGIT, m=1, n=2),
  )
  Regexr(
      START,
      ip_part,
      RepeatExact(DOT, ip_part, m=3),
      END,
  )
  # compiles to
  # ^(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})){3}$
- Matching an HTML tag roughly (without attributes)
  Regexr(
      START,
      "<",
      Capture(WORDS, name="tag"),
      ">",
      Lazy(ANYCHARS),
      "</",
      Captured("tag"),
      ">",
      END,
  )
  # compiles to
  # ^<(?P<tag>\w+)>.+?</(?P=tag)>$
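The "compiles to" patterns quoted in the examples above can be tried directly with the standard re module. The sketch below copies those three patterns verbatim; the helper name `matches` and the sample inputs are assumptions made for illustration.

```python
import re

# Patterns copied from the "compiles to" comments in the examples above.
PHONE = re.compile(
    r"^(?P<open_paren>\()?\d{3}(?(open_paren)\))[- ]?\d{3}[- ]?\d{4}$"
)
IP = re.compile(
    r"^(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})"
    r"(?:\.(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})){3}$"
)
TAG = re.compile(r"^<(?P<tag>\w+)>.+?</(?P=tag)>$")


def matches(pattern, text):
    """Return True if the compiled pattern matches the text (hypothetical helper)."""
    return pattern.match(text) is not None


# Hypothetical sample inputs: a few that should match and a few that should not.
phone_results = {
    s: matches(PHONE, s)
    for s in ["123-456-7890", "(123) 456 7890", "(123 456 7890", "123) 456 7890"]
}
ip_results = {
    s: matches(IP, s)
    for s in ["192.168.1.1", "10.0.0.255", "256.1.1.1"]
}
tag_results = {
    s: matches(TAG, s)
    for s in ["<b>bold</b>", "<b>bold</i>"]
}

for results in (phone_results, ip_results, tag_results):
    for text, ok in results.items():
        print(f"{text!r}: {ok}")
```

The phone pattern relies on a conditional group: the closing parenthesis is required only when the opening one was captured, so an unbalanced "(123 456 7890" is rejected.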
Pretty print a Regexr object
With the example at the very beginning (matching a URL), we can pretty print it:
# print(regexr.pretty())
# prints:
^
(?P<protocol>http|https|ftp|mailto|file|data|irc)
://
(?P<domain>
[A-Za-z0-9-]{0,63}
(?:\.[A-Za-z0-9-]{0,63})+
)
(?::(?P<port>\d{1,4}))?
(?P<path>
/*
(?:/*[A-Za-z0-9\-._]+/*)*
)
(?:\?(?P<query>.*?))?
(?:\#(?P<fragment>.*))?
$
Compile a Regexr directly
Regexr("a").compile(re.I).match("A")
# <re.Match object; span=(0, 1), match='A'>
API documentation
https://pwwang.github.io/regexr/
TODO
- Support bytes
