
Regex

A pure Swift NFA implementation of a regular expression engine

Install / Use

/learn @DavidSkrundz/Regex

Regex (V2 WIP)

A pure Swift implementation of a Regular Expression Engine

V2 is a rewrite that uses DFAs instead of NFAs, with the goal of grep-like performance.

Usage

To avoid recompiling the pattern every time it is used, create a Regex instance once and reuse it:

```swift
// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")

let string = "RegEx is tough, but useful."

// Search for matches
let words = regex.match(string)

/*
words = [
	RegexMatch(match: "RegEx", groups: []),
	RegexMatch(match: "is", groups: []),
	RegexMatch(match: "tough", groups: []),
	RegexMatch(match: "but", groups: []),
	RegexMatch(match: "useful", groups: []),
]
*/
```

If compilation overhead is not a concern, the =~ operator can match a string against a pattern directly:

```swift
// Parentheses make the grouping explicit, whatever precedence =~ is declared with
let fourLetterWords = ("drink beer, it's very nice!" =~ "\\b\\w{4}\\b") ?? []

/*
fourLetterWords = [
	RegexMatch(match: "beer", groups: []),
	RegexMatch(match: "very", groups: []),
	RegexMatch(match: "nice", groups: []),
]
*/
```

By default, the Global flag is active. To choose which flags are active, start the pattern with / and end it with /<flags>. The available flags are:

  • g Global - Allows multiple matches
  • i Case Insensitive - Case insensitive matching
  • m Multiline - ^ and $ also match the beginning and end of a line
```swift
// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
```
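To make the /pattern/flags syntax concrete, here is a minimal sketch of splitting such a string into the bare pattern and a flag set. RegexFlags, FlagError and parseDelimited are hypothetical names, not the library's API, and the sketch assumes that writing explicit flags replaces the default Global flag rather than adding to it.

```swift
// Hypothetical flag set; not the library's own type.
struct RegexFlags: OptionSet {
    let rawValue: Int
    static let global          = RegexFlags(rawValue: 1 << 0) // g
    static let caseInsensitive = RegexFlags(rawValue: 1 << 1) // i
    static let multiline       = RegexFlags(rawValue: 1 << 2) // m
}

enum FlagError: Error {
    case unknownFlag(Character)
}

/// Splits "/pattern/flags" into its parts. A source without the delimiters
/// keeps the default flag set (Global only).
/// Assumption: explicit flags replace the default instead of adding to it.
func parseDelimited(_ source: String) throws -> (pattern: String, flags: RegexFlags) {
    // No leading "/" or no closing "/": treat the whole string as the pattern.
    guard source.hasPrefix("/"),
          let close = source.lastIndex(of: "/"),
          close != source.startIndex else {
        return (source, [.global])
    }
    let pattern = String(source[source.index(after: source.startIndex)..<close])
    var flags: RegexFlags = []
    // Everything after the last "/" is read as flag letters.
    for c in source[source.index(after: close)...] {
        switch c {
        case "g": flags.insert(.global)
        case "i": flags.insert(.caseInsensitive)
        case "m": flags.insert(.multiline)
        default: throw FlagError.unknownFlag(c)
        }
    }
    return (pattern, flags)
}

let parsed = try! parseDelimited("/\\w+/ig")
print(parsed.pattern)                              // \w+
print(parsed.flags == [.global, .caseInsensitive]) // true
```

Using lastIndex(of:) means a pattern may itself contain an escaped \/ without confusing the split, since flag letters never include a slash.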

Supported Operations

Character Classes

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `.` | `[^\n\r]` | [ ] |
| `[^]` | `[\s\S]` | [ ] |
| `\w` | `[A-Za-z0-9_]` | [ ] |
| `\W` | `[^A-Za-z0-9_]` | [ ] |
| `\d` | `[0-9]` | [ ] |
| `\D` | `[^0-9]` | [ ] |
| `\s` | `[\ \r\n\t\v\f]` | [ ] |
| `\S` | `[^\ \r\n\t\v\f]` | [ ] |
| `[ABC]` | Any character in the set | [ ] |
| `[^ABC]` | Any character not in the set | [ ] |
| `[A-Z]` | Any character in the range, inclusive | [ ] |
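The shorthand classes in this table can be read as predicates on single Swift Characters. The sketch below writes them that way; the function names are hypothetical, and restricting \w, \d and \s to ASCII is an assumption taken from the exact sets the table lists. Treating "\r\n" (which Swift yields as one Character) as not matching . is likewise a choice made by this sketch.

```swift
// Sketch only: the table's shorthand classes as predicates on Character.
// Names are hypothetical; ASCII-only matching follows the sets in the table.

func isDigit(_ c: Character) -> Bool { c.isASCII && c.isNumber }      // \d = [0-9]

func isWord(_ c: Character) -> Bool {                                 // \w = [A-Za-z0-9_]
    (c.isASCII && (c.isLetter || c.isNumber)) || c == "_"
}

func isSpace(_ c: Character) -> Bool { c.isASCII && c.isWhitespace }  // \s = [ \r\n\t\v\f]

// . excludes \n and \r; Swift also yields "\r\n" as one Character, so exclude it too.
func isDot(_ c: Character) -> Bool { c != "\n" && c != "\r" && c != "\r\n" }

// The uppercase classes are the complements.
func isNotDigit(_ c: Character) -> Bool { !isDigit(c) }  // \D
func isNotWord(_ c: Character) -> Bool { !isWord(c) }    // \W
func isNotSpace(_ c: Character) -> Bool { !isSpace(c) }  // \S

print("a_9 -".filter(isWord)) // a_9
```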

Anchors (match positions, not characters)

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `^` | Beginning of string | [ ] |
| `$` | End of string | [ ] |
| `\b` | Word boundary | [ ] |
| `\B` | Not word boundary | [ ] |
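A word boundary depends only on the two characters on either side of a position. This sketch (hypothetical helper names, not the library's code) checks \b and \B at an offset into an array of Characters, treating the two ends of the string as non-word characters, which is the usual convention.

```swift
/// \w as defined in the Character Classes table: [A-Za-z0-9_].
func isWordChar(_ c: Character) -> Bool {
    (c.isASCII && (c.isLetter || c.isNumber)) || c == "_"
}

/// \b at offset i (0...chars.count): exactly one neighbour is a word character.
/// Positions outside the string count as non-word.
func isWordBoundary(_ chars: [Character], at i: Int) -> Bool {
    let before = i > 0 && isWordChar(chars[i - 1])
    let after = i < chars.count && isWordChar(chars[i])
    return before != after
}

/// \B is the negation of \b.
func isNotWordBoundary(_ chars: [Character], at i: Int) -> Bool {
    !isWordBoundary(chars, at: i)
}

let text = Array("drink beer")
let boundaries = (0...text.count).filter { isWordBoundary(text, at: $0) }
print(boundaries) // [0, 5, 6, 10]
```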

Escaped Characters

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `\0` | Octal escaped character | [ ] |
| `\00` | Octal escaped character | [ ] |
| `\000` | Octal escaped character | [ ] |
| `\xFF` | Hex escaped character | [ ] |
| `\uFFFF` | Unicode escaped character | [ ] |
| `\cA` | Control character | [ ] |
| `\t` | Tab | [ ] |
| `\n` | Newline | [ ] |
| `\v` | Vertical tab | [ ] |
| `\f` | Form feed | [ ] |
| `\r` | Carriage return | [ ] |
| `\0` | Null | [ ] |
| `\.` | `.` | [ ] |
| `\\` | `\` | [ ] |
| `\+` | `+` | [ ] |
| `\*` | `*` | [ ] |
| `\?` | `?` | [ ] |
| `\^` | `^` | [ ] |
| `\$` | `$` | [ ] |
| `\{` | `{` | [ ] |
| `\}` | `}` | [ ] |
| `\[` | `[` | [ ] |
| `\]` | `]` | [ ] |
| `\(` | `(` | [ ] |
| `\)` | `)` | [ ] |
| `\/` | `/` | [ ] |
| `\\|` | `\|` | [ ] |
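Most rows in this table map the body of an escape to a single character. The decoder below is a sketch with a hypothetical name, decodeEscape. It assumes the lexer has already handled class escapes such as \d and anchors such as \b, and it assumes octal escapes must begin with 0 so that \1 to \9 remain available for back references; the README does not say how the library resolves that overlap.

```swift
/// Hypothetical helper: decodes the body of an escape (the text after "\")
/// into the Character it stands for, or returns nil if the body is malformed.
func decodeEscape(_ body: String) -> Character? {
    guard let first = body.first else { return nil }
    let rest = String(body.dropFirst())
    let octalDigits: ClosedRange<Character> = "0"..."7"

    // Parses `digits` in the given radix and turns the value into a Character.
    func scalar(_ digits: String, radix: Int) -> Character? {
        guard let value = UInt32(digits, radix: radix),
              let s = Unicode.Scalar(value) else { return nil }
        return Character(s)
    }

    switch first {
    case "x" where rest.count == 2 && rest.allSatisfy({ $0.isHexDigit }):   // \xFF
        return scalar(rest, radix: 16)
    case "u" where rest.count == 4 && rest.allSatisfy({ $0.isHexDigit }):   // \uFFFF
        return scalar(rest, radix: 16)
    case "c" where rest.count == 1:                                        // \cA
        // Control character: the letter's ASCII code modulo 32, so \cA is U+0001.
        guard let letter = rest.first, letter.isASCII, letter.isLetter,
              let code = letter.asciiValue else { return nil }
        return Character(Unicode.Scalar(code % 32))
    case "0" where body.count <= 3 && body.allSatisfy({ octalDigits.contains($0) }):
        // \0, \00, \000 read as octal (assumption: a leading 0 is required).
        return scalar(body, radix: 8)
    case "t" where rest.isEmpty: return "\t"
    case "n" where rest.isEmpty: return "\n"
    case "v" where rest.isEmpty: return "\u{0B}"
    case "f" where rest.isEmpty: return "\u{0C}"
    case "r" where rest.isEmpty: return "\r"
    default:
        // Any other single escaped character stands for itself: \. \\ \+ \/ \| and so on.
        return rest.isEmpty ? first : nil
    }
}

print(decodeEscape("x41") ?? "?") // A
```

Note that "\0" decodes to the null character through the octal branch, which is consistent with both \0 rows in the table.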

Groups and Lookaround

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `(ABC)` | Capture group | [ ] |
| `(<name>ABC)` | Named capture group | [ ] |
| `\1` | Back reference | [ ] |
| `\'name'` | Named back reference | [ ] |
| `(?:ABC)` | Non-capturing group | [ ] |
| `(?=ABC)` | Positive lookahead | [ ] |
| `(?!ABC)` | Negative lookahead | [ ] |
| `(?<=ABC)` | Positive lookbehind | [ ] |
| `(?<!ABC)` | Negative lookbehind | [ ] |

Greedy Quantifiers

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `+` | One or more | [ ] |
| `*` | Zero or more | [ ] |
| `?` | Optional (zero or one) | [ ] |
| `{n}` | Exactly n | [ ] |
| `{,}` | Same as `*` | [ ] |
| `{,n}` | At most n | [ ] |
| `{n,}` | At least n | [ ] |
| `{n,m}` | Between n and m, inclusive | [ ] |
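Every counted form in this table can be rewritten with the basic operators alone: n mandatory copies of the atom, followed either by (m - n) optional copies or by a trailing *. The sketch below performs that rewrite on pattern text. expandRepetition is a hypothetical name, this illustrates the semantics rather than how the library necessarily compiles quantifiers, and it ignores the capture renumbering that duplicating a capturing group would cause.

```swift
/// Hypothetical desugaring of counted repetition for an atom that is already
/// a single unit (a character, a class, or a non-capturing group).
/// `upper == nil` means "unbounded".
func expandRepetition(_ atom: String, min lower: Int, max upper: Int?) -> String {
    precondition(lower >= 0 && (upper ?? lower) >= lower, "invalid bounds")
    // The required copies.
    var out = String(repeating: atom, count: lower)
    if let upper = upper {
        // Each copy beyond the minimum is optional.
        out += String(repeating: atom + "?", count: upper - lower)
    } else {
        // No upper bound: the minimum copies followed by a star.
        out += atom + "*"
    }
    return out
}

print(expandRepetition("a", min: 2, max: 4))   // aaa?a?
print(expandRepetition("b", min: 3, max: nil)) // bbbb*
print(expandRepetition("c", min: 0, max: 2))   // c?c?
```

In these terms, + is min: 1, max: nil; ? is min: 0, max: 1; {,} is min: 0, max: nil; and {,n} is min: 0, max: n.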

Lazy Quantifiers

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `+?` | One or more, as few as possible | [ ] |
| `*?` | Zero or more, as few as possible | [ ] |
| `??` | Optional, preferring zero | [ ] |
| `{n}?` | Exactly n | [ ] |
| `{,n}?` | At most n, as few as possible | [ ] |
| `{n,}?` | At least n, as few as possible | [ ] |
| `{n,m}?` | Between n and m, as few as possible | [ ] |
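The difference between the greedy and lazy tables is easiest to see when the quantifier ends the pattern: a greedy quantifier consumes as many repetitions as it can, while a lazy one stops at the minimum. The sketch below models only that case, with a hypothetical helper named repetitions; when something follows the quantifier, an engine may need a different count for the overall match to succeed.

```swift
/// Hypothetical sketch: how many characters a quantified single-character atom
/// consumes from the start of `input` when nothing follows it in the pattern.
/// Returns nil when fewer than `lower` repetitions are available.
func repetitions(of matches: (Character) -> Bool, in input: String,
                 min lower: Int, max upper: Int?, greedy: Bool) -> Int? {
    // Count how many leading characters the atom could match, capped at `upper`.
    var available = 0
    for c in input {
        if let upper = upper, available == upper { break }
        guard matches(c) else { break }
        available += 1
    }
    guard available >= lower else { return nil }
    // Greedy takes everything available; lazy settles for the minimum.
    return greedy ? available : lower
}

let isA: (Character) -> Bool = { $0 == "a" }
print(repetitions(of: isA, in: "aaab", min: 1, max: nil, greedy: true)!)  // 3 (a+)
print(repetitions(of: isA, in: "aaab", min: 1, max: nil, greedy: false)!) // 1 (a+?)
```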

Alternation

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `\|` | Everything before or everything after | [ ] |
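Because alternation separates "everything before" from "everything after", a | only acts as an alternation at the top level of the pattern (or of the group it sits in). This sketch, with the hypothetical name topLevelAlternatives, splits a pattern on such bars while skipping escaped bars, bars inside a character class, and bars nested in parentheses.

```swift
/// Hypothetical sketch: split a pattern into its top-level alternatives.
func topLevelAlternatives(_ pattern: String) -> [String] {
    var parts: [String] = []
    var current = ""
    var depth = 0          // parenthesis nesting level
    var inClass = false    // inside [...]
    var escaped = false    // previous character was an unescaped "\"

    for c in pattern {
        if escaped {
            // The escaped character is kept verbatim, even if it is "|".
            current.append(c)
            escaped = false
            continue
        }
        switch c {
        case "\\": escaped = true; current.append(c)
        case "[" where !inClass: inClass = true; current.append(c)
        case "]" where inClass: inClass = false; current.append(c)
        case "(" where !inClass: depth += 1; current.append(c)
        case ")" where !inClass: depth -= 1; current.append(c)
        case "|" where !inClass && depth == 0:
            // Top-level bar: close the current alternative.
            parts.append(current)
            current = ""
        default: current.append(c)
        }
    }
    parts.append(current)
    return parts
}

for alternative in topLevelAlternatives("cat|dog(s|z)") {
    print(alternative) // cat, then dog(s|z)
}
```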

Flags

| Pattern | Description | Supported |
|---------|-------------|-----------|
| `i` | Case insensitive | [ ] |
| `g` | Global | [ ] |
| `m` | Multiline | [ ] |
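The m flag changes where ^ and $ may match. As a minimal sketch (hypothetical helpers, not the library's code), both anchors can be evaluated at an offset into an array of Characters. Which characters count as line breaks is an assumption here; "\r\n" is included because Swift yields it as a single Character.

```swift
// Assumed set of line-break Characters for the m flag.
let lineBreaks: Set<Character> = ["\n", "\r", "\r\n"]

/// ^ at offset i (0...chars.count): start of string, or start of a line with m.
func caretMatches(_ chars: [Character], at i: Int, multiline: Bool) -> Bool {
    if i == 0 { return true }
    return multiline && lineBreaks.contains(chars[i - 1])
}

/// $ at offset i (0...chars.count): end of string, or end of a line with m.
func dollarMatches(_ chars: [Character], at i: Int, multiline: Bool) -> Bool {
    if i == chars.count { return true }
    return multiline && lineBreaks.contains(chars[i])
}

let text = Array("one\ntwo")
print((0...text.count).filter { caretMatches(text, at: $0, multiline: true) })  // [0, 4]
print((0...text.count).filter { dollarMatches(text, at: $0, multiline: true) }) // [3, 7]
```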

Inner Workings

(Similar to V1)

  • Lexer (String input to Tokens)
  • Parser (Tokens to NFA)
  • Compiler (NFA to DFA)
  • Optimizer (Simplifies the DFA for better performance, e.g. char(a), char(b) -> string(ab))
  • Engine (Matches an input String using the DFA)
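The Compiler step (NFA to DFA) is classically done with subset construction: each DFA state stands for the set of NFA states the automaton could be in at once. Below is a minimal sketch of that technique, plus a whole-string matcher standing in for the Engine step. The NFA and DFA types and subsetConstruction are hypothetical and do not reflect the library's internals; the real engine also finds matches inside the input instead of only testing the whole string.

```swift
/// Hypothetical NFA: labelled transitions plus ε-transitions.
struct NFA {
    var transitions: [Int: [Character: Set<Int>]] = [:]  // state -> symbol -> targets
    var epsilon: [Int: Set<Int>] = [:]                     // state -> ε-targets
    var start = 0
    var accepting: Set<Int> = []

    /// All states reachable from `states` using ε-transitions only.
    func closure(_ states: Set<Int>) -> Set<Int> {
        var result = states
        var stack = Array(states)
        while let s = stack.popLast() {
            for t in epsilon[s, default: []] {
                if result.insert(t).inserted { stack.append(t) }
            }
        }
        return result
    }
}

/// Hypothetical DFA: at most one next state per symbol. State 0 is the start.
struct DFA {
    var transitions: [[Character: Int]] = []
    var accepting: Set<Int> = []

    /// Whole-string match: exactly one transition per Character, no backtracking.
    func accepts(_ input: String) -> Bool {
        var state = 0
        for c in input {
            guard let next = transitions[state][c] else { return false }
            state = next
        }
        return accepting.contains(state)
    }
}

func subsetConstruction(_ nfa: NFA) -> DFA {
    var dfa = DFA()
    var ids: [Set<Int>: Int] = [:]   // set of NFA states -> DFA state id
    var worklist: [Set<Int>] = []

    // Returns the DFA id for a set of NFA states, creating the state if it is new.
    func id(for set: Set<Int>) -> Int {
        if let existing = ids[set] { return existing }
        let newID = dfa.transitions.count
        ids[set] = newID
        dfa.transitions.append([:])
        if !set.isDisjoint(with: nfa.accepting) { dfa.accepting.insert(newID) }
        worklist.append(set)
        return newID
    }

    _ = id(for: nfa.closure([nfa.start]))   // becomes DFA state 0
    while let set = worklist.popLast() {
        let from = ids[set]!
        // Merge the NFA moves of every state in the set, grouped by symbol.
        var moves: [Character: Set<Int>] = [:]
        for s in set {
            for (symbol, targets) in nfa.transitions[s, default: [:]] {
                moves[symbol, default: []].formUnion(targets)
            }
        }
        for (symbol, targets) in moves {
            let to = id(for: nfa.closure(targets))
            dfa.transitions[from][symbol] = to
        }
    }
    return dfa
}

// NFA for a|ab: two ε-branches that both start with "a".
var nfa = NFA()
nfa.epsilon = [0: [1, 3]]
nfa.transitions = [1: ["a": [2]], 3: ["a": [4]], 4: ["b": [5]]]
nfa.accepting = [2, 5]

let dfa = subsetConstruction(nfa)
print(dfa.transitions.count)                                              // 3
print(dfa.accepts("a"), dfa.accepts("ab"), dfa.accepts("b"), dfa.accepts("abb")) // true true false false
```

In the example, the two NFA branches that both begin with a are merged into a single DFA state, so matching follows exactly one state per input character. That is the advantage over NFA simulation that V2 is aiming for.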

Note

Swift treats \r\n as a single Character (one extended grapheme cluster). To have both characters separately, use \n\r instead.
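A quick check with the standard library shows the behavior this note describes: "\r\n" counts as one Character but two Unicode scalars, while "\n\r" is two Characters, because only a carriage return followed by a line feed forms a single grapheme cluster.

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

// Character count vs. Unicode scalar count.
print(crlf.count, crlf.unicodeScalars.count) // 1 2
print(lfcr.count, lfcr.unicodeScalars.count) // 2 2

// Iterating by Character yields "\r\n" as one element.
print(Array(crlf) == ["\r\n"])               // true
```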
