RegXwild
⏱ Superfast ^Advanced wildcards++? | Unique algorithms implemented in native unmanaged C++ but easily accessible in .NET via Conari (with caching of 0x29 opcodes + optimizations), etc.
regXwild
⏱ Superfast ^Advanced wildcards++? *,|,?,^,$,+,#,>,++??,##??,>c in addition to slow regex engines and more.
✔ regex-like quantifiers, amazing meta symbols, and speed...
Unique algorithms implemented in native unmanaged C++ but easily accessible in .NET through Conari (recommended due to caching of 0x29 opcodes + related optimizations), and in other environments such as Python.
Samples ⏯             | regXwild filter      | n
----------------------|----------------------|---------
number = '1271';      | number = '????';     | 0 - 4
year = '2020';        | '##'|'####'          | 2 | 4
year = '20';          | = '##??'             | 2 | 4
number = 888;         | number = +??;        | 1 - 3

Samples ⏯             | regXwild filter
----------------------|----------------------
everything is ok      | ^everything*ok$
systems               | system?
systems               | sys###s
A new 'X1' project    | ^A*'+' pro?ect
professional system   | pro*system
regXwild in action    | pro?ect$|open*source+act|^regXwild
Why regXwild ?
It was designed to be faster than just fast, with features that usually go beyond typical wildcards. Seriously, we love regex, I love it, you love it; 2013 is far behind us, but regXwild is still relevant for its speed and powerful wildcard-like features, such as ##?? (which means 2 or 4) ...
🔍 Easy to start
Unmanaged native C++ or managed .NET project. It doesn't matter, just use it:
C++
#include <regXwild.h>
using namespace net::r_eg::regXwild;
...
EssRxW rxw;
if(rxw.match(_T("regXwild"), _T("reg?wild"))) {
    // ...
}
C# if Conari
using dynamic l = new ConariX("regXwild.dll");
...
if(l.match<bool>("regXwild", "reg?wild")) {
    // ...
}
🏄 Amazing meta symbols
ESS version (advanced EXT version)
metasymbol | meaning
-----------|----------------
* | {0, ~}
| | str1 or str2 or ...
? | {0, 1}, ??? {0, 3}, ...
^ | [str... or [str1... |[str2...
$ | ...str] or ...str1]| ...str2]
+ | {1, ~}, +++ {3, ~}, ...
# | {1}, ## {2}, ### {3}, ...
> | Legacy > (F_LEGACY_ANYSP = 0x008) as [^/]*str | [^/]*$
>c | 1.4+ Modern > as [^c]*str | [^c]*$
EXT version (more simplified than ESS)
metasymbol | meaning
-----------|----------------
*          | {0, ~}
>          | as [^/\]+
|          | str1 or str2 or ...
?          | {0, 1}, ??? {0, 3}, ...
🧮 Quantifiers
regex           | regXwild   | n
----------------|------------|---------
.*              | *          | 0+
.+              | +          | 1+
.?              | ?          | 0 | 1
.{1}            | #          | 1
.{2}            | ##         | 2
.{2, }          | ++         | 2+
.{0, 2}         | ??         | 0 - 2
.{2, 4}         | ++??       | 2 - 4
(?:.{2}|.{4})   | ##??       | 2 | 4
.{3, 4}         | +++?       | 3 - 4
(?:.{1}|.{3})   | #??        | 1 | 3
and similar ...
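The mapping in the table can be sketched as a small translator. This is only an illustration inferred from the rows above, not how regXwild works internally: `quantifierToRegex` is a hypothetical helper, and it handles only a run of `+` or `#` optionally followed by `?`, or `?` alone.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of the regXwild API): translates a run of
// quantifier symbols into the equivalent regex, following the table above.
// Supported shapes: "+..+", "#..#", "?..?", "+..+?..?", "#..#?..?".
// Any other combination yields std::nullopt.
std::optional<std::string> quantifierToRegex(const std::string& q)
{
    if (q.empty()) return std::nullopt;

    std::size_t i = 0;
    const char head = q[0];
    std::size_t base = 0;                   // number of leading '+' or '#'
    if (head == '+' || head == '#') {
        while (i < q.size() && q[i] == head) { ++i; ++base; }
    }

    std::size_t opt = 0;                    // number of trailing '?'
    while (i < q.size() && q[i] == '?') { ++i; ++opt; }

    if (i != q.size()) return std::nullopt; // mixed or unknown symbols

    const std::string b = std::to_string(base);
    const std::string t = std::to_string(base + opt);

    if (base == 0) return ".{0," + t + "}";             // ??   -> .{0,2}

    if (head == '+') {
        return opt == 0 ? ".{" + b + ",}"                // ++   -> .{2,}
                        : ".{" + b + "," + t + "}";      // ++?? -> .{2,4}
    }

    // head == '#': exactly `base`, or exactly `base + opt`
    return opt == 0 ? ".{" + b + "}"                     // ##   -> .{2}
                    : "(?:.{" + b + "}|.{" + t + "})";   // ##?? -> (?:.{2}|.{4})
}
```

Single symbols come out in brace form, e.g. `?` gives `.{0,1}` rather than `.?`, which is equivalent. The sketch needs C++17 for `std::optional`.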
Play with our actual Unit-Tests.
🚀 Awesome speed
- ~2000 times faster than C++11 regex (regex_search) in native C++ [x64]; see the Speed tables.
- For .NET (including modern .NET Core), Conari provides optional caching of 0x29 opcodes (Calli) and more, to get results as close to native C++ as possible.
Match result and Replacements
1.4+
EssRxW::MatchResult m;
rxw.match
(
    _T("number = '8888'; //TODO: up"),
    _T("'+'"),
    EssRxW::EngineOptions::F_MATCH_RESULT,
    &m
);
//m.start = 9
//m.end = 15
...
input.replace(m.start, m.end - m.start, _T("'9777'"));

tstring str = _T("year = 2021; dd = 17;");
...
if(rxw.replace(str, _T(" ##;"), _T(" 00;"))) {
    // year = 2021; dd = 00;
}
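The offset arithmetic in the first snippet can be tried without the library. In this minimal sketch, `Span` and `replaceSpan` are hypothetical names standing in for the `start`/`end` pair of a match result, and `end` is assumed to be exclusive, which is what `m.end - m.start` in the snippet implies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the start/end pair of a match result.
// `end` is assumed to be exclusive (one past the last matched character).
struct Span { std::size_t start; std::size_t end; };

// Returns a copy of `input` with the matched range replaced by `with`.
std::string replaceSpan(std::string input, Span m, const std::string& with)
{
    // std::string::replace takes (position, length, replacement)
    input.replace(m.start, m.end - m.start, with);
    return input;
}
```

With the values from the snippet, `replaceSpan("number = '8888'; //TODO: up", {9, 15}, "'9777'")` yields `number = '9777'; //TODO: up`.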
🍰 Open and Free
Open Source project; MIT License, Enjoy 🎉
License
Copyright (c) 2013-2021 Denis Kuzmin <x-3F@outlook.com> github/3F
regXwild contributors: https://github.com/3F/regXwild/graphs/contributors
We're waiting for your awesome contributions!
Speed
Procedure of testing
- Use the algo subproject as tester of the main algorithms (Release cfg - x32 & x64)
- In general, calculation is simple and uses average as i = (t2 - t1); (sum(i) / n) where:
  - i - one iteration for searching by filter. Represents the delta of time t2 - t1
  - n - the number of repeats of the matching to get average.
e.g.:
{
    Meter meter;
    int results = 0;
    for(int total = 0; total < average; ++total)
    {
        meter.start();
        for(int i = 0; i < iterations; ++i)
        {
            if((alg.*method)(data, filter)) {
                //...
            }
        }
        results += meter.delta();
    }
    TRACE((results / average) << "ms");
}
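The tester's `Meter` class is not shown here. A minimal self-contained sketch of the same averaging procedure could look as follows; this `Meter` (built on `std::chrono::steady_clock`) and the `averageMs` helper are assumptions for illustration, not the code from the algo subproject.

```cpp
#include <chrono>

// Hypothetical Meter: the real class used by the tester is not shown in this README.
class Meter
{
public:
    void start() { begin = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last start()
    long long delta() const
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now() - begin).count();
    }

private:
    std::chrono::steady_clock::time_point begin{};
};

// Average time (ms) over `repeats` runs, each run calling fn() `iterations` times.
// `repeats` must be greater than zero.
template<typename Fn>
long long averageMs(int repeats, int iterations, Fn fn)
{
    Meter meter;
    long long total = 0;
    for (int r = 0; r < repeats; ++r) {
        meter.start();
        for (int i = 0; i < iterations; ++i) fn();
        total += meter.delta();
    }
    return total / repeats;
}
```

A monotonic clock is used so that system clock adjustments during a run do not distort the delta.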
For the regex results, the tester also prepares an additional basic_regex from the filter, but of course only once for all iterations:
meter.start();
auto rfilter = tregex(
    filter,
    regex_constants::icase | regex_constants::optimize
);
results += meter.delta();
...
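As a standalone illustration of building the regex once and reusing it across all searches, here is a minimal sketch with `std::regex`. The `countMatches` helper is a hypothetical name, and any pattern passed to it is the caller's own regex, not a regXwild filter.

```cpp
#include <regex>
#include <string>
#include <vector>

// Counts how many inputs contain a match. The regex object is constructed
// once, outside the loop, and reused for every search.
int countMatches(const std::vector<std::string>& inputs, const std::string& pattern)
{
    const std::regex rfilter(
        pattern,
        std::regex_constants::icase | std::regex_constants::optimize
    );

    int hits = 0;
    for (const auto& s : inputs) {
        if (std::regex_search(s, rfilter)) ++hits;
    }
    return hits;
}
```

Constructing the regex inside the loop would add the compilation cost to every iteration and skew the comparison.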
Please note:
- +icase means ignore case sensitivity when matching the filter (pattern) within the searched string, i.e. ignoreCase = true. Without this, everything will be much faster of course. That is, icase always adds complexity.
- In the results below, MultiByte can be faster than Unicode (for the same platform and the same way of using the module), depending on the specific architecture: about ~2 times faster in native C++, and about ~4 times faster with .NET + Conari and related.
- The results below can be different on different machines. You only need to look at the difference (in milliseconds) between algorithms for a specific target.
- To calculate the data, as in the table below, you need to execute algo.exe
Sample of speed for Unicode
340 Unicode Symbols and 10^4 iterations (340 x 10000); Filter: L"nime**haru*02*Magica"
algorithms (see impl. from algo) | +icase [x32]| +icase [x64]
------------------------------------------|-------------|-------------
Find + Find | ~58ms | ~44ms
Iterator + Find | ~57ms | ~46ms
Getline + Find | ~59ms | ~54ms
Iterator + Substr | ~165ms | ~132ms
Iterator + Iterator | ~136ms | ~118ms
main :: based on Iterator + Find | ~53ms | ~45ms
| |
Final algorithm - EXT version: | ~50ms | ~26ms
Final algorithm - ESS version: | ~50ms | ~27ms
| |
regexp-c++11(regex_search) | ~59309ms | ~53334ms
regexp-c++11(only as ^match$ like a '==') | ~12ms | ~5ms
regexp-c++11(regex_match with endings .*) | ~59503ms | ~53817ms
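The relative difference between two rows is a plain ratio of their timings. A tiny sketch, where `speedup` is a hypothetical helper and the numbers are taken from the table above:

```cpp
// Ratio of two timings in milliseconds: how many times faster the second one is.
constexpr double speedup(double slowMs, double fastMs)
{
    return slowMs / fastMs;
}

// From the table: regex_search vs. the final EXT algorithm
constexpr double x64 = speedup(53334, 26); // roughly 2051
constexpr double x32 = speedup(59309, 50); // roughly 1186
```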
ESS vs EXT
350 Unicode Symbols and 10^4 iterations (350 x 10000);
Operation (+icase)    | EXT [x32]  | ESS [x32]  | EXT [x64]  | ESS [x64]
----------------------|------------|------------|------------|------------
ANY                   | ~54ms      | ~55ms      | ~32ms      | ~34ms
ANYSP                 | ~60ms      | ~59ms      | ~37ms      | ~38ms
ONE                   | ~56ms      | ~56ms      | ~33ms      | ~35ms
SPLIT                 | ~92ms      | ~94ms      | ~58ms      | ~63ms
BEGIN                 | ---        | ~38ms      | ---        | ~19ms
END                   | ---        | ~39ms      | ---        | ~21ms
MORE                  | ---        | ~44ms      | ---        | ~23ms
SINGLE                | ---        | ~43ms      | ---        | ~22ms
For .NET users through Conari engine:
Same test Data & Filter: 10^4 iterations
Release cfg; x32 or x64 regXwild (Unicode)
Attention: For more speed, upgrade to Conari 1.3 or higher!
algorithms (see impl. from snet) | +icase [x32] | +icase [x64] |
--------------------------------------------|--------------|--------------|---
regXwild via Conari v1.2 (Lambda) - ESS | ~1032ms | ~1418ms | x
regXwild via Conari v1.2 (DLR) - ESS | ~1238ms | ~1609ms | x
regXwild via Conari v1.2 (Lambda) - EXT | ~1117ms | ~1457ms | x
regXwild via Conari v1.2 (DLR) - EXT | ~1246ms | ~1601ms | x
| | |
regXwild via Conari v1.3 (Lambda) - ESS | ~58ms | ~42ms | <<
regXwild via Cona
