StringZilla
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops

Strings are the first fundamental data type every programming language implements in software rather than hardware, so dedicated CPU instructions are rare - and the few that exist are hardly ideal.
That's why most languages lean on the C standard library (libc) for their string operations, which, despite its name, ships its hottest code in hand-tuned assembly.
It does exploit SIMD, but it isn't perfect.
1️⃣ Even on ubiquitous hardware - over a billion 64-bit ARM CPUs - routines such as `strstr` and `memmem` top out at roughly one-third of available throughput.
2️⃣ SIMD coverage is uneven: fast forward scans don't guarantee speedy reverse searches, and hashing and case-mapping are not even part of the standard.
3️⃣ Many higher-level languages can't rely on libc at all because their strings aren't NUL-terminated - or may even contain embedded zeroes.
That's why StringZilla exists: predictable, high performance on every modern platform, OS, and programming language.
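The third point can be seen without any third-party code. The sketch below uses only Python's standard `ctypes` module to contrast a NUL-terminated view of a buffer with a length-aware view of the same bytes. The variable names and the sample payload are illustrative assumptions, and nothing here calls StringZilla itself.

```python
import ctypes

# A 9-byte payload with an embedded zero byte (illustrative data).
data = b"key\x00value"
buf = ctypes.create_string_buffer(data)

# NUL-terminated semantics: everything after the first zero byte is invisible.
c_view = buf.value

# Length-aware semantics: the full payload, zero byte included.
full_view = buf.raw[:len(data)]

print(len(c_view), len(full_view))                      # 3 9
print(c_view.find(b"value"), full_view.find(b"value"))  # -1 4
```

Any search routine that relies on a terminating zero sees only `key`, which is why languages with length-prefixed strings need length-aware primitives.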
StringZilla is the GodZilla of string libraries, using SIMD and SWAR to accelerate binary and UTF-8 string operations on modern CPUs and GPUs. It delivers up to 10x higher CPU throughput in C, C++, Rust, Python, and other languages, and can be 100x faster than existing GPU kernels, covering a broad range of functionality. It accelerates exact and fuzzy string matching, hashing, edit distance computations, sorting, provides allocation-free lazily-evaluated smart-iterators, and even random-string generators.
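For readers new to SWAR (SIMD Within A Register), here is a minimal sketch of the idea: treat an ordinary 64-bit integer as eight byte lanes and test all of them at once with the classic zero-byte bit trick. It is written in Python purely for readability and is not StringZilla's actual kernel; the function name `swar_find_byte` and the constants are hypothetical names introduced for this illustration.

```python
import struct

LOW = 0x0101010101010101   # 0x01 in every byte lane
HIGH = 0x8080808080808080  # 0x80 in every byte lane

def swar_find_byte(haystack: bytes, needle: int) -> int:
    """Return the index of the first byte equal to `needle`, or -1."""
    pattern = LOW * needle  # broadcast the needle into all 8 lanes
    full = len(haystack) - len(haystack) % 8
    for offset in range(0, full, 8):
        # Load 8 bytes as one little-endian word: lane 0 is the first byte.
        (word,) = struct.unpack_from("<Q", haystack, offset)
        x = word ^ pattern  # lanes equal to the needle become zero
        # Zero-byte test: the lowest set bit marks the first zero lane.
        hits = (x - LOW) & ~x & HIGH
        if hits:
            lowest_bit = (hits & -hits).bit_length() - 1
            return offset + lowest_bit // 8
    # Scalar tail for the last len(haystack) % 8 bytes.
    for i in range(full, len(haystack)):
        if haystack[i] == needle:
            return i
    return -1

text = b"hello, world! SWAR"
print(swar_find_byte(text, ord("w")), text.find(b"w"))
```

The same structure carries over to C, where the word load and the bit test compile to a handful of instructions per 8 bytes; wider SIMD registers extend the lane count further.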
- C: Upgrade LibC's `<string.h>` to `<stringzilla/stringzilla.h>` in C 99
- C++: Upgrade STL's `<string>` to `<stringzilla/stringzilla.hpp>` in C++ 11
- CUDA: Process in-bulk with `<stringzillas/stringzillas.cuh>` in CUDA C++ 17
- Python: Upgrade your `str` to faster `Str`
- Rust: Use the `StringZilla` traits crate
- Go: Use the `StringZilla` cGo module
- Swift: Use the `String+StringZilla` extension
- JavaScript: Use the `StringZilla` library
- Shell: Accelerate common CLI tools with `sz-` prefix
- Researcher? Jump to Algorithms & Design Decisions
- Thinking to contribute? Look for "good first issues"
- And check the guide to set up the environment
- Want more bindings or features? Let me know!
Who is this for?
- For data-engineers parsing large datasets, like the CommonCrawl, RedPajama, or LAION.
- For software engineers optimizing strings in their apps and services.
- For bioinformaticians and search engineers looking for edit-distances for USearch.
- For DBMS devs, optimizing `LIKE`, `ORDER BY`, and `GROUP BY` operations.
- For hardware designers, needing a SWAR baseline for string-processing functionality.
- For students studying SIMD/SWAR applications to non-data-parallel operations.
Performance
<table>
  <tr>
    <th align="center" width="25%">C</th>
    <th align="center" width="25%">C++</th>
    <th align="center" width="25%">Python</th>
    <th align="center" width="25%">StringZilla</th>
  </tr>
  <!-- Unicode case-folding -->
  <tr>
    <td colspan="4" align="center">Unicode case-folding, expanding characters like <code>ß</code> → <code>ss</code></td>
  </tr>
  <tr>
    <td align="center">⚪</td>
    <td align="center">⚪</td>
    <td align="center"> <code>.casefold</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.4</b> GB/s </td>
    <td align="center"> <code>sz.utf8_case_fold</code><br/> <span style="color:#ABABAB;">x86:</span> <b>1.3</b> GB/s </td>
  </tr>
  <!-- Unicode case-insensitive search -->
  <tr>
    <td colspan="4" align="center">Unicode case-insensitive substring search</td>
  </tr>
  <tr>
    <td align="center">⚪</td>
    <td align="center">⚪</td>
    <td align="center"> <code>icu.StringSearch</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.02</b> GB/s </td>
    <td align="center"> <code>utf8_case_insensitive_find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>3.0</b> GB/s </td>
  </tr>
  <!-- Substrings, normal order -->
  <tr>
    <td colspan="4" align="center">find the first occurrence of a random word from text, ≈ 5 bytes long</td>
  </tr>
  <tr>
    <td align="center"> <code>strstr</code> <sup>1</sup><br/> <span style="color:#ABABAB;">x86:</span> <b>7.4</b> · <span style="color:#ABABAB;">arm:</span> <b>2.0</b> GB/s </td>
    <td align="center"> <code>.find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>2.9</b> · <span style="color:#ABABAB;">arm:</span> <b>1.6</b> GB/s </td>
    <td align="center"> <code>.find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>1.1</b> · <span style="color:#ABABAB;">arm:</span> <b>0.6</b> GB/s </td>
    <td align="center"> <code>sz_find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>10.6</b> · <span style="color:#ABABAB;">arm:</span> <b>7.1</b> GB/s </td>
  </tr>
  <!-- Substrings, reverse order -->
  <tr>
    <td colspan="4" align="center">find the last occurrence of a random word from text, ≈ 5 bytes long</td>
  </tr>
  <tr>
    <td align="center">⚪</td>
    <td align="center"> <code>.rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.5</b> · <span style="color:#ABABAB;">arm:</span> <b>0.4</b> GB/s </td>
    <td align="center"> <code>.rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.9</b> · <span style="color:#ABABAB;">arm:</span> <b>0.5</b> GB/s </td>
    <td align="center"> <code>sz_rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>10.8</b> · <span style="color:#ABABAB;">arm:</span> <b>6.7</b> GB/s </td>
  </tr>
  <!-- Characters, normal order -->
  <tr>
    <td colspan="4" align="center">split lines separated by <code>\n</code> or <code>\r</code> <sup>2</sup></td>
  </tr>
  <tr>
    <td align="center"> <code>strcspn</code> <sup>1</sup><br/> <span style="color:#ABABAB;">x86:</span> <b>5.42</b> · <span style="color:#ABABAB;">arm:</span> <b>2.19</b> GB/s </td>
    <td align="center"> <code>.find_first_of</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.59</b> · <span style="color:#ABABAB;">arm:</span> <b>0.46</b> GB/s </td>
    <td align="center"> <code>re.finditer</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.06</b> · <span style="color:#ABABAB;">arm:</span> <b>0.02</b> GB/s </td>
    <td align="center"> <code>sz_find_byteset</code><br/> <span style="color:#ABABAB;">x86:</span> <b>4.08</b> · <span style="color:#ABABAB;">arm:</span> <b>3.22</b> GB/s </td>
  </tr>
  <!-- Characters, reverse order -->
  <tr>
    <td colspan="4" align="center">find the last occurrence of any of 6 whitespaces <sup>2</sup></td>
  </tr>
  <tr>
    <td align="center">⚪</td>
    <td align="center"> <code>.find_last_of</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.25</b> · <span style="color:#ABABAB;">arm:</span> <b>0.25</b> GB/s </td>
    <td align="center">⚪</td>
    <td align="center"> <code>sz_rfind_byteset</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.43</b> · <span style="color:#ABABAB;">arm:</span> <b>0.23</b> GB/s </td>
  </tr>
  <!-- Random Generation -->
  <tr>
    <td colspan="4" align="center">Random string from a given alphabet, 20 bytes long <sup>3</sup></td>
  </tr>
  <tr>
    <td align="center"> <code>rand() % n</code><br/> <span style="color:#ABABAB;">x86:</span> <b>18.0</b> · <span style="color:#ABABAB;">arm:</span> … </td>
  </tr>
</table>
