
StringZilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖


StringZilla 🦖


Strings are the first fundamental data type every programming language implements in software rather than hardware, so dedicated CPU instructions are rare - and the few that exist are hardly ideal. That's why most languages lean on the C standard library (libc) for their string operations, which, despite its name, ships its hottest code in hand-tuned assembly. It does exploit SIMD, but it isn't perfect. 1️⃣ Even on ubiquitous hardware - over a billion 64-bit ARM CPUs - routines such as strstr and memmem top out at roughly one-third of available throughput. 2️⃣ SIMD coverage is uneven: fast forward scans don't guarantee speedy reverse searches, and hashing and case-mapping are not even part of the standard. 3️⃣ Many higher-level languages can't rely on libc at all because their strings aren't NUL-terminated - or may even contain embedded zeroes. That's why StringZilla exists: predictable, high performance on every modern platform, OS, and programming language.
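To make the third point concrete, here is a minimal sketch in plain C of why NUL-terminated routines break on buffers with embedded zeroes. The `find_bytes` helper is hypothetical and deliberately naive: it is not part of libc or StringZilla and makes no claim about how StringZilla searches. It only shows the length-aware interface that such buffers need.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only (not a libc or StringZilla API):
 * a naive substring search that takes explicit lengths, so a '\0' byte
 * inside the haystack is treated like any other byte. */
const char *find_bytes(const char *haystack, size_t h_len,
                       const char *needle, size_t n_len) {
    if (n_len == 0) return haystack; /* empty needle matches at the start */
    if (n_len > h_len) return NULL;  /* needle cannot fit */
    for (size_t i = 0; i + n_len <= h_len; ++i)
        if (memcmp(haystack + i, needle, n_len) == 0) return haystack + i;
    return NULL;
}

/* Returns 1 if strstr can see "world" in a buffer with an embedded zero. */
int strstr_sees_past_zero(void) {
    static const char text[] = "hello\0world"; /* 11 payload bytes */
    /* strstr stops at the first '\0', so it only ever scans "hello". */
    return strstr(text, "world") != NULL;
}

/* Returns the byte offset of "world" found by the length-aware search,
 * or -1 if it is missing. */
long find_bytes_offset(void) {
    static const char text[] = "hello\0world";
    const char *hit = find_bytes(text, sizeof(text) - 1, "world", 5);
    return hit ? (long)(hit - text) : -1;
}
```

With these definitions, `strstr_sees_past_zero()` reports that `strstr` misses the match, while `find_bytes_offset()` locates it at offset 6, right after the embedded zero.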



StringZilla is the GodZilla of string libraries, using SIMD and SWAR to accelerate binary and UTF-8 string operations on modern CPUs and GPUs. It delivers up to 10x higher CPU throughput in C, C++, Rust, Python, and other languages, and can be 100x faster than existing GPU kernels, covering a broad range of functionality. It accelerates exact and fuzzy string matching, hashing, edit-distance computations, and sorting, and it provides allocation-free, lazily-evaluated smart iterators and even random-string generators.
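For readers unfamiliar with SWAR ("SIMD within a register"), the sketch below shows the general idea with a classic bit trick: testing eight bytes at once inside one 64-bit integer. The function names `swar_has_byte` and `swar_find_byte` are hypothetical, and this is not StringZilla's implementation, only an illustration of the technique under the assumption of a plain 64-bit C target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if any of the 8 bytes in `word` equals `byte`. */
int swar_has_byte(uint64_t word, unsigned char byte) {
    uint64_t ones = 0x0101010101010101ull;
    uint64_t highs = 0x8080808080808080ull;
    uint64_t x = word ^ (ones * byte); /* matching bytes become 0x00 */
    /* Classic zero-byte test: nonzero iff some byte of x is zero. */
    return ((x - ones) & ~x & highs) != 0;
}

/* Hypothetical helper: first occurrence of `byte` in `data`, or NULL.
 * Skips 8 bytes per step while no match is possible, then resolves
 * the exact position with a bytewise scan. */
const char *swar_find_byte(const char *data, size_t length, unsigned char byte) {
    size_t i = 0;
    for (; i + 8 <= length; i += 8) {
        uint64_t word;
        memcpy(&word, data + i, 8); /* portable unaligned load */
        if (swar_has_byte(word, byte)) break; /* match is in this word */
    }
    for (; i < length; ++i) /* matching word, or the short tail */
        if ((unsigned char)data[i] == byte) return data + i;
    return NULL;
}
```

Because the exact position is resolved bytewise, the sketch does not depend on endianness. It serves as a portable baseline on targets without vector instructions.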

  • 🐂 C: Upgrade LibC's <string.h> to <stringzilla/stringzilla.h> in C 99
  • 🐉 C++: Upgrade STL's <string> to <stringzilla/stringzilla.hpp> in C++ 11
  • 🧮 CUDA: Process in-bulk with <stringzillas/stringzillas.cuh> in CUDA C++ 17
  • 🐍 Python: Upgrade your str to faster Str
  • 🦀 Rust: Use the StringZilla traits crate
  • 🦫 Go: Use the StringZilla cGo module
  • 🍎 Swift: Use the String+StringZilla extension
  • 🟨 JavaScript: Use the StringZilla library
  • 🐚 Shell: Accelerate common CLI tools with sz- prefix
  • 📚 Researcher? Jump to Algorithms & Design Decisions
  • 💡 Thinking of contributing? Look for "good first issues"
  • 🤝 And check the guide to set up the environment
  • Want more bindings or features? Let me know!

Who is this for?

  • For data-engineers parsing large datasets, like the CommonCrawl, RedPajama, or LAION.
  • For software engineers optimizing strings in their apps and services.
  • For bioinformaticians and search engineers looking for edit-distances for USearch.
  • For DBMS devs, optimizing LIKE, ORDER BY, and GROUP BY operations.
  • For hardware designers, needing a SWAR baseline for string-processing functionality.
  • For students studying SIMD/SWAR applications to non-data-parallel operations.

Performance

<table> <tr> <th align="center" width="25%">C</th> <th align="center" width="25%">C++</th> <th align="center" width="25%">Python</th> <th align="center" width="25%">StringZilla</th> </tr> <!-- Unicode case-folding --> <tr> <td colspan="4" align="center">Unicode case-folding, expanding characters like <code>ß</code> → <code>ss</code></td> </tr> <tr> <td align="center">⚪</td> <td align="center">⚪</td> <td align="center"> <code>.casefold</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.4</b> GB/s </td> <td align="center"> <code>sz.utf8_case_fold</code><br/> <span style="color:#ABABAB;">x86:</span> <b>1.3</b> GB/s </td> </tr> <!-- Unicode case-insensitive search --> <tr> <td colspan="4" align="center">Unicode case-insensitive substring search</td> </tr> <tr> <td align="center">⚪</td> <td align="center">⚪</td> <td align="center"> <code>icu.StringSearch</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.02</b> GB/s </td> <td align="center"> <code>utf8_case_insensitive_find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>3.0</b> GB/s </td> </tr> <!-- Substrings, normal order --> <tr> <td colspan="4" align="center">find the first occurrence of a random word from text, ≅ 5 bytes long</td> </tr> <tr> <td align="center"> <code>strstr</code> <sup>1</sup><br/> <span style="color:#ABABAB;">x86:</span> <b>7.4</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>2.0</b> GB/s </td> <td align="center"> <code>.find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>2.9</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>1.6</b> GB/s </td> <td align="center"> <code>.find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>1.1</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.6</b> GB/s </td> <td align="center"> <code>sz_find</code><br/> <span style="color:#ABABAB;">x86:</span> <b>10.6</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>7.1</b> GB/s </td> </tr> <!-- Substrings, reverse order --> <tr> <td colspan="4" 
align="center">find the last occurrence of a random word from text, ≅ 5 bytes long</td> </tr> <tr> <td align="center">⚪</td> <td align="center"> <code>.rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.5</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.4</b> GB/s </td> <td align="center"> <code>.rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.9</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.5</b> GB/s </td> <td align="center"> <code>sz_rfind</code><br/> <span style="color:#ABABAB;">x86:</span> <b>10.8</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>6.7</b> GB/s </td> </tr> <!-- Characters, normal order --> <tr> <td colspan="4" align="center">split lines separated by <code>\n</code> or <code>\r</code> <sup>2</sup></td> </tr> <tr> <td align="center"> <code>strcspn</code> <sup>1</sup><br/> <span style="color:#ABABAB;">x86:</span> <b>5.42</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>2.19</b> GB/s </td> <td align="center"> <code>.find_first_of</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.59</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.46</b> GB/s </td> <td align="center"> <code>re.finditer</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.06</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.02</b> GB/s </td> <td align="center"> <code>sz_find_byteset</code><br/> <span style="color:#ABABAB;">x86:</span> <b>4.08</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>3.22</b> GB/s </td> </tr> <!-- Characters, reverse order --> <tr> <td colspan="4" align="center">find the last occurrence of any of 6 whitespaces <sup>2</sup></td> </tr> <tr> <td align="center">⚪</td> <td align="center"> <code>.find_last_of</code><br/> <span style="color:#ABABAB;">x86:</span> <b>0.25</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.25</b> GB/s </td> <td align="center">⚪</td> <td align="center"> <code>sz_rfind_byteset</code><br/> <span 
style="color:#ABABAB;">x86:</span> <b>0.43</b> &centerdot; <span style="color:#ABABAB;">arm:</span> <b>0.23</b> GB/s </td> </tr> <!-- Random Generation --> <tr> <td colspan="4" align="center">Random string from a given alphabet, 20 bytes long <sup>3</sup></td> </tr> <tr> <td align="center"> <code>rand() % n</code><br/> <span style="color:#ABABAB;">x86:</span> <b>18.0</b> &centerdot; <span style="color:#ABABAB;">arm:</spa
