
Zsv

zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser


zsv+lib: the world's fastest (simd) CSV parser, with an extensible CLI


Playground (without sheet viewer command): https://liquidaty.github.io/zsv

zsv+lib is the world's fastest CSV parser library and extensible command-line utility. It achieves high performance using SIMD operations, efficient memory use and other optimization techniques, and can also parse generic-delimited and fixed-width formats, as well as multi-row-span headers.

While zsv is written in C, it can be used from other languages such as Ruby. See Language Bindings & Wrappers below for more details.

CLI

The ZSV CLI can be compiled to virtually any target, including WebAssembly, and offers a variety of commands including select, count, direct CSV sql, flatten, serialize, 2json conversion, 2db sqlite3 conversion, stack, pretty, 2tsv, compare, paste, overwrite, check and more.

The ZSV CLI also includes sheet, an in-console interactive grid viewer with basic navigation, filtering, and pivot tables with drill-down, and with support for custom extensions:

<img src="https://github.com/user-attachments/assets/c2ae32a3-48c4-499d-8ef7-7748687bd24f" width="50%">

Installation

  • brew (macOS, Linux):
    • brew install zsv
  • winget (Windows):
    • winget.exe install zsv
  • npm (parser only), nuget, yum, apt, choco and more
  • Download
    • Pre-built binaries and packages for macOS, Windows, Linux and BSD can be downloaded from the Releases page.
  • Build

Language Bindings & Wrappers

Binding contributions are welcome!

| Language | Project | Maintainer |
| :--- | :--- | :--- |
| Ruby | https://github.com/sebyx07/zsv-ruby | @sebyx07 |

Note: These projects are maintained independently. Please file issues related to specific bindings in their respective repositories.

Playground

An online playground is available as well (without the sheet feature, due to browser limitations).

If you like zsv+lib, do not forget to give it a star! 🌟

Performance

Summary

We compared a number of CSV parsers on speed; memory was also tracked for informational purposes. The top finalists were zsv, xan, polars, xsv/qsv and duckdb.

Benchmarks use four input profiles: unquoted, sparsely quoted, standard quoted, and non-4180-compliant quoted.

Overall, zsv and xan were the clear top performers in both speed and memory:

  • count: zsv is fastest across all input types
  • select: zsv and xan are the fastest; xan is faster on unquoted and sparsely quoted input, and zsv is faster on standard quoted and non-4180-compliant input
  • non-4180-compliant data: zsv is fastest across the board (xan and polars are N/A for this input category)

Benchmarks

See benchmarks

Detailed benchmark tests have been run on macOS (arm64) and Linux (x86-64). We would expect similar performance on Windows and other Linux flavors.

Contributions of benchmark results for other OS/architecture combinations are welcome; please open an issue!

Fast parser

zsv includes a SIMD-accelerated fast parser (--parser fast) that uses branchless prefix-XOR carry propagation for quote state tracking, available on aarch64 (NEON), x86-64 (AVX2), and x86-64 (SSE2), including Windows (mingw64). wasm (compiled via emscripten) support will be added next.
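To make the quote-tracking idea concrete, here is a minimal scalar sketch in Python of a prefix-XOR quote mask with a carry between 64-byte chunks. It illustrates the general technique only and is not zsv's C/SIMD implementation; the function names (`prefix_xor`, `char_mask`, `unquoted_delimiters`) and the 64-byte chunk size are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1


def prefix_xor(bits):
    """Return a 64-bit word whose bit i is the XOR of input bits 0..i."""
    for shift in (1, 2, 4, 8, 16, 32):
        bits ^= (bits << shift) & MASK64
    return bits


def char_mask(chunk, byte_value):
    """Build a bitmask with bit i set where chunk[i] == byte_value."""
    mask = 0
    for i, b in enumerate(chunk):
        if b == byte_value:
            mask |= 1 << i
    return mask


def unquoted_delimiters(data, delimiter=b",", quote=b'"'):
    """Return offsets of delimiters that lie outside quoted spans."""
    positions = []
    carry = 0  # all ones if the previous chunk ended inside quotes, else 0
    for start in range(0, len(data), 64):
        chunk = data[start:start + 64]
        quotes = char_mask(chunk, quote[0])
        # No per-byte branching: quote parity comes from shifts and XORs,
        # and the carry flips the whole word when we start inside quotes.
        inside = prefix_xor(quotes) ^ carry
        delims = char_mask(chunk, delimiter[0]) & ~inside & MASK64
        carry = MASK64 if (inside >> 63) & 1 else 0
        while delims:
            lowest = delims & -delims          # isolate lowest set bit
            positions.append(start + lowest.bit_length() - 1)
            delims ^= lowest
    return positions
```

For instance, `unquoted_delimiters(b'a,"b,c",d')` reports the commas at offsets 1 and 7 and skips the one inside the quoted cell. A doubled quote (`""`) flips the parity twice, so it needs no special handling in this scheme.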

The fast parser is designed only for input that uses quoting as defined in RFC 4180 (it does not impose RFC 4180's other restrictions, such as CRLF line endings). Like polars and xan, it does not correctly handle non-standard quoting such as unescaped quotes in unquoted fields (e.g. 12" monitor or say "hello" world). For such data, use the default compat parser, which handles all real-world CSV the same way spreadsheet programs do.
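One way to see why parity-based quote tracking breaks on such input is the simplified model below. It is a sketch written for this README, not the fast parser's code, and the function name `delimiters_by_quote_parity` is hypothetical: every quote character simply toggles an "inside quotes" state.

```python
def delimiters_by_quote_parity(line, delimiter=","):
    """Return delimiter offsets, treating every quote as a state toggle."""
    inside = False
    positions = []
    for i, ch in enumerate(line):
        if ch == '"':
            inside = not inside          # no notion of "quote in unquoted text"
        elif ch == delimiter and not inside:
            positions.append(i)
    return positions


# RFC 4180 quoting: the embedded quote is doubled, so parity stays balanced.
compliant = '"12"" monitor",black'
# Non-standard quoting: a lone quote flips the state for the rest of the row.
non_standard = '12" monitor,black'

print(delimiters_by_quote_parity(compliant))      # comma is found
print(delimiters_by_quote_parity(non_standard))   # comma is missed
```

In the second case the lone quote leaves the tracker "inside quotes", so the comma is not recognized as a delimiter and the row would be read as a single cell.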

Parallel parsing

Either the fast or compat parser can be combined with --parallel for multi-threaded parsing:

# Single-threaded
zsv count data.csv               # any CSV input
zsv count --parser fast data.csv # only for CSV input using standard quoting

# Multi-threaded parser (uses all available cores)
zsv select --parallel data.csv -- 1 2 3               # any CSV input
zsv select --parser fast --parallel data.csv -- 1 2 3 # only for CSV input using standard quoting

Which "CSV"

"CSV" is an ambiguous term. This library uses, by default, the same definition as Excel (the library and app have various options to change this default behavior); a more accurate description of it would be "UTF8 delimited data parser" insofar as it requires UTF8 input and its options support customization of the delimiter and whether to allow quoting.

In addition, zsv provides a row-level (as well as cell-level) API and provides "normalized" CSV output (e.g. input of this"iscell1,"thisis,"cell2 becomes "this""iscell1","thisis,cell2"). Each of these three objectives (Excel compatibility, row-level API and normalized output) has a measurable performance impact; conversely, much faster parsing speeds are possible, and a number of other CSV parsers achieve them, if any of these requirements (especially Excel compatibility) are dropped.
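The normalization step can be sketched as follows. This is an illustrative Python version, not zsv's writer; the names `normalize_cell` and `normalize_row` are hypothetical, and the rule used (quote a cell only when it contains a delimiter, quote, or newline, doubling any embedded quotes) is an assumption chosen to match the example above.

```python
def normalize_cell(value, delimiter=","):
    """Quote a cell only when needed, doubling any embedded quotes."""
    needs_quoting = any(ch in value for ch in (delimiter, '"', "\n", "\r"))
    if not needs_quoting:
        return value
    return '"' + value.replace('"', '""') + '"'


def normalize_row(cells, delimiter=","):
    """Join already-parsed cell values into one normalized CSV row."""
    return delimiter.join(normalize_cell(c, delimiter) for c in cells)


# Parsed cell values from the example input this"iscell1,"thisis,"cell2
print(normalize_row(['this"iscell1', 'thisis,cell2']))
```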

Examples of input that does not comply with RFC 4180

The following is a list of all input patterns that are non-compliant with RFC 4180, and how zsv (by default) parses each. It is believed to be comprehensive; please log an issue if you think it is missing any pattern:

|Input Description|Parser treatment|Example input|How example input is parsed|
|--|--|--|--|
|Non-ASCII input, UTF8 BOM|BOM at start of the stream is ignored|(0xEF BB BF)|Ignored|
|Non-ASCII input, valid UTF8|Parsed as UTF8|你,好|cell1 = 你, cell2 = 好|
|Non-ASCII input, invalid UTF8|Parsed as UTF8; any non-compliant bytes are retained, or replaced with specified char|aaa,bXb,ccc where X is malformed UTF8|cell1 = aaa, cell2 = bXb, cell3 = ccc|
|\n, \r, or \r\n newlines|Any non-quote-captured occurrence of \n, \r, \r\n or \n\r is parsed as a row end|1a,1b,1c\n<br>2a,2b,2c\r<br>3a,3b,3c\n\r<br>4a,4b,4c\r\n<br>5a,"5\nb",5c\n<br>6a,"6b\r","6c"\n<br>7a,7b,7c|Parsed as 7 rows each with 3 cells|
|Unquoted quote|Treated like any other non-delimiter|aaa,b"bb,ccc|Cell 2 value is b"bb, output as CSV "b""bb"|
|Closing quote followed by character other than delimiter (comma) or row end|Treated like any other non-delimiter|"aa"a,"bb"bb"b,ccc|Cell 1 value is aaa, cell2 value is bbbb"b, output as CSV aaa and "bbbb""b"|
|Missing final CRLF|Ignored; end-of-stream is considered end-of-row if not preceded by explicit row terminator|aaa,bbb,ccc<EOF>|Row with 3 cells, same as if input ended with row terminator preceding EOF|
|Row and header contain different number of columns (cells)|Number of cells in each row is independent of other rows|aaa,bbb\n<br>aaa,bbb,ccc|Row 1 = 2 cells; Row 2 = 3 cells|
|Header row contains duplicate cells or embedded newlines|Header rows are parsed the same way as other rows (see NOTE below)|<BOF>"a\na","a\na"|Two cells of a\na|
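The quoting and row-end rules in the table can be reproduced with a small state machine. The Python sketch below is for illustration only and is not how zsv is implemented; `parse_lenient` is a hypothetical name, and BOM and malformed-UTF8 handling are left out.

```python
def parse_lenient(text, delimiter=","):
    """Parse CSV text into rows of cells using the lenient rules tabulated above."""
    rows, row, cell = [], [], []
    in_quotes = False       # currently inside a quoted span
    at_cell_start = True    # no character of the current cell consumed yet
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch != '"':
                cell.append(ch)               # delimiters and newlines are literal here
            elif i + 1 < n and text[i + 1] == '"':
                cell.append('"')              # "" inside quotes is an escaped quote
                i += 1
            else:
                in_quotes = False             # closing quote
        elif ch == '"' and at_cell_start:
            in_quotes = True                  # a quote only opens at the start of a cell
            at_cell_start = False
        elif ch == delimiter:
            row.append("".join(cell))
            cell = []
            at_cell_start = True
        elif ch in "\r\n":
            # \r\n and \n\r each count as a single row end
            if i + 1 < n and text[i + 1] in "\r\n" and text[i + 1] != ch:
                i += 1
            row.append("".join(cell))
            rows.append(row)
            row, cell = [], []
            at_cell_start = True
        else:
            cell.append(ch)                   # includes stray quotes in unquoted text
            at_cell_start = False
        i += 1
    if row or cell or not at_cell_start:
        row.append("".join(cell))             # end of stream closes the last row
        rows.append(row)
    return rows
```

Calling `parse_lenient('"aa"a,"bb"bb"b,ccc')` yields one row with the cells aaa, bbbb"b and ccc, matching the table entry for a closing quote followed by other characters.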

The above behavior can be altered with various optional flags:

  • Header rows can be treated differently if options are used to skip rows and/or use multi-row header span -- see documentation for further detail.
  • Quote support can be turned off, to treat quotes just like any other non-delimiter character
  • Cell delimiter can be a character other than comma
  • Row delimiter can be specified as CRLF only, in which case a standalone CR or LF is simply part of the cell value, even without quoting
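As a rough illustration of how the last three options combine, the Python sketch below splits input with quote support off, a custom cell delimiter, and CRLF as the only row delimiter. It models the described behavior only; `split_crlf_rows` is a hypothetical name and does not correspond to any zsv flag or API.

```python
def split_crlf_rows(text, delimiter=";"):
    """Split text into rows on CRLF only, with quotes treated as ordinary characters."""
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()   # a trailing row terminator does not start an extra row
    return [line.split(delimiter) for line in lines]


# The lone \n and the quote both stay inside the second cell of row 1.
print(split_crlf_rows('a;b"c\nd\r\ne;f\r\n'))
```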

Built-in and extensible features

zsv is an extensible CSV utility, which uses zsvlib, for tasks such as slicing and dicing, querying with SQL, combining, serializing, flattening, converting between CSV/JSON/sqlite3 and more.

zsv is streamlined for easy development of custom dynamic extensions.

zsvlib and zsv are written in C, but since zsvlib is a library, and zsv extensions are just shared libraries, you can extend zsv with your own code in any programming language, so long as it has been compiled into a shared library that implements the expected interface.

Key highlights

  • Available as BOTH a library and an application (coming soon: standalone zsvutil library for common helper functions such as csv writer)
  • Open-source, permissively licensed
  • Handles real-world CSV the same way that spreadsheet programs do (including edge cases). Gracefully handles (and can "clean") real-world data that may be "dirty".
  • Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD (gcc-only) and in-browser (emscripten/wasm)
  • High perfo
