Zsv
zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser
zsv+lib: the world's fastest (simd) CSV parser, with an extensible CLI
Playground (without sheet viewer command): https://liquidaty.github.io/zsv
zsv+lib is the world's fastest CSV parser library and extensible command-line utility. It achieves high performance using SIMD operations, efficient memory use and other optimization techniques, and can also parse generic-delimited and fixed-width formats, as well as multi-row-span headers.
While zsv is written in C, it can be used from other languages such as Ruby. See below for more details.
CLI
The ZSV CLI can be compiled to virtually any target, including
WebAssembly, and offers a variety of commands including select, count,
direct CSV sql, flatten, serialize, 2json conversion, 2db sqlite3
conversion, stack, pretty, 2tsv, compare, paste, overwrite,
check and more.
The ZSV CLI also includes sheet, an in-console interactive
grid viewer that includes basic navigation, filtering, and pivot table with
drill down, and that supports custom extensions.
Installation
- brew (macOS, Linux): brew install zsv
- winget (Windows): winget.exe install zsv
- npm (parser only), nuget, yum, apt, choco and more: see INSTALL.md
- Download: pre-built binaries and packages for macOS, Windows, Linux and BSD can be downloaded from the Releases page.
- Build: see BUILD.md to build from source.
Language Bindings & Wrappers
Binding contributions are welcome!
| Language | Project | Maintainer |
| :--- | :--- | :--- |
| Ruby | https://github.com/sebyx07/zsv-ruby | @sebyx07 |
Note: These projects are maintained independently. Please file issues related to specific bindings in their respective repositories.
Playground
An online playground is available as well at https://liquidaty.github.io/zsv
(without the sheet feature, due to browser limitations).
If you like zsv+lib, do not forget to give it a star! 🌟
Performance
Summary
We compared a number of CSV parsers on speed; memory was also tracked for informational purposes.
The top finalists were: zsv, xan, polars, xsv/qsv and duckdb
Benchmarks use four input profiles: unquoted, sparsely quoted, standard quoted, and non-4180-compliant quoted.
Overall, zsv and xan were the clear top performers in both speed and memory:
- count: zsv is fastest across all input types
- select: zsv and xan are the fastest; xan is faster on unquoted and sparsely quoted input, and zsv is faster on standard quoted or non-4180-compliant input
- non-4180-compliant data: zsv is fastest across the board (xan and polars are N/A for this input category)
Benchmarks
See benchmarks
Detailed benchmark tests have been run on macOS (arm64) and Linux (x86-64). We would expect similar performance on Windows and other Linux flavors.
Contributions of benchmark results for other OS/architecture combinations are welcome; please open an issue!
Fast parser
zsv includes a SIMD-accelerated fast parser (--parser fast) that uses
branchless prefix-XOR carry propagation for quote state tracking, available on
aarch64 (NEON), x86-64 (AVX2), and x86-64 (SSE2), including Windows (mingw64).
wasm (compiled via emscripten) support will be added next.
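As a rough illustration of the prefix-XOR idea mentioned above, the following scalar C sketch computes, for one chunk of up to 64 bytes, a bitmask of the bytes that lie inside quotes, and carries the quote state into the next chunk without branching. It is not taken from zsv's source: the names match_mask, prefix_xor and in_quote_mask are hypothetical, and a real SIMD version would build the byte masks with vector compares (and commonly computes the prefix XOR with a carry-less multiply) rather than with loops and shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i is set when s[i] == c. Scalar stand-in for a SIMD compare + movemask. */
uint64_t match_mask(const char *s, size_t n, char c) {
  uint64_t m = 0;
  assert(n <= 64);
  for (size_t i = 0; i < n; i++)
    if (s[i] == c)
      m |= (uint64_t)1 << i;
  return m;
}

/* Bit i of the result is the XOR of bits 0..i of x. */
uint64_t prefix_xor(uint64_t x) {
  x ^= x << 1;
  x ^= x << 2;
  x ^= x << 4;
  x ^= x << 8;
  x ^= x << 16;
  x ^= x << 32;
  return x;
}

/*
 * quotes: bitmask of quote characters in the current chunk.
 * carry:  0 if the previous chunk ended outside quotes, all ones if inside.
 * Returns a mask whose bit i is set when byte i is inside a quoted span
 * (opening quote included, closing quote excluded), and updates *carry.
 */
uint64_t in_quote_mask(uint64_t quotes, uint64_t *carry) {
  uint64_t m = prefix_xor(quotes) ^ *carry;
  *carry = (uint64_t)0 - (m >> 63); /* broadcast the top bit, no branch */
  return m;
}
```

For the 9-byte input a,"b,c",d the quote mask is 0x44, the in-quote mask is 0x3C, and ANDing the comma mask with the inverted in-quote mask leaves only the two delimiters that actually separate cells. Because an escaped quote ("") toggles the state twice, this scheme only works when quoting follows RFC 4180, which is consistent with the restriction described below.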
The fast parser is only designed for input that uses quoting as defined in RFC 4180
(but does not require other limitations of RFC 4180 such as CRLF line ends).
Like polars and xan, it does not correctly handle non-standard quoting
such as unescaped quotes in unquoted fields (e.g. 12" monitor or say "hello" world).
For such data, use the default compat parser which handles all real-world CSV the same way
spreadsheet programs do.
Parallel parsing
Either the fast or compat parser can be combined with --parallel for multi-threaded parsing:
```shell
# Single-threaded
zsv count data.csv # any CSV input
zsv count --parser fast data.csv # only for CSV input using standard quoting
# Multi-threaded parser (uses all available cores)
zsv select --parallel data.csv -- 1 2 3 # any CSV input
zsv select --parser fast --parallel data.csv -- 1 2 3 # only for CSV input using standard quoting
```
Which "CSV"
"CSV" is an ambiguous term. This library uses, by default, the same definition as Excel (the library and app have various options to change this default behavior); a more accurate description of it would be "UTF8 delimited data parser" insofar as it requires UTF8 input and its options support customization of the delimiter and whether to allow quoting.
In addition, zsv provides a row-level (as well as cell-level) API and provides
"normalized" CSV output (e.g. input of this"iscell1,"thisis,"cell2 becomes
"this""iscell1","thisis,cell2"). Each of these three objectives (Excel
compatibility, row-level API and normalized output) has a measurable performance
impact; conversely, much faster parsing speeds are possible, and a number of
other CSV parsers achieve them, if any of these requirements (especially
Excel compatibility) is dropped.
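To make "normalized" output concrete, here is a minimal C sketch of how a single parsed cell value could be written back out. It assumes the usual convention that a cell is quoted only when it contains a comma, a quote, CR or LF, and that embedded quotes are doubled; zsv's own writer may apply different rules. The function name csv_normalize_cell is hypothetical and is not part of the zsv API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write the normalized CSV form of `value` into `out`.
 * Returns 0 on success, or -1 if `out` may be too small.
 * Hypothetical helper for illustration only.
 */
int csv_normalize_cell(const char *value, char *out, size_t out_size) {
  /* Quote only when the value contains a character that needs protecting. */
  int needs_quotes = strpbrk(value, ",\"\r\n") != NULL;
  size_t n = 0;

  /* Worst case: every byte doubled, plus two quotes and the terminator. */
  if (out_size < 2 * strlen(value) + 3)
    return -1;

  if (needs_quotes)
    out[n++] = '"';
  for (const char *p = value; *p; p++) {
    if (*p == '"')
      out[n++] = '"'; /* an embedded quote is written as two quotes */
    out[n++] = *p;
  }
  if (needs_quotes)
    out[n++] = '"';
  out[n] = '\0';
  return 0;
}
```

With this sketch, the cell value this"iscell1 is written as "this""iscell1", the value thisis,cell2 is written as "thisis,cell2", and a value with no special characters is written unchanged.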
Examples of input that does not comply with RFC 4180
The following is a list of all input patterns that are non-compliant with RFC 4180, and how zsv (by default) parses each. It is believed to be comprehensive; please log an issue if you think any pattern is missing:
|Input Description|Parser treatment|Example input|How example input is parsed|
|--|--|--|--|
|Non-ASCII input, UTF8 BOM| BOM at start of the stream is ignored|(0xEF BB BF)|Ignored|
|Non-ASCII input, valid UTF8|Parsed as UTF8|你,好|cell1 = 你, cell2 = 好|
|Non-ASCII input, invalid UTF8|Parsed as UTF8; any non-compliant bytes are retained, or replaced with specified char|aaa,bXb,ccc where X is malformed UTF8|cell1 = aaa, cell2 = bXb, cell3 = ccc|
|\n, \r, or \r\n newlines|Any non-quote-captured occurrence of \n, \r, \r\n or \n\r is parsed as a row end|1a,1b,1c\n<br>2a,2b,2c\r<br>3a,3b,3c\n\r<br>4a,4b,4c\r\n<br>5a,"5\nb",5c\n<br>6a,"6b\r","6c"\n<br>7a,7b,7c|Parsed as 7 rows each with 3 cells|
|Unquoted quote|Treated like any other non-delimiter|aaa,b"bb,ccc|Cell 2 value is b"bb, output as CSV "b""bb"|
|Closing quote followed by character other than delimiter (comma) or row end|Treated like any other non-delimiter|"aa"a,"bb"bb"b,ccc|Cell 1 value is aaa, cell 2 value is bbbb"b, output as CSV aaa and "bbbb""b"|
|Missing final CRLF|Ignored; end-of-stream is considered end-of-row if not preceded by explicit row terminator|aaa,bbb,ccc<EOF>|Row with 3 cells, same as if input ended with row terminator preceding EOF|
|Row and header contain different number of columns (cells)|Number of cells in each row is independent of other rows|aaa,bbb\n<br>aaa,bbb,ccc|Row 1 = 2 cells; Row 2 = 3 cells|
|Header row contains duplicate cells or embedded newlines|Header rows are parsed the same way as other rows (see NOTE below)|<BOF>"a\na","a\na"|Two cells of a\na|
The above behavior can be altered with various optional flags:
- Header rows can be treated differently if options are used to skip rows and/or use multi-row header span; see documentation for further detail.
- Quote support can be turned off, to treat quotes just like any other non-delimiter character
- Cell delimiter can be a character other than comma
- Row delimiter can be specified as CRLF only, in which case a standalone CR or LF is simply part of the cell value, even without quoting
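The default rules in the table above (a quote is only special at the start of a cell, text after a closing quote is kept as ordinary characters, and \n, \r, \r\n or \n\r outside quotes ends a row) can be sketched as a small row reader in C. This is an illustrative toy, not zsv's parser or API: the names struct row, put and next_row are hypothetical, cells are held in small fixed-size buffers, overlong input is silently truncated, and treating end-of-stream inside an open quote as the end of the cell is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CELLS 8
#define MAX_CELL_LEN 32

struct row {
  size_t count; /* number of cells in this row */
  char cells[MAX_CELLS][MAX_CELL_LEN];
};

/* Append one byte to the current cell, silently truncating on overflow. */
static void put(struct row *r, size_t *len, char c) {
  if (r->count < MAX_CELLS && *len < MAX_CELL_LEN - 1)
    r->cells[r->count][(*len)++] = c;
}

/* Read one row starting at *p and advance *p past its terminator.
   Returns 1 if a row was read, 0 at end of input. */
int next_row(const char **p, struct row *r) {
  const char *s;
  size_t len = 0;
  int in_quotes = 0, cell_start = 1;
  assert(p && *p && r);
  s = *p;
  if (*s == '\0')
    return 0;
  memset(r, 0, sizeof *r); /* zero-fill keeps every cell NUL-terminated */
  for (;;) {
    char c = *s;
    if (in_quotes) {
      if (c == '\0') {
        in_quotes = 0; /* assumption: end of stream closes an open quote */
      } else if (c == '"' && s[1] == '"') {
        put(r, &len, '"'); /* "" inside quotes is one literal quote */
        s += 2;
      } else if (c == '"') {
        in_quotes = 0; /* closing quote; text after it is ordinary */
        s++;
      } else {
        put(r, &len, c); /* includes quoted commas and newlines */
        s++;
      }
      continue;
    }
    if (c == '"' && cell_start) { /* quote opens a span only at cell start */
      in_quotes = 1;
      cell_start = 0;
      s++;
      continue;
    }
    if (c == ',' || c == '\n' || c == '\r' || c == '\0') {
      if (r->count < MAX_CELLS)
        r->count++; /* finish the current cell */
      len = 0;
      cell_start = 1;
      if (c == ',') {
        s++;
        continue;
      }
      if (c != '\0') /* \r\n and \n\r each count as a single row end */
        s += ((c == '\r' && s[1] == '\n') || (c == '\n' && s[1] == '\r')) ? 2 : 1;
      break;
    }
    put(r, &len, c); /* ordinary byte, including a quote in mid-cell */
    cell_start = 0;
    s++;
  }
  *p = s;
  return 1;
}
```

Calling next_row in a loop over the newline example from the table yields 7 rows of 3 cells each, and the input "aa"a,"bb"bb"b,ccc yields the cell values aaa, bbbb"b and ccc.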
Built-in and extensible features
zsv is an extensible CSV utility, which uses zsvlib, for tasks such as slicing
and dicing, querying with SQL, combining, serializing, flattening,
converting between CSV/JSON/sqlite3 and more.
zsv is streamlined for easy development of custom dynamic extensions.
zsvlib and zsv are written in C, but since zsvlib is a library, and zsv
extensions are just shared libraries, you can extend zsv with your own code in
any programming language, so long as it has been compiled into a shared library
that implements the expected
interface.
Key highlights
- Available as BOTH a library and an application (coming soon: standalone zsvutil library for common helper functions such as csv writer)
- Open-source, permissively licensed
- Handles real-world CSV the same way that spreadsheet programs do (including edge cases). Gracefully handles (and can "clean") real-world data that may be "dirty".
- Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD (gcc-only) and in-browser (emscripten/wasm)
- High perfo
