Pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
![pxi teaser][teaser]
[![node version][shield-node]][node] [![npm version][shield-npm]][npm-package] [![license][shield-license]][license] [![PRs Welcome][shield-prs]][contribute] [![linux unit tests status][shield-unit-tests-linux]][actions] [![macos unit tests status][shield-unit-tests-macos]][actions] [![windows unit tests status][shield-unit-tests-windows]][actions]
Installation
Installation is done using [npm][npm-install].
$ npm i -g pxi
Try pxi --help to see if the installation was successful.
Features
- 🧚 Small: Pixie [does one thing and does it well][unix-philosophy] (processing data with JavaScript).
- :zap: Fast: pxi is as fast as gawk, 3x faster than jq and mlr, and 15x faster than fx.
- :sparkles: Magical: It is trivial to write your own ~~spells~~ plugins.
- :smile_cat: Playful: Opt in to more data formats by installing plugins.
- :tada: Versatile: Use Ramda, Lodash, and any other JavaScript library to process data on the command line.
- :heart: Loving: Pixie is made with love and encourages a positive and welcoming environment.
Getting Started
<details open>
<summary>
Pixie reads in big structured text files, transforms them with JavaScript functions, and writes them back to disk. The usage examples in this section are based on the following large JSONL file. Inspect the examples by clicking on them!
<p>
$ head -5 2019.jsonl # 2.6GB, 31,536,000 lines
</p>
</summary>
{"time":1546300800,"year":2019,"month":1,"day":1,"hours":0,"minutes":0,"seconds":0}
{"time":1546300801,"year":2019,"month":1,"day":1,"hours":0,"minutes":0,"seconds":1}
{"time":1546300802,"year":2019,"month":1,"day":1,"hours":0,"minutes":0,"seconds":2}
{"time":1546300803,"year":2019,"month":1,"day":1,"hours":0,"minutes":0,"seconds":3}
{"time":1546300804,"year":2019,"month":1,"day":1,"hours":0,"minutes":0,"seconds":4}
</details>
<details>
<summary>
Execute any JavaScript function:
<p>
$ pxi "json => json.time" < 2019.jsonl
$ pxi "({time}) => time" < 2019.jsonl
</p>
</summary>
You may use JavaScript arrow functions, destructuring, spreading, and any other feature of your current Node.js version.
1546300800
1546300801
1546300802
1546300803
1546300804
</details>
<details>
<summary>
Convert between JSON, CSV, SSV, and TSV:
<p>
$ pxi --from json --to csv < 2019.jsonl > 2019.csv
$ pxi --deserializer json --serializer csv < 2019.jsonl > 2019.csv
$ pxi -d json -s csv < 2019.jsonl > 2019.csv
</p>
</summary>
Users may extend pixie with (third-party) plugins for many more data formats.
See the [.pxi module section][pxi-module] on how to do that and the plugins section for a list.
Pixie deserializes data into JSON, applies functions, and serializes JSON to another format.
It offers the descriptive aliases --from and --to as alternatives to --deserializer and --serializer.
time,year,month,day,hours,minutes,seconds
1546300800,2019,1,1,0,0,0
1546300801,2019,1,1,0,0,1
1546300802,2019,1,1,0,0,2
1546300803,2019,1,1,0,0,3
</details>
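The conversion above can be pictured as "parse each JSON line, then print its values in header order". The following is a minimal sketch of that idea in plain Node.js, not pixie's actual csv serializer. The function name `jsonlToCsv` is hypothetical, and the sketch assumes every object has the same keys and that no value needs quoting or escaping.

```javascript
// Hypothetical helper, not part of pxi: converts JSON lines to CSV text.
// Assumptions: all objects share the keys of the first object, and no value
// contains commas, quotes, or line breaks.
const jsonlToCsv = text => {
  // Deserialize: one JSON object per non-empty line.
  const objects = text
    .split('\n')
    .filter(line => line !== '')
    .map(line => JSON.parse(line))

  if (objects.length === 0) return ''

  // The header row is taken from the keys of the first object.
  const header = Object.keys(objects[0])

  // Serialize: one comma-separated row per object, in header order.
  const rows = objects.map(obj =>
    header.map(key => String(obj[key])).join(',')
  )

  return [header.join(','), ...rows].join('\n') + '\n'
}

const csv = jsonlToCsv(
  '{"time":1546300800,"year":2019,"month":1}\n' +
  '{"time":1546300801,"year":2019,"month":1}\n'
)

process.stdout.write(csv)
```

A real serializer also has to handle quoting, missing keys, and streaming input, which this sketch deliberately leaves out.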
<details>
<summary>
Use Ramda, Lodash or any other JavaScript library:
<p>
$ pxi "o(obj => _.omit(obj, ['seconds']), evolve({time: parseInt}))" --from csv < 2019.csv
</p>
</summary>
Pixie may use any JavaScript library, including Ramda and Lodash.
Read the [.pxi module section][pxi-module] to learn more.
{"time":1546300800,"year":"2019","month":"1","day":"1","hours":"0","minutes":"0"}
{"time":1546300801,"year":"2019","month":"1","day":"1","hours":"0","minutes":"0"}
{"time":1546300802,"year":"2019","month":"1","day":"1","hours":"0","minutes":"0"}
{"time":1546300803,"year":"2019","month":"1","day":"1","hours":"0","minutes":"0"}
{"time":1546300804,"year":"2019","month":"1","day":"1","hours":"0","minutes":"0"}
</details>
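To see why only time becomes a number in the output above while the other fields stay strings, it may help to spell out the composed function without any library. The sketch below is a plain JavaScript approximation of what the Ramda and Lodash expression does to one record. The helper names `evolveTime`, `omitSeconds`, and `transform` are hypothetical, and the sample record assumes the csv deserializer yields string values.

```javascript
// Plain JavaScript stand-ins for the library calls in the example.
// These helper names are made up for illustration.

// Rough equivalent of evolve({time: parseInt}): convert only the time field.
const evolveTime = obj => ({ ...obj, time: parseInt(obj.time, 10) })

// Rough equivalent of _.omit(obj, ['seconds']): drop the seconds field.
const omitSeconds = ({ seconds, ...rest }) => rest

// Rough equivalent of o(f, g): apply g first, then f.
const transform = obj => omitSeconds(evolveTime(obj))

// Assumed shape of one record after CSV deserialization: all values are strings.
const record = {
  time: '1546300800', year: '2019', month: '1', day: '1',
  hours: '0', minutes: '0', seconds: '0'
}

const result = transform(record)
console.log(JSON.stringify(result))
```

Every field that is not touched by the transformation keeps the string type it had after deserialization.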
<details>
<summary>
Process data streams from REST APIs and other sources and pipe pixie's output to other commands:
<p>
$ curl -s "https://swapi.co/api/films/" |
pxi 'json => json.results' --with flatMap --keep '["episode_id", "title"]' |
sort
</p>
</summary>
Pixie follows the [unix philosophy][unix-philosophy]: It does one thing (processing structured data), and does it well. It is written to work together with other programs and it handles text streams because that is a universal interface.
{"episode_id":1,"title":"The Phantom Menace"}
{"episode_id":2,"title":"Attack of the Clones"}
{"episode_id":3,"title":"Revenge of the Sith"}
{"episode_id":4,"title":"A New Hope"}
{"episode_id":5,"title":"The Empire Strikes Back"}
{"episode_id":6,"title":"Return of the Jedi"}
{"episode_id":7,"title":"The Force Awakens"}
</details>
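The combination of `--with flatMap` and `--keep` in the example above can be read as "turn one response object into many records, then keep only some attributes of each". The following plain Node.js sketch illustrates that reading. It is an assumption about the behavior, not pixie's implementation, and the helper names `flatMapApply` and `keep` as well as the trimmed sample response are made up for illustration.

```javascript
// Hypothetical stand-ins, not pxi's plugin API.

// Applier: like map, but flattens one level, so a function that returns an
// array contributes one output record per array element.
const flatMapApply = (f, jsons) => jsons.flatMap(f)

// Keep: copy only the listed attributes of an object.
const keep = (keys, obj) =>
  Object.fromEntries(keys.filter(k => k in obj).map(k => [k, obj[k]]))

// Trimmed sample input that mimics the shape of a paginated API response.
const response = {
  count: 2,
  results: [
    { episode_id: 4, title: 'A New Hope', director: 'George Lucas' },
    { episode_id: 5, title: 'The Empire Strikes Back', director: 'Irvin Kershner' }
  ]
}

const lines = flatMapApply(json => json.results, [response])
  .map(film => JSON.stringify(keep(['episode_id', 'title'], film)))

console.log(lines.join('\n'))
```

Because each record ends up on its own line, line-oriented tools such as sort can process the output directly.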
<details>
<summary>
Use pixie's ssv deserializer to work with command line output:
<p>
$ ls -ahl / | pxi '([,,,,size,,,,file]) => ({size, file})' --from ssv
</p>
</summary>
Pixie's space-separated values deserializer makes it very easy to work with the output of other commands. Array destructuring is especially helpful in this area.
{"size":"704B","file":"."}
{"size":"704B","file":".."}
{"size":"1.2K","file":"bin"}
{"size":"4.4K","file":"dev"}
{"size":"11B","file":"etc"}
{"size":"25B","file":"home"}
{"size":"64B","file":"opt"}
{"size":"192B","file":"private"}
{"size":"2.0K","file":"sbin"}
{"size":"11B","file":"tmp"}
{"size":"352B","file":"usr"}
{"size":"11B","file":"var"}
</details>
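The array destructuring in the example above skips unwanted columns with empty slots. The sketch below shows the idea on a single line of ls-style output. It assumes that a space-separated record is split on runs of whitespace, which is a simplification and not necessarily how pixie's ssv deserializer works. The name `deserializeSsv` and the sample line are hypothetical.

```javascript
// Hypothetical ssv deserializer: split a line on runs of whitespace.
// Assumption: fields never contain spaces themselves.
const deserializeSsv = line => line.trim().split(/\s+/)

// Same function as in the example: pick the 5th and 9th column by position.
const toSizeAndFile = ([,,,, size,,,, file]) => ({ size, file })

// Made-up sample line in the format of `ls -ahl`.
const line = 'drwxr-xr-x  38 root  wheel   1.2K Dec  5 10:12 bin'

const record = toSizeAndFile(deserializeSsv(line))
console.log(JSON.stringify(record))
```

Positional access like this breaks for file names that contain spaces, so it is best suited to quick, interactive exploration.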
See the usage section below for more examples.
Introductory Blogposts
For a quick start, read the following blog posts:
- [Exploring Large Data Files with pxi][post-exploring-large-data-files-with-pxi]
- [How I Tamed a Pixie][post-how-tamed-pixie]
🧚 Pixie
Pixie's philosophy is to provide a small, extensible frame for processing large files and streams with JavaScript functions. Different data formats are supported through plugins. JSON, CSV, SSV, and TSV are supported by default, but users can customize their pixie installation by choosing from additional plugins, including third-party ones.
Pixie works its magic by chunking, deserializing, applying functions, and serializing data. Expressed in code, it works like this:
function pxi (data) {                   // Data is passed to pxi from stdin.
  const chunks = chunk(data)            // The data is chunked.
  const jsons  = deserialize(chunks)    // The chunks are deserialized into JSON objects.
  const jsons2 = apply(f, jsons)        // f is applied to each object and new JSON objects are returned.
  const string = serialize(jsons2)      // The new objects are serialized to a string.
  process.stdout.write(string)          // The string is written to stdout.
}
For example, chunking, deserializing, and serializing JSON is provided by the [pxi-json][pxi-json] plugin.
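The four stages can be made concrete with a small, self-contained sketch. The stage functions below are simplified stand-ins written for illustration (line chunking, JSON parsing, map, JSON serialization). Their names mirror the pseudocode above, but their signatures are assumptions and do not reflect the actual plugin API.

```javascript
// Simplified stand-ins for the four stages; not pxi's plugin API.

// Chunker: one chunk per non-empty line.
const chunk = data => data.split('\n').filter(line => line.trim() !== '')

// Deserializer: parse each chunk as JSON.
const deserialize = chunks => chunks.map(c => JSON.parse(c))

// Applier: run the user's function f on every object (a plain map).
const apply = (f, jsons) => jsons.map(f)

// Serializer: write each result as one JSON line.
const serialize = jsons => jsons.map(j => JSON.stringify(j)).join('\n') + '\n'

// The whole pipeline as a pure function from input text to output text.
const run = (f, data) => serialize(apply(f, deserialize(chunk(data))))

const input =
  '{"time":1546300800,"year":2019}\n' +
  '{"time":1546300801,"year":2019}\n'

const output = run(json => json.time, input)
process.stdout.write(output)
```

Keeping each stage a separate function is what allows plugins to swap in other chunkers, deserializers, appliers, and serializers. The sketch processes a whole string at once, whereas a tool for large files would work on a stream.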
Plugins
The following plugins are available:
| | Chunkers | Deserializers | Appliers | Serializers | pxi |
|----------------------------|-----------|----------------------------|----------------------------|----------------------------|:-----:|
| [pxi-dust][pxi-dust] | line | | map, flatMap, filter | string | ✓ |
| [pxi-json][pxi-json] | jsonObj | json | | json | ✓ |
| [pxi-dsv][pxi-dsv] | | csv, tsv, ssv, dsv | | csv, tsv, ssv, dsv | ✓ |
| [pxi-sample][pxi-sample] | sample | sample | sample | sample | ✕ |
The last column states which plugins come preinstalled in pxi.
Refer to the .pxi Module section to see how to enable more plugins and how to develop plugins.
New experimental pixie plugins are developed in the [pxi-sandbox][pxi-sandbox] repository, among other places.
Performance
pxi is very fast and beats several similar tools in [performance benchmarks][pxi-benchmarks].
Times are given as CPU time in seconds; wall-clock times may deviate by ± 1s.
The benchmarks were run on a 13" MacBook Pro (2019) with a 2.8 GHz Quad-Core i7 and 16GB memory.
Feel free to run the [benchmarks][pxi-benchmarks] on your own machine
and if you do, please [open an issue][issues] to report your results!
| [Benchmark][pxi-benchmarks] | Description | pxi | gawk | jq | mlr | fx |
|-----------------------------|-----------------------------------------------|------:|-------:|-----:|------:|-----:|
| JSON 1 | Select an attribute on small JSON objects | 11s | 15s | 46s | – | 284s |
| JSON 2 | Select an attribute on large JSON objects | 20s | 20s | 97s | – | 301s |
| JSON 3 | Pick a single attribute on small JSON objects | 15s | 21s | 68s | 91s | 368s |
| JSON 4 | Pick a single attribute on large JSON objects | 26s | 27s | 130s | 257s† | 420s |
| JSON to CSV 1 | Convert a small JSON to CSV format | 15s | – | 77s | 60s | – |
| JSON to CSV 2 | Convert a large JSON to CSV format | 38s | – | 264s | 237s† | – |
| CSV 1 | Select a column from a small csv file | 11s | 8s | 37s | 23s | – |
| CSV 2 | Select a column from a large csv file | 19s | 9s | 66s | 72s | – |
| CSV to JSON 1 | Convert a small CSV to JSON format | 15s | – | – | 120s | – |
| CSV to JSON 2 | Convert a large CSV to JSON format | 42s | – | – | 352s | – |
† mlr appears to load the w
