pypipe

$ echo "pypipe" | ppp "line[::2]"
ppp

pypipe is a Python command-line tool for pipeline processing.

Demo

Installation

pypipe is a single Python file and uses only the standard library. You can use it by placing pypipe.py in a directory included in your PATH (e.g., ~/.local/bin). If execute permission is not already present, please add it.

chmod +x pypipe.py

To make it easier to type, it's recommended to create a symbolic link.

ln -s pypipe.py ppp

[!Note] pypipe requires Python 3.6 or later.

pypipe can also be installed in the standard way for Python packages, using pip or any compatible tool such as pipx.

pipx install pypipe-ppp

It also supports running directly with pipx without installation.

pipx run pypipe-ppp <args>

You can also use it with Wasmer:

alias ppp="wasmer run bugen/pypipe -- "

Basic usage and Examples

`| ppp line`

Processes the input line by line. You can access the current line as line or l, and the current line number as i.

$ cat staff.txt |ppp 'i, line.upper()'
1       NAME    WEIGHT  BIRTH   AGE     SPECIES CLASS
2       SIMBA   250     1994-06-15      29      LION    MAMMAL
3       DUMBO   4000    1941-10-23      81      ELEPHANT        MAMMAL
4       GEORGE  20      1939-01-01      84      MONKEY  MAMMAL
5       POOH    1       1921-08-21      102     TEDDY BEAR      ARTIFACT
6       BOB     0       1999-05-01      24      SPONGE  DEMOSPONGE
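
As a rough mental model, line mode behaves like the loop below. This is a sketch, not pypipe's actual source: the sample lines are made up, and joining tuple elements with TAB is an assumption inferred from the output shown above.

```python
# Hypothetical input lines standing in for standard input.
lines = ["Name\tWeight", "Simba\t250"]

out = []
for i, line in enumerate(lines, start=1):  # i is the 1-based line number
    result = (i, line.upper())             # the user's expression: i, line.upper()
    # Assumption: a tuple result is printed with its elements joined by TAB.
    out.append("\t".join(str(v) for v in result))

print("\n".join(out))
```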

Using the -j, --json option allows you to decode each line as JSON. The decoded result can be obtained as dic.

$ cat staff.jsonlines.txt |ppp -j 'dic["Name"]'
Simba
Dumbo
George
Pooh
Bob
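
Conceptually, -j in line mode resembles calling json.loads on every line. The following is a minimal sketch under that assumption, with made-up records rather than the real staff.jsonlines.txt.

```python
import json

# Hypothetical JSON Lines input: one JSON object per line.
lines = [
    '{"Name": "Simba", "Age": 29}',
    '{"Name": "Dumbo", "Age": 81}',
]

names = []
for line in lines:
    dic = json.loads(line)     # decode the current line into a dict
    names.append(dic["Name"])  # the user's expression: dic["Name"]

print(names)
```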

`| ppp rec`

Splits each line by TAB. You can get the list of split strings as rec or r, and the record number as i.

$ cat staff.txt |ppp rec 'r[:3]'
Name    Weight  Birth
Simba   250     1994-06-15
Dumbo   4000    1941-10-23
George  20      1939-01-01
Pooh    1       1921-08-21
Bob     0       1999-05-01

Using the -l LENGTH, --length LENGTH option allows you to get the values of each field as f1, f2, f3, ....

$ tail -n +2 staff.txt |ppp rec -l5 'f"{f1} is {f4} years old"'
Simba is 29 years old
Dumbo is 81 years old
George is 84 years old
Pooh is 102 years old
Bob is 24 years old

[!Tip] You can now use field variables (f1, f2, f3, ...) without specifying the --length option.
$ cat staff.txt | ppp rec f1,f2,f3
Field variables save typing, but with the --length option you have to know the number of fields in advance. Omitting --length is more convenient, but it degrades performance: in tests, processing about 60,000 records with 23 fields took 0.45 seconds with the --length option and about 0.75 seconds without it. To maintain performance, either use the --length option or retrieve fields from rec by index (rec[0], rec[1], rec[2], ...) instead of using field variables.

With the -H, --header option, pypipe treats the first line as a header line and skips it. The header values are available as a list named header, and you can access the value of each field as dic["FIELD_NAME"].

$ cat staff.txt |ppp rec -H 'rec[0], dic["Birth"]'
Simba   1994-06-15
Dumbo   1941-10-23
George  1939-01-01
Pooh    1921-08-21
Bob     1999-05-01
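
One way to picture the header handling is to consume the first line as header and zip it with each following record. This is a sketch under that assumption, using a shortened, made-up copy of the data.

```python
# Hypothetical tab-separated input with a header line.
lines = [
    "Name\tWeight\tBirth",
    "Simba\t250\t1994-06-15",
    "Dumbo\t4000\t1941-10-23",
]

rows = iter(lines)
header = next(rows).split("\t")   # the first line is consumed, so it is skipped

results = []
for line in rows:
    rec = line.split("\t")
    dic = dict(zip(header, rec))  # field name -> value for this record
    results.append((rec[0], dic["Birth"]))

print(results)
```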

By using the --type FIELD_TYPES, --field-type FIELD_TYPES option, you can specify the type of each field, converting its value from 'str' to the specified type.

$ echo 'Hello	100	10.2	True	{"id":100,"title":"sample"}'|ppp rec -l5 --type 2:i,3:f,4:b,5:j "type(f1),type(f2),type(f3),type(f4),type(f5)"
<class 'str'>   <class 'int'>   <class 'float'> <class 'bool'>  <class 'dict'>
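
The conversion can be pictured as applying one converter per listed field. The sketch below is illustrative only: CONVERTERS and convert are hypothetical names, and the boolean rule in particular is an assumption, since this README does not say how pypipe parses 'b' fields.

```python
import json

# Hypothetical mapping from the type codes used above to converters.
# The "b" converter is an assumption made for illustration.
CONVERTERS = {
    "i": int,
    "f": float,
    "b": lambda s: s.strip().lower() == "true",
    "j": json.loads,
}

def convert(rec, spec):
    """Apply a spec such as '2:i,3:f' (1-based field numbers) to a list of strings."""
    out = list(rec)
    for item in spec.split(","):
        pos, code = item.split(":")
        idx = int(pos) - 1  # field numbers start at 1
        out[idx] = CONVERTERS[code](out[idx])
    return out

line = 'Hello\t100\t10.2\tTrue\t{"id":100,"title":"sample"}'
fields = convert(line.split("\t"), "2:i,3:f,4:b,5:j")
type_names = [type(f).__name__ for f in fields]
print(type_names)
```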

[!Tip] When there is a header row in the data, using --type, --field-type often results in errors when attempting to convert the header row's item names to the specified types. In such cases, you can avoid errors by using the -H, --header option to skip the header row.

[!Note] pypipe has added support for automatic type conversion.

You can change the delimiter by using the -d DELIMITER, --delimiter DELIMITER option.

$ cat staff.csv |ppp rec -d , -l6  f1
Name
Simba
Dumbo
George
Pooh
Bob

Regular expression delimiters are also supported.

$ echo 'AAA      BBB CCC    DDD' | ppp rec -d '\s+' rec[2]
CCC

[!Tip] The -S, --spaces option is equivalent to -d '\s+'.
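
To see why a regular expression delimiter helps with runs of spaces, compare a plain split with re.split in ordinary Python. This only illustrates the idea; it is not taken from pypipe's source.

```python
import re

line = "AAA      BBB CCC    DDD"

plain = line.split(" ")       # splitting on a single space leaves empty strings
rec = re.split(r"\s+", line)  # a run of whitespace counts as one delimiter

print("" in plain)
print(rec)
```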

You can change the output delimiter by using the -D DELIMITER, --output-delimiter DELIMITER option.

$ cat staff.txt |ppp rec -D ,
Name,Weight,Birth,Age,Species,Class
Simba,250,1994-06-15,29,Lion,Mammal
Dumbo,4000,1941-10-23,81,Elephant,Mammal
George,20,1939-01-01,84,Monkey,Mammal
Pooh,1,1921-08-21,102,Teddy bear,Artifact
Bob,0,1999-05-01,24,Sponge,Demosponge

When using the -m, --regex-match option, rec is generated through regular expression matching instead of delimiter-based splitting.

$ echo 'Height: 200px, Width: 1000px' | ppp rec -m '\d+' r[1]
1000
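
In plain Python, the same result can be reproduced with re.findall, which collects every match instead of splitting on a delimiter. Treat this as a sketch of the idea; whether pypipe uses findall internally for -m is an assumption.

```python
import re

line = "Height: 200px, Width: 1000px"

# Every match of the pattern becomes one element of rec.
rec = re.findall(r"\d+", line)

print(rec[1])
```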

`| ppp csv`

csv is similar to rec. The difference is that rec simply splits the line on the specified DELIMITER, as in 'line.split(DELIMITER)', whereas csv parses it with the csv library. Furthermore, rec is tab-separated by default, whereas csv is comma-separated.
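
The practical difference shows up with quoted fields that contain the delimiter. The following sketch uses a made-up line and the standard csv module directly, not pypipe itself.

```python
import csv

# Hypothetical line whose second field contains a comma inside quotes.
line = 'Pooh,"Bear, Teddy",102'

naive = line.split(",")           # rec-style splitting breaks the quoted field apart
parsed = next(csv.reader([line])) # the csv module honours the quotes

print(naive)
print(parsed)
```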

You can specify options to pass to csv.reader and csv.writer using the -O NAME=VALUE, --csv-opt NAME=VALUE option.

$ cat staff.csv |ppp csv -O 'quoting=csv.QUOTE_ALL'
"Name","Weight","Birth","Age","Species","Class"
"Simba","250","1994-06-15","29","Lion","Mammal"
"Dumbo","4000","1941-10-23","81","Elephant","Mammal"
"George","20","1939-01-01","84","Monkey","Mammal"
"Pooh","1","1921-08-21","102","Teddy bear","Artifact"
"Bob","0","1999-05-01","24","Sponge","Demosponge"

`| ppp text`

In ppp text, the entire standard input is read as a single piece of text. You can access the read text as text.

$ cat staff.txt | ppp text 'len(text)'
231

ppp text is particularly useful when working with an indented JSON file. The -j, --json option decodes the text as JSON, and the decoded data is available as dic.

$ cat staff.json |ppp text -j 'dic["data"][0]'
{'Name': 'Simba', 'Weight': 250, 'Birth': '1994-06-15', 'Age': 29, 'Species': 'Lion', 'Class': 'Mammal'}
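
Indented JSON cannot be decoded one line at a time, which is why reading the whole input as text matters here. A small sketch with a made-up document:

```python
import json

# Hypothetical indented JSON document spanning several lines.
text = """{
  "data": [
    {"Name": "Simba", "Age": 29}
  ]
}"""

# Decoding only the first line ("{") fails because it is not valid JSON on its own.
try:
    json.loads(text.splitlines()[0])
    first_line_ok = True
except json.JSONDecodeError:
    first_line_ok = False

dic = json.loads(text)  # decoding the whole text succeeds
print(first_line_ok, dic["data"][0])
```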

[!Tip] You can also use the -j, --json option in line and file.

`| ppp file`

ppp file receives a list of file paths from standard input. For each path, it opens the file and reads its contents into text. The current path is available as path.

$ ls staff.txt staff.csv staff.json staff.xml |ppp file 'path, len(text)'
staff.csv       231
staff.json      1046
staff.txt       231
staff.xml       1042
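
The behaviour described above roughly corresponds to the loop below. It is a self-contained sketch: the temporary files and their contents are invented so the example can run anywhere, and the details of how pypipe opens files are assumptions.

```python
import os
import tempfile

results = []
with tempfile.TemporaryDirectory() as tmp:
    # Create two hypothetical files to stand in for staff.txt and friends.
    paths = []
    for name, content in [("a.txt", "hello\n"), ("b.txt", "hi\n")]:
        p = os.path.join(tmp, name)
        with open(p, "w") as f:
            f.write(content)
        paths.append(p)

    # Each line of standard input is one file path.
    stdin_lines = [p + "\n" for p in paths]
    for raw in stdin_lines:
        path = raw.rstrip("\r\n")
        with open(path) as f:
            text = f.read()  # the whole file becomes text
        results.append((os.path.basename(path), len(text)))

print(results)
```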

ppp file is especially useful when processing a large number of JSON files.

$ find . -name '*.json'| ppp file --json ...

`| ppp custom -N NAME`

You can easily create custom commands using pypipe. First, you define custom commands. The definition file is, by default, located at ~/.config/pypipe/pypipe_custom.py. You can change the path of this file using the PYPIPE_CUSTOM environment variable.

The following is an example of defining custom commands xpath and sum.

~/.config/pypipe/pypipe_custom.py

TEMPLATE_XPATH = r"""
from lxml import etree
{imp}

def output(e):
    if isinstance(e, etree._Element):
        print(etree.tostring(e).decode().rstrip())
    else:
        _print(e)

{pre}

tree = etree.parse(sys.stdin)
for e in tree.xpath('{path}'):
{loop_head}
{loop_filter}
{main}

{post}
"""

TEMPLATE_SUM = r"""
import re
import sys
{imp}

ptn = re.compile(r'{pattern}')
s = 0

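def add_or_print(*args):
    # args[0] is the current record; the rest is the value of the user's code.
    global s
    rec = args[0]
    if len(args) == 2:
        if isinstance(args[1], int):
            # An int selects a 1-based field of rec to add to the running total.
            i = args[1]
            if len(rec) >= i:
                s += rec[i-1]
        else:
            print(args[1])
    else:
        print(*args[1:])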


for line in sys.stdin:
    line = line.rstrip('\r\n')
    rec = [{type}(e) for e in ptn.findall(line)]
    if not rec:
        continue
{loop_head}
{loop_filter}
{main}

print(s)
"""

custom_command = {
    "xpath": {
        "template": TEMPLATE_XPATH,
        "code_indent": 1,
        "default_code": "e",
        "wrapper": 'output({})',
        "options": {
            "path": {"default": '/'}
        }
    },
    "sum": {
        "template": TEMPLATE_SUM,
        "code_indent": 1,
        "default_code": "1",
        "wrapper": 'add_or_print(rec, {})',
        "options": {
            "pattern": {"default": r'\d+'},
            "type": {"default": 'int'}
        }
    },
}

You can use them as follows:

$ cat staff.xml |ppp custom -N xpath -O path='./Animal/Age'
<Age>29</Age>
<Age>81</Age>
<Age>84</Age>
<Age>102</Age>
<Age>24</Age>

$ seq 10000| ppp c -Nsum -f 'rec[0] % 3 == 0'
16668333
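
The sum example can be cross-checked in plain Python: it adds every number from 1 to 10000 that is divisible by 3, which mirrors the filter 'rec[0] % 3 == 0' combined with the default code "1" (add the first field).

```python
# Same computation as the custom sum command above, without pypipe.
total = sum(n for n in range(1, 10001) if n % 3 == 0)
print(total)
```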

Automatic Import and Explicit Import

pypipe attempts to automatically import the necessary modules. While
