Pypipe
Python pipe command line tool
pypipe
$ echo "pypipe" | ppp "line[::2]"
ppp
pypipe is a Python command-line tool for pipeline processing.
Quick links
- Installation
- Basic usage and Examples
- Automatic Import and Explicit Import
- Automatic type conversion (-t, --convert)
- View mode (-v, --view)
- Output formatting
- Counter (-c, --counter)
- pypipe is a code generator.
- Pager
Installation
pypipe is a single Python file that uses only the standard library. To use it, place pypipe.py in a directory included in your PATH (e.g., ~/.local/bin), and add execute permission if it does not already have it:
chmod +x pypipe.py
To make it easier to type, it's recommended to create a symbolic link.
ln -s pypipe.py ppp
[!Note] pypipe requires Python 3.6 or later.
pypipe can also be installed in the standard way for Python packages, using pip or any compatible tool such as pipx.
pipx install pypipe-ppp
You can also run it directly with pipx, without installing it:
pipx run pypipe-ppp <args>
You can also use it with Wasmer:
alias ppp="wasmer run bugen/pypipe -- "
Basic usage and Examples
| ppp line
Processes input line by line. The current line is available as line or l, and the current line number as i.
$ cat staff.txt |ppp 'i, line.upper()'
1 NAME WEIGHT BIRTH AGE SPECIES CLASS
2 SIMBA 250 1994-06-15 29 LION MAMMAL
3 DUMBO 4000 1941-10-23 81 ELEPHANT MAMMAL
4 GEORGE 20 1939-01-01 84 MONKEY MAMMAL
5 POOH 1 1921-08-21 102 TEDDY BEAR ARTIFACT
6 BOB 0 1999-05-01 24 SPONGE DEMOSPONGE
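As a rough mental model of line mode, the loop below evaluates an expression once per input line. It assumes that the trailing newline is stripped, that lines are numbered from 1, and that tuple results are printed TAB-separated. eval() stands in for pypipe's code generation, so this is a sketch and not pypipe's actual generated program.

```python
import io


def line_mode(stream, expr):
    """Evaluate `expr` once per input line, roughly like `ppp line EXPR`.

    Assumption: the trailing newline is stripped and lines are numbered
    from 1, exposed as `line`/`l` and `i`. A model, not pypipe's own code.
    """
    results = []
    for i, raw in enumerate(stream, 1):
        line = raw.rstrip("\r\n")
        results.append(eval(expr, {}, {"i": i, "line": line, "l": line}))
    return results


sample = io.StringIO("Name\tWeight\nSimba\t250\n")
for row in line_mode(sample, "i, line.upper()"):
    # Assumption: a tuple result is printed as TAB-separated values
    print(*row, sep="\t")
```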
With the -j, --json option, each line is decoded as JSON, and the decoded result is available as dic.
$ cat staff.jsonlines.txt |ppp -j 'dic["Name"]'
Simba
Dumbo
George
Pooh
Bob
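Conceptually, -j adds one json.loads() call per line. The sketch below models this under that assumption. It has no error handling, so a line that is not a complete JSON document raises json.JSONDecodeError. How pypipe itself reports such lines is not covered here.

```python
import io
import json


def json_lines(stream, expr):
    """Decode each line as JSON into `dic`, roughly like `ppp -j EXPR`.

    Sketch only: no error handling, so an invalid line raises
    json.JSONDecodeError.
    """
    results = []
    for i, raw in enumerate(stream, 1):
        line = raw.rstrip("\r\n")
        dic = json.loads(line)
        results.append(eval(expr, {}, {"i": i, "line": line, "dic": dic}))
    return results


data = io.StringIO('{"Name": "Simba", "Age": 29}\n{"Name": "Dumbo", "Age": 81}\n')
print(json_lines(data, 'dic["Name"]'))  # ['Simba', 'Dumbo']
```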
| ppp rec
Splits each line on TAB. The resulting list of strings is available as rec or r, and the record number as i.
$ cat staff.txt |ppp rec 'r[:3]'
Name Weight Birth
Simba 250 1994-06-15
Dumbo 4000 1941-10-23
George 20 1939-01-01
Pooh 1 1921-08-21
Bob 0 1999-05-01
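The rec variant differs from line mode only in how the line is exposed: it is split on the delimiter (TAB by default), and both rec and r refer to the resulting list. This is a minimal model under that reading of the description above. The function name rec_mode is hypothetical.

```python
import io


def rec_mode(stream, expr, delimiter="\t"):
    """Split each line on TAB into `rec`/`r`, roughly like `ppp rec EXPR`."""
    results = []
    for i, raw in enumerate(stream, 1):
        line = raw.rstrip("\r\n")
        rec = line.split(delimiter)
        env = {"i": i, "line": line, "rec": rec, "r": rec}
        results.append(eval(expr, {}, env))
    return results


sample = io.StringIO("Name\tWeight\tBirth\tAge\nSimba\t250\t1994-06-15\t29\n")
for fields in rec_mode(sample, "r[:3]"):
    # prints the first three fields of each line, TAB-separated
    print("\t".join(fields))
```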
With the -l LENGTH, --length LENGTH option, the value of each field is also available as f1, f2, f3, and so on.
$ tail -n +2 staff.txt |ppp rec -l5 'f"{f1} is {f4} years old"'
Simba is 29 years old
Dumbo is 81 years old
George is 84 years old
Pooh is 102 years old
Bob is 24 years old
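Field variables can be pictured as names bound to the first LENGTH items of rec, with f1 being rec[0]. The helper below, bind_fields, is hypothetical. It simply leaves the higher fN names unbound when a record is short, which may not match what pypipe does.

```python
def bind_fields(rec, length):
    """Expose rec[0] .. rec[length - 1] as f1 .. fN, as -l LENGTH describes.

    Sketch only: if a record is shorter than `length`, the missing fN names
    are left unbound here; pypipe's own behaviour may differ.
    """
    return {f"f{n}": value for n, value in enumerate(rec[:length], 1)}


rec = "Simba\t250\t1994-06-15\t29\tLion".split("\t")
names = bind_fields(rec, 5)
print(eval('f"{f1} is {f4} years old"', {}, names))  # Simba is 29 years old
```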
[!Tip] You can now use field variables (f1, f2, f3, ...) without specifying the --length option.
$ cat staff.txt | ppp rec f1,f2,f3
Field variables make typing easier, but with --length you have to know the number of fields in advance. Omitting the --length option is more convenient, but it degrades performance. In tests, processing data with about 60,000 records and 23 fields took 0.45 seconds with the --length option and about 0.75 seconds without it. To maintain performance, either use the --length option or access fields through rec by index (rec[0], rec[1], rec[2], ...) instead of using field variables.
The -H, --header option treats the first line as a header line and skips it. The header names are available as a list named header, and the value of each field can be accessed as dic["FIELD_NAME"].
$ cat staff.txt |ppp rec -H 'rec[0], dic["Birth"]'
Simba 1994-06-15
Dumbo 1941-10-23
George 1939-01-01
Pooh 1921-08-21
Bob 1999-05-01
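One way to picture -H is that the first line is split into header, and each later record is zipped with it to build dic. The generator below, header_mode, is a hypothetical sketch of that idea. Because zip() stops at the shorter list, extra or missing fields are silently dropped here. That is a simplification, not documented pypipe behaviour.

```python
import io


def header_mode(stream, delimiter="\t"):
    """Sketch of -H: the first line becomes `header`; later lines yield (rec, dic)."""
    lines = (raw.rstrip("\r\n") for raw in stream)
    first = next(lines, None)
    if first is None:  # empty input: no header, no records
        return
    header = first.split(delimiter)
    for line in lines:
        rec = line.split(delimiter)
        dic = dict(zip(header, rec))  # dic["FIELD_NAME"] -> value of that field
        yield rec, dic


data = io.StringIO("Name\tWeight\tBirth\nSimba\t250\t1994-06-15\n")
for rec, dic in header_mode(data):
    # prints "Simba" and "1994-06-15" separated by a TAB
    print(rec[0], dic["Birth"], sep="\t")
```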
The --type FIELD_TYPES, --field-type FIELD_TYPES option lets you specify the type of each field, converting its value from str to that type.
$ echo 'Hello 100 10.2 True {"id":100,"title":"sample"}'|ppp rec -l5 --type 2:i,3:f,4:b,5:j "type(f1),type(f2),type(f3),type(f4),type(f5)"
<class 'str'> <class 'int'> <class 'float'> <class 'bool'> <class 'dict'>
[!Tip] When the data has a header row, --type, --field-type often fails while trying to convert the header row's field names to the specified types. In such cases, use the -H, --header option to skip the header row.
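The FIELD_TYPES spec in the example above pairs 1-based field numbers with type letters. This sketch parses such a spec and applies it. The letter mapping (i to int, f to float, j to json.loads) is an assumption inferred from the example output. "b" is deliberately left out because the README does not say how strings become booleans, and in plain Python bool("False") is True. The helpers parse_field_types and convert are hypothetical.

```python
import json

# Assumed meaning of the type letters used in the README example.
# "b" (bool) is left out: the README does not say how strings become booleans.
CONVERTERS = {"i": int, "f": float, "j": json.loads}


def parse_field_types(spec):
    """Parse a spec such as "2:i,3:f,4:j" into {field_number: converter}."""
    types = {}
    for item in spec.split(","):
        number, code = item.split(":")
        types[int(number)] = CONVERTERS[code]
    return types


def convert(rec, types):
    """Return a copy of `rec` with the given 1-based fields converted."""
    out = list(rec)
    for number, func in types.items():
        if number <= len(out):
            out[number - 1] = func(out[number - 1])
    return out


rec = 'Hello\t100\t10.2\t{"id":100,"title":"sample"}'.split("\t")
print(convert(rec, parse_field_types("2:i,3:f,4:j")))
# ['Hello', 100, 10.2, {'id': 100, 'title': 'sample'}]
```

Applying the same spec to a header row such as Name, Weight raises ValueError on int("Weight"). That is the kind of failure the tip above warns about.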
[!Note] pypipe has added support for automatic type conversion.
You can change the delimiter by using the -d DELIMITER, --delimiter DELIMITER option.
$ cat staff.csv |ppp rec -d , -l6 f1
Name
Simba
Dumbo
George
Pooh
Bob
Regular expression delimiters are also supported.
$ echo 'AAA BBB CCC DDD' | ppp rec -d '\s+' rec[2]
CCC
[!Tip] The -S, --spaces option is equivalent to -d '\s+'.
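A regular-expression delimiter behaves like re.split(). This sketch always treats the delimiter as a regex. That is an assumption, and it happens to work for plain delimiters such as "," or TAB. The helper split_record is hypothetical.

```python
import re


def split_record(line, delimiter="\t"):
    """Split a line into fields, roughly like `ppp rec -d DELIMITER`.

    Assumption: DELIMITER is always treated as a regular expression here.
    Plain delimiters such as "," or TAB behave the same either way.
    """
    return re.split(delimiter, line)


print(split_record("AAA BBB  CCC\tDDD", r"\s+"))  # ['AAA', 'BBB', 'CCC', 'DDD']
print(split_record("Name,Weight,Birth", ","))     # ['Name', 'Weight', 'Birth']
```

In this sketch, a literal delimiter containing regex metacharacters, such as . or |, would need re.escape(). How pypipe itself distinguishes the two cases is not covered here.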
You can change the output delimiter by using the -D DELIMITER, --output-delimiter DELIMITER option.
$ cat staff.txt |ppp rec -D ,
Name,Weight,Birth,Age,Species,Class
Simba,250,1994-06-15,29,Lion,Mammal
Dumbo,4000,1941-10-23,81,Elephant,Mammal
George,20,1939-01-01,84,Monkey,Mammal
Pooh,1,1921-08-21,102,Teddy bear,Artifact
Bob,0,1999-05-01,24,Sponge,Demosponge
When using the -m, --regex-match option, rec is generated through regular expression matching instead of delimiter-based splitting.
$ echo 'Height: 200px, Width: 1000px' | ppp rec -m '\d+' r[1]
1000
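With -m, rec holds the matches themselves instead of the text between delimiters. Presumably this works like re.findall(), which is what the custom sum template later in this README uses. The contrast with splitting on the same pattern looks like this:

```python
import re

line = "Height: 200px, Width: 1000px"

# -m: rec holds the matches themselves
rec = re.findall(r"\d+", line)
print(rec)     # ['200', '1000']
print(rec[1])  # 1000

# -d with the same pattern: rec would hold the text *between* the matches
print(re.split(r"\d+", line))  # ['Height: ', 'px, Width: ', 'px']
```

Note that if a pattern contains capturing groups, re.findall() returns the groups rather than the whole match. A findall-based approach therefore behaves differently for such patterns.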
| ppp csv
csv is similar to rec. The difference is that rec simply splits the line on the specified DELIMITER (like line.split(DELIMITER)), whereas csv parses it with the csv library. In addition, rec is tab-separated by default, whereas csv is comma-separated.
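The difference matters when a field contains the delimiter inside quotes. The sample value "Teddy bear, stuffed" is hypothetical; staff.csv itself has no such field.

```python
import csv

line = 'Pooh,1,"Teddy bear, stuffed",Artifact'

# rec-style parsing: a plain split also cuts inside the quoted field
print(line.split(","))
# ['Pooh', '1', '"Teddy bear', ' stuffed"', 'Artifact']

# csv-style parsing: csv.reader honours the quotes
row = next(csv.reader([line]))
print(row)
# ['Pooh', '1', 'Teddy bear, stuffed', 'Artifact']
```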
You can specify options to pass to csv.reader and csv.writer using the -O NAME=VALUE, --csv-opt NAME=VALUE option.
$ cat staff.csv |ppp csv -O 'quoting=csv.QUOTE_ALL'
"Name","Weight","Birth","Age","Species","Class"
"Simba","250","1994-06-15","29","Lion","Mammal"
"Dumbo","4000","1941-10-23","81","Elephant","Mammal"
"George","20","1939-01-01","84","Monkey","Mammal"
"Pooh","1","1921-08-21","102","Teddy bear","Artifact"
"Bob","0","1999-05-01","24","Sponge","Demosponge"
| ppp text
ppp text reads the entire standard input as a single string, which is available as text.
$ cat staff.txt | ppp text 'len(text)'
231
ppp text is particularly useful for indented JSON files. With the -j, --json option, the text is decoded as JSON, and the decoded data is available as dic.
$ cat staff.json |ppp text -j 'dic["data"][0]'
{'Name': 'Simba', 'Weight': 250, 'Birth': '1994-06-15', 'Age': 29, 'Species': 'Lion', 'Class': 'Mammal'}
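Here is why text mode suits indented JSON. The whole input is decoded as one document, whereas per-line decoding sees fragments that are not valid JSON on their own. The input below is a small stand-in for staff.json, not its real contents.

```python
import io
import json

# Stand-in for an indented JSON file arriving on standard input
stdin = io.StringIO('{\n  "data": [\n    {"Name": "Simba", "Age": 29}\n  ]\n}\n')

text = stdin.read()     # ppp text: the whole input as one string
dic = json.loads(text)  # -j: decode that string as a single JSON document
print(dic["data"][0])   # {'Name': 'Simba', 'Age': 29}

# Decoding line by line (as ppp line -j does) fails on the same input,
# because a fragment such as '{' is not a complete JSON document.
try:
    json.loads(text.splitlines()[0])
except json.JSONDecodeError as err:
    print("line-by-line decoding fails:", err.msg)
```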
[!Tip] The -j, --json option can also be used with line and file.
| ppp file
ppp file reads a list of file paths from standard input. For each path, it opens the file and reads its contents into text. The current path is available as path.
$ ls staff.txt staff.csv staff.json staff.xml |ppp file 'path, len(text)'
staff.csv 231
staff.json 1046
staff.txt 231
staff.xml 1042
ppp file is especially useful when processing a large number of JSON files:
find . -name '*.json'| ppp file --json ...
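File mode can be modelled as a loop over the incoming paths, reading each file into text. In this sketch, eval() stands in for pypipe's generated code, and the UTF-8 encoding is an assumption. The helper file_mode is hypothetical.

```python
import os
import tempfile


def file_mode(paths, expr):
    """For each path, read the file into `text` and evaluate `expr`.

    Rough model of `ppp file EXPR`; the UTF-8 encoding is an assumption.
    """
    results = []
    for raw in paths:
        path = raw.rstrip("\r\n")
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        results.append(eval(expr, {}, {"path": path, "text": text}))
    return results


with tempfile.TemporaryDirectory() as tmp:
    listing = []  # what `ls` or `find` would write to standard input
    for name, body in [("a.txt", "hello"), ("b.txt", "pypipe!")]:
        full = os.path.join(tmp, name)
        with open(full, "w", encoding="utf-8") as fh:
            fh.write(body)
        listing.append(full + "\n")
    lengths = [n for _, n in file_mode(listing, "path, len(text)")]

print(lengths)  # [5, 7]
```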
| ppp custom -N NAME
pypipe makes it easy to create custom commands. First, define them in a definition file, which by default is located at ~/.config/pypipe/pypipe_custom.py. You can change the path of this file with the PYPIPE_CUSTOM environment variable.
The following is an example of defining custom commands xpath and sum.
~/.config/pypipe/pypipe_custom.py
TEMPLATE_XPATH = r"""
from lxml import etree
{imp}

def output(e):
    if isinstance(e, etree._Element):
        print(etree.tostring(e).decode().rstrip())
    else:
        _print(e)

{pre}
tree = etree.parse(sys.stdin)
for e in tree.xpath('{path}'):
    {loop_head}
    {loop_filter}
    {main}
{post}
"""

TEMPLATE_SUM = r"""
import re
import sys
{imp}

ptn = re.compile(r'{pattern}')
s = 0

def add_or_print(*args):
    global s
    rec = args[0]
    if len(args) == 2:
        if isinstance(args[1], int):
            i = args[1]
            if len(rec) >= i:
                s += rec[i-1]
        else:
            print(args[1])
    else:
        print(*args[1:])

for line in sys.stdin:
    line = line.rstrip('\r\n')
    rec = [{type}(e) for e in ptn.findall(line)]
    if not rec:
        continue
    {loop_head}
    {loop_filter}
    {main}
print(s)
"""

custom_command = {
    "xpath": {
        "template": TEMPLATE_XPATH,
        "code_indent": 1,
        "default_code": "e",
        "wrapper": 'output({})',
        "options": {
            "path": {"default": '/'}
        }
    },
    "sum": {
        "template": TEMPLATE_SUM,
        "code_indent": 1,
        "default_code": "1",
        "wrapper": 'add_or_print(rec, {})',
        "options": {
            "pattern": {"default": r'\d+'},
            "type": {"default": 'int'}
        }
    },
}
You can use them as follows:
$ cat staff.xml |ppp custom -N xpath -O path='./Animal/Age'
<Age>29</Age>
<Age>81</Age>
<Age>84</Age>
<Age>102</Age>
<Age>24</Age>
$ seq 10000| ppp c -Nsum -f 'rec[0] % 3 == 0'
16668333
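To see what the sum command computes for the last example, here is a hand-expanded version written as plain Python. How {loop_filter} and {main} are filled in is our assumption: the -f filter is treated as a skip condition, and the default code "1" is treated as "add field 1", mirroring add_or_print(rec, 1). The result matches the sum of all multiples of 3 up to 10000.

```python
import re

ptn = re.compile(r"\d+")  # the sum command's default "pattern" option
field = 1                 # default code "1": add the first number on the line
s = 0

for line in map(str, range(1, 10001)):         # stands in for `seq 10000`
    rec = [int(e) for e in ptn.findall(line)]  # default "type" is int
    if not rec:
        continue
    if rec[0] % 3 != 0:                        # the -f 'rec[0] % 3 == 0' filter
        continue
    if len(rec) >= field:                      # same guard as add_or_print
        s += rec[field - 1]

print(s)  # 16668333
```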
Automatic Import and Explicit Import
pypipe attempts to automatically import the necessary modules. While
