Zpaqlpy
zpaqlpy compiler
Compiles a zpaqlpy source file (a Python-subset) to a ZPAQ configuration file for usage with zpaqd.
That way it is easy to develop new compression algorithms with ZPAQ.
Or to bring a decompression algorithm to the ZPAQ format so that the compressed data can be stored in a ZPAQ archive without breaking compatibility.
An example is the brotlizpaq wrapper around zpaqd, which compresses the input files with brotli and stores them as valid blocks in a ZPAQ archive (these blocks decompress more slowly than with native brotli because the ZPAQL implementation is less efficient).
The Python source files can also be executed standalone with Python 3 (tested: 3.4, 3.5).
Jump to the end for a tutorial or look into test/lz1.py, test/pnm.py or test/brotli.py for an example.
Download from releases or install with
git clone https://github.com/pothos/zpaqlpy.git
cd zpaqlpy
cargo install # will build and copy the binary to ~/.cargo/bin/
Build in place with: make zpaqlpy
To build again, first run: make clean
Copyright (C) 2016 Kai Lüke kailueke at@ riseup.net
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
The ZPAQ format and the zpaq archiver
The ZPAQ Open Standard Format for Highly Compressed Data
This archive format is based on the idea of delivering the decompression algorithm together with the compressed data. It thereby solves the problem that changes to the algorithm require new software on the recipient's device. It also acknowledges that different input data should be handled with different compression techniques.
The PAQ compression programmes typically use context mixing, i.e. they mix several context-aware predictors to obtain the probabilities used by an arithmetic coder, and thus often achieve the best known compression results. The ZPAQ archiver is their successor and also supports simpler models like LZ77 and BWT, depending on the input data.
Only the decompression is specified. The format makes use of predefined context model components which can be woven into a network, a binary code for computing the contexts of the components, and a postprocessor which reverts a transformation that was applied to the input data before it was passed to the context mixing and encoding phase. Like the context computation code, the postprocessor is delivered as bytecode before the compressed data begins.
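As a rough illustration of context mixing, the following sketch combines bit predictions in the logistic domain, which is the general approach of PAQ-style mixers. It is a simplified floating-point model under that assumption, not ZPAQ's fixed-point component code; the function names and the learning rate are made up for this example.

```python
import math

def stretch(p):
    # Map a probability in (0, 1) to the logistic domain.
    return math.log(p / (1.0 - p))

def squash(x):
    # Inverse of stretch: map back to a probability.
    return 1.0 / (1.0 + math.exp(-x))

def mix(predictions, weights):
    # Weighted sum of the stretched predictions, squashed to a probability.
    return squash(sum(w * stretch(p) for p, w in zip(predictions, weights)))

def update(predictions, weights, bit, rate=0.02):
    # Move each weight in the direction that reduces the prediction error.
    err = bit - mix(predictions, weights)
    return [w + rate * err * stretch(p) for p, w in zip(predictions, weights)]

predictions = [0.9, 0.2]   # two predictors' probabilities that the next bit is 1
weights = [0.5, 0.5]
new_weights = update(predictions, weights, 1)
```

After seeing a 1 bit, the predictor that favoured 1 gains weight and the other one loses weight.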
Specification: http://mattmahoney.net/dc/zpaq206.pdf
zpaq - Incremental Journaling Backup Utility and Archiver
The end user archiver supports incremental backups with deduplication as well as flat streaming archives (ZPAQ format Level 1). It picks simpler or more complex methods depending on how well they perform on the input data and on the compression level specified for the files to append to the archive. Arbitrary algorithms are not supported, but a good variety of specialised and universal methods is available.
Homepage: http://mattmahoney.net/dc/zpaq.html
Working principle: http://mattmahoney.net/dc/zpaq_compression.pdf
zpaqd - development tool for new algorithms
The zpaqd development tool only allows creation of streaming mode archives, but in return accepts a ZPAQ configuration file. This file contains the context mixing components to use, the ZPAQL programme for context computation and the ZPAQL postprocessing programme which reverts a possible transformation (LZ77, BWT, E8E9 for x86 files or any custom transformation). The transformation itself is applied before compression by an externally called programme named in the configuration. There are special configurations for JPG, BMP and more.
Homepage: http://mattmahoney.net/dc/zpaqutil.html
The zpaqlpy Python-subset
Grammar
This grammar applies to the user-defined sections of the template. Not everything it accepts is supported; some constructs are included only so that specific error messages can be given instead of parser errors (e.g. nonlocal, dicts, strings or the @-operator for matrix multiplication).
Listed here are the productions with NUMBER, NAME, ”symbols”, NEWLINE, INDENT, DEDENT or STRING as terminals; nonterminals are defined on the left side of the -> arrow.
Prog -> (NEWLINE* stmt)* ENDMARKER?
funcdef -> ”def” NAME Parameters ”:” suite
Parameters -> ”(” Typedargslist? ”)”
Typedargslist -> Tfpdef (”=” test)? (”,” Tfpdef (”=” test)?)* (”,” (”**” Tfpdef)?)?
Tfpdef -> NAME (”:” test)?
stmt -> simple_stmt | compound_stmt
simple_stmt -> small_stmt (”;” small_stmt)* ”;”? NEWLINE
small_stmt -> expr_stmt | pass_stmt | flow_stmt | global_stmt | nonlocal_stmt
expr_stmt -> (store_assign augassign test) | ((store_assign ”=”)? test)
store_assign -> NAME (”[” test ”]”)?
augassign -> ”+=” | ”-=” | ”*=” | ”@=” | ”//=” | ”/=” | ”%=” | ”&=” | ”|=” | ”^=” | ”<<=” | ”>>=” | ”**=”
pass_stmt -> ”pass”
flow_stmt -> break_stmt | continue_stmt | return_stmt
break_stmt -> ”break”
continue_stmt -> ”continue”
return_stmt -> ”return” test
global_stmt -> ”global” NAME (”,” NAME)*
nonlocal_stmt -> ”nonlocal” NAME (”,” NAME)*
compound_stmt -> if_stmt | while_stmt | funcdef
if_stmt -> ”if” test ”:” suite (”elif” test ”:” suite)* (”else” ”:” suite)?
while_stmt -> ”while” test ”:” suite (”else” ”:” suite)?
suite -> simple_stmt | (NEWLINE INDENT stmt+ DEDENT)
test -> or_test
test_nocond -> or_test
or_test -> and_test (”or” and_test)*
and_test -> not_test (”and” not_test)*
not_test -> comparison | (”not” not_test)
comparison -> expr (comp_op expr)*
comp_op -> ”<” | ”>” | ”==” | ”>=” | ”<=” | ”!=” | ”in” | ”not” ”in” | ”is” | ”is” ”not”
expr -> xor_expr (”|” xor_expr)*
xor_expr -> and_expr (”^” and_expr)*
and_expr -> shift_expr (”&” shift_expr)*
shift_expr -> arith_expr | (arith_expr (shift_op arith_expr)+)
shift_op -> ”<<” | ”>>”
arith_expr -> term | (term (t_op term)+)
t_op -> ”+” | ”-”
term -> factor (f_op factor)*
f_op -> ”*” | ”@” | ”/” | ”%” | ”//”
factor -> (”+” factor) | (”-” factor) | (”~” factor) | power
power -> atom_expr (”**” factor)?
atom_expr -> (NAME ”(” arglist? ”)”) | (NAME ”[” test ”]”) | atom
atom -> (”(” test ”)”) | (”{” dictorsetmaker? ”}”) | NUMBER | STRING+ | ”...” | ”None” | ”True” | ”False” | NAME
dictorsetmaker -> dictorsetmaker_t (”,” dictorsetmaker_t)* ”,”?
dictorsetmaker_t -> test ”:” test
arglist -> test (”,” test)* ”,”?
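To give a feel for the subset, here is a small function that only uses constructs covered by the productions above (def, while, comparison, augmented assignment, bit operators and return with a value). The function name is made up for this example and is not part of the template.

```python
def count_set_bits(x):
    # Count the 1 bits of a non-negative integer using only while/if-style code.
    n = 0
    while x > 0:
        n += x & 1   # add the lowest bit
        x >>= 1      # drop the lowest bit
    return n

bits = count_set_bits(11)  # 11 is 0b1011
```

The productions contain no for loop, so iteration is written with while, and return always carries a value.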
Notes
An input file has to be organised like the template, so it is best to start from
the template and fill in the values for hh, hm, ph, pm as in a ZPAQ configuration
to define the size of H and M in the hcomp and pcomp sections. In the dict which
is used to calculate n (i.e. the number of context mixing components) you have to
specify the components as in a ZPAQ configuration file; the arguments are
documented in the specification (see --info-zpaq for the link).
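A header along these lines might look as follows. This is only a sketch: the authoritative layout is whatever --emit-template prints, and both the dict shape (index mapped to a component string) and the concrete values are assumptions chosen for illustration.

```python
# Hypothetical header values; choose sizes that fit your algorithm.
hh = 2  # H in hcomp has 2**hh elements
hm = 4  # M in hcomp has 2**hm elements
ph = 0  # H in pcomp has 2**ph elements
pm = 0  # M in pcomp has 2**pm elements

# Assumed dict shape: component index -> component string as it would
# appear in a ZPAQ configuration file. The component is a placeholder.
n = len({
    0: "cm 19 22",
})
```

Here n evaluates to 1 because a single component is listed.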
Only valid Python programmes which run without raising exceptions are supported
as input, so run them standalone before compiling.
There is no boundary check for the arrays on top of H or M, so please make sure
the Python version works correctly. If you need a ring buffer on H or M, you have
to use % len(hH) or &((1<<hh)-1) and cannot rely on integer overflows or on the
modulo-array-length operation on indices in H or M as in plain ZPAQL, because
H is expanded to contain the stack (and also because there are no overflows when
running the plain Python script).
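The two ways of wrapping an index mentioned above can be sketched like this. The list hH is a plain Python stand-in for the array that the template provides, and the helper names and the size are made up for this example.

```python
hh = 3               # hypothetical size exponent
hH = [0] * (2**hh)   # stand-in for the hH array from the template

def wrap_mod(i):
    # Wrap with the modulo operator.
    return i % len(hH)

def wrap_mask(i):
    # Wrap with a bit mask; works because the size is a power of two.
    return i & ((1 << hh) - 1)

# Write 11 values into an 8-element ring buffer.
pos = 0
while pos < 11:
    hH[wrap_mask(pos)] = pos
    pos += 1
```

The last three writes wrap around and overwrite the first three slots, while the index passed to hH never leaves the valid range.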
Only non-negative 32-bit integers can be used; there are no strings, lists, arbitrarily large numbers, classes, closures or (function) objects.
Input File
Must be a runnable Python 3.5 file in the form of the template, encoded as UTF-8 without a BOM (Byte-Order-Mark). The definitions at the beginning should be altered, and your own code should only be inserted after them. The other two editable sections can refer to definitions in the first section.
Template Sections (--emit-template > source.py) | Editable?
----------------------------------------------------------------|--------------
Definition of the ZPAQ configuration header data (memory size, context mixing components) and optionally functions and variables used by both hcomp and pcomp | yes
API functions for input and output, initialization of memory | no
function hcomp and associated global variables and functions | yes
function pcomp and associated global variables and functions | yes
code for standalone execution of the Python file, analogous to running a ZPAQL configuration with zpaqd `r [cfg] p|h` | no
Exposed API
The 32- or 8-bit memory areas H and M are available as arrays hH, pH, hM, pM,
depending on whether the code is in the hcomp or pcomp section. Their sizes are
2**hh, 2**hm, 2**ph, 2**pm, defined in the header and available as constants
hh, hm, ph, pm.
len(hH), len(pH), len(hM), len(pM) are supported as an alternative to calculating
2**hh. In general, however, len() is not supported; see len_hH() below for dynamic
arrays. NONE is a shortcut for 0 - 1 = 4294967295.
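The relation between the size constants, len() and NONE can be checked in plain Python. The arrays below are ordinary lists standing in for what the template provides, and the sizes are made up. Since plain Python has no 32-bit overflow, the wrap-around of 0 - 1 is written out explicitly.

```python
hh = 2
hm = 3
hH = [0] * (2**hh)   # stand-in for the 32-bit array H in hcomp
hM = [0] * (2**hm)   # stand-in for the 8-bit array M in hcomp

# 0 - 1 wrapped to an unsigned 32-bit value.
NONE = (0 - 1) % 2**32
```

With these values len(hH) equals 2**hh and NONE equals 4294967295.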
Other functions | Descrip
