zpaqlpy compiler

Compiles a zpaqlpy source file (a Python-subset) to a ZPAQ configuration file for usage with zpaqd.

That way it is easy to develop new compression algorithms with ZPAQ.

Or to bring a decompression algorithm to the ZPAQ format so that the compressed data can be stored in a ZPAQ archive without breaking compatibility.

An example is the brotlizpaq wrapper around zpaqd which compresses the input files with brotli and stores them as valid blocks in a ZPAQ archive (which will decompress slower than native brotli decompression due to the less efficient ZPAQL implementation).

The Python source files can also be executed standalone with Python 3 (tested with 3.4 and 3.5).

Jump to the end for a tutorial or look into test/lz1.py, test/pnm.py or test/brotli.py for an example.

Download from releases or install with

git clone https://github.com/pothos/zpaqlpy.git
cd zpaqlpy
cargo install  # will build and copy the binary to ~/.cargo/bin/

Build in place with: make zpaqlpy

To build again: make clean

B.Sc. Thesis

Copyright (C) 2016 Kai Lüke kailueke at@ riseup.net

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

The ZPAQ format and the zpaq archiver

The ZPAQ Open Standard Format for Highly Compressed Data

This archive format delivers the decompression algorithm together with the compressed data, so that changes to the algorithm do not require new software on the recipient's device. It also acknowledges that different input data should be handled with different compression techniques.

The PAQ compression programmes typically use context mixing, i.e. they mix the predictions of several context-aware predictors and feed the result to an arithmetic coder, and thus often achieve the best known compression results. The ZPAQ archiver is their successor and also supports simpler models like LZ77 and BWT, depending on the input data.
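
As a rough illustration of context mixing, the following sketch combines two bit predictions in the logistic domain and adapts the weights after the actual bit is seen. It uses floating point for readability and is not the fixed-point arithmetic of the ZPAQ components; the function names, the starting weights and the learning rate are assumptions made for this example.

```python
import math

def stretch(p):
    """Map a probability in (0, 1) to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: map a logistic-domain value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def mix(predictions, weights):
    """Combine the models' probabilities that the next bit is 1."""
    return squash(sum(w * stretch(p) for p, w in zip(predictions, weights)))

def update(predictions, weights, bit, rate=0.02):
    """One gradient step that lowers the coding cost of the observed bit."""
    err = bit - mix(predictions, weights)
    return [w + rate * err * stretch(p) for p, w in zip(predictions, weights)]

weights = [0.5, 0.5]          # assumed starting weights
predictions = [0.9, 0.6]      # two models' estimates of P(next bit = 1)
p_before = mix(predictions, weights)
weights = update(predictions, weights, bit=1)
p_after = mix(predictions, weights)
```

After seeing a 1 bit, the weights shift so that the same predictions yield a slightly higher mixed probability for 1, which is what an arithmetic coder needs to spend fewer bits on similar input.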

Only decompression is specified. The format makes use of predefined context model components which can be woven into a network, a bytecode for computing the contexts of the components, and a postprocessor which reverts a transformation that was applied to the input data before it was passed to the context mixing and encoding phase. Like the context computation code, the postprocessor is delivered as bytecode before the compressed data begins.
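
The division of labour between the transformation before compression and the postprocessor can be sketched with a simple byte-wise delta coding. This is not one of the transformations shipped with ZPAQ; it is only an assumed example of a transformation with an exact inverse, and both function names are made up for illustration.

```python
def delta_encode(data):
    """Preprocessing: replace each byte by its difference to the previous byte."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) & 0xFF)   # wrap around within one byte
        prev = b
    return bytes(out)

def delta_decode(data):
    """Postprocessing: undo delta_encode by accumulating the differences."""
    out = bytearray()
    prev = 0
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)

original = bytes([10, 12, 14, 16, 18, 200, 0])
transformed = delta_encode(original)    # what the compressor would see
restored = delta_decode(transformed)    # what the postprocessor must output
```

In a ZPAQ archive only the inverse step would be stored (as ZPAQL bytecode); the forward step runs on the compressing side and never has to be known to the recipient.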

Specification: http://mattmahoney.net/dc/zpaq206.pdf

zpaq - Incremental Journaling Backup Utility and Archiver

The end user archiver supports incremental backups with deduplication as well as flat streaming archives (ZPAQ format Level 1). It picks simpler or more complex methods depending on how well they perform on the input data and on which compression level was specified for the files to append to the archive. Arbitrary algorithms are not supported, but a good variety of specialised and universal methods is available.

Homepage: http://mattmahoney.net/dc/zpaq.html

Working principle: http://mattmahoney.net/dc/zpaq_compression.pdf

zpaqd - development tool for new algorithms

The zpaqd development tool only allows creation of streaming mode archives, but in return accepts a ZPAQ configuration file. This file contains information on the context mixing components used, the ZPAQL programme for context computation and the ZPAQL postprocessing programme which reverts a possible transformation (LZ77, BWT, E8E9 for x86 files or any custom transformation). The transformation itself is applied before compression by an externally called programme named in the configuration. There are special configurations for JPG, BMP and more.

Homepage: http://mattmahoney.net/dc/zpaqutil.html

The zpaqlpy Python-subset

Grammar

The grammar applies to the user-defined sections of the template. Not every construct is supported, but unsupported ones are included anyway so that they produce specific error messages instead of parser errors (e.g. nonlocal, dicts, strings or the @-operator for matrix multiplication).

Listed here are productions with NUMBER, NAME, ”symbols”, NEWLINE, INDENT, DEDENT or STRING as terminals; nonterminals are defined on the left side of the -> arrow.

Prog -> (NEWLINE* stmt)* ENDMARKER?
funcdef -> ”def” NAME Parameters ”:” suite
Parameters -> ”(” Typedargslist? ”)”
Typedargslist -> Tfpdef (”=” test)? (”,” Tfpdef (”=” test)?)* (”,” (”**” Tfpdef)?)?
Tfpdef -> NAME (”:” test)?
stmt -> simple_stmt | compound_stmt
simple_stmt -> small_stmt (”;” small_stmt)* ”;”? NEWLINE
small_stmt -> expr_stmt | pass_stmt | flow_stmt | global_stmt | nonlocal_stmt
expr_stmt -> (store_assign augassign test) | ((store_assign ”=”)? test)
store_assign -> NAME (”[” test ”]”)?
augassign -> ”+=” | ”-=” | ”*=” | ”@=” | ”//=” | ”/=” | ”%=” | ”&=” | ”|=” | ”^=” | ”<<=” | ”>>=” | ”**=”
pass_stmt -> ”pass”
flow_stmt -> break_stmt | continue_stmt | return_stmt
break_stmt -> ”break”
continue_stmt -> ”continue”
return_stmt -> ”return” test
global_stmt -> ”global” NAME (”,” NAME)*
nonlocal_stmt -> ”nonlocal” NAME (”,” NAME)*
compound_stmt -> if_stmt | while_stmt | funcdef
if_stmt -> ”if” test ”:” suite (”elif” test ”:” suite)* (”else” ”:” suite)?
while_stmt -> ”while” test ”:” suite (”else” ”:” suite)?
suite -> simple_stmt | (NEWLINE INDENT stmt+ DEDENT)
test -> or_test
test_nocond -> or_test
or_test -> and_test (”or” and_test)*
and_test -> not_test (”and” not_test)*
not_test -> comparison | (”not” not_test)
comparison -> expr (comp_op expr)*
comp_op -> ”<” | ”>” | ”==” | ”>=” | ”<=” | ”!=” | ”in” | ”not” ”in” | ”is” | ”is” ”not”
expr -> xor_expr (”|” xor_expr)*
xor_expr -> and_expr (”^” and_expr)*
and_expr -> shift_expr (”&” shift_expr)*
shift_expr -> arith_expr | (arith_expr (shift_op arith_expr)+)
shift_op -> ”<<” | ”>>”
arith_expr -> term | (term (t_op term)+)
t_op -> ”+” | ”-”
term -> factor (f_op factor)*
f_op -> ”*” | ”@” | ”/” | ”%” | ”//”
factor -> (”+” factor) | (”-” factor) | (”~” factor) | power
power -> atom_expr (”**” factor)?
atom_expr -> (NAME ”(” arglist? ”)”) | (NAME ”[” test ”]”) | atom
atom -> (”(” test ”)”) | (”{” dictorsetmaker? ”}”) | NUMBER | STRING+ | ”...” |
        ”None” | ”True” | ”False” | NAME
dictorsetmaker -> dictorsetmaker_t (”,” dictorsetmaker_t)* ”,”?
dictorsetmaker_t -> test ”:” test
arglist -> test (”,” test)* ”,”?
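
As an illustration, the following fragment uses only constructs derivable from the productions above (funcdef, while, if/else, augmented assignment, return and calls). It is a sketch of the accepted syntax, not a complete input file: a real source has to follow the template, and the function names here are made up.

```python
def count_set_bits(value):
    # while_stmt with an if_stmt and two augassign statements in its suite
    count = 0
    while value > 0:
        if (value & 1) == 1:
            count += 1
        value >>= 1
    return count

def max_of(a, b):
    # if_stmt with an else branch; every path ends in a return_stmt
    if a > b:
        return a
    else:
        return b

# expr_stmt whose right side nests calls as arguments (arglist -> test, test)
result = max_of(count_set_bits(255), count_set_bits(16))
```

Running the fragment with plain Python 3 is a quick way to check its logic before handing a template-based file to the compiler.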

Notes

An input file has to be organised like the template, so it is best to start from the template and fill it in: set the values for hh, hm, ph, pm as in a ZPAQ configuration to define the size of H and M in the hcomp and pcomp sections. In the dict which serves to calculate n (i.e. the number of context mixing components), specify the components as in a ZPAQ configuration file; their arguments are documented in the specification (see --info-zpaq for the link).
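
A filled-in header could look roughly like the sketch below. The authoritative layout is whatever --emit-template prints; the dictionary name `components`, its key/value format and the two chosen components are assumptions for illustration only.

```python
# Hypothetical stand-in for the editable header section of the template.
hh = 2   # H in hcomp gets 2**hh entries
hm = 0   # M in hcomp gets 2**hm entries
ph = 0   # H in pcomp gets 2**ph entries
pm = 0   # M in pcomp gets 2**pm entries

# Components written as in a ZPAQ configuration file (assumed format):
# an indirect context model, followed by an ISSE that refines component 0.
components = {
    0: "icm 16",
    1: "isse 19 0",
}
n = len(components)   # number of context mixing components
```

Deriving n from the dictionary keeps the component count and the component list from drifting apart when components are added or removed.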

Only valid Python programmes which run without exceptions are supported as input, so run them standalone before compiling. There is no boundary check for the arrays on top of H or M, so please make sure the Python version works correctly. If you need a ring buffer on H or M, you have to use % len(hH) or &((1<<hh)-1). You cannot rely on integer overflows or on the modulo-array-length operation on indices in H or M as in plain ZPAQL, because H is expanded to contain the stack (and because there are no overflows when running the plain Python script).
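
The explicit wrap-around for a ring buffer can be sketched as follows. The list `hH` is a plain Python stand-in for the array that the template provides, and `hh`, `pos` and `push` are assumed names chosen for this example.

```python
hh = 3                    # assumed header value: H has 2**hh entries
hH = [0] * (1 << hh)      # stand-in for the array provided by the template
pos = 0                   # next write position in the ring buffer

def push(value):
    global pos
    hH[pos] = value
    # Wrap explicitly; (pos + 1) % len(hH) would be equivalent.
    pos = (pos + 1) & ((1 << hh) - 1)

i = 0
while i < 10:             # write more values than the buffer can hold
    push(i)
    i += 1
```

Because the index is masked on every update, the script behaves the same whether it runs under plain Python or as compiled ZPAQL, where the real H is larger than 2**hh entries.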

Only non-negative 32-bit integers can be used; strings, lists, arbitrarily big numbers, classes, closures and (function) objects are not available.

Input File

The input must be a runnable Python 3.5 file in the form of the template, encoded as UTF-8 without a BOM (Byte-Order-Mark). Alter the definitions at the beginning and insert your own code only after them. The other two editable sections can refer to definitions in the first section.

Template Sections (--emit-template > source.py) | Editable?
------------------------------------------------|----------
Definition of the ZPAQ configuration header data (memory size, context mixing components) and optionally functions and variables used by both hcomp and pcomp | yes
API functions for input and output, initialization of memory | no
Function hcomp and associated global variables and functions | yes
Function pcomp and associated global variables and functions | yes
Code for standalone execution of the Python file, analogous to running a ZPAQL configuration with zpaqd `r [cfg] p|h` | no

Exposed API

The memory areas H (32-bit entries) and M (8-bit entries) are available as arrays: hH and hM in the hcomp section, pH and pM in the pcomp section. Their sizes are 2**hh, 2**hm, 2**ph and 2**pm respectively, where hh, hm, ph, pm are defined in the header and available as constants. len(hH), len(pH), len(hM) and len(pM) are supported as an alternative to calculating e.g. 2**hh, but in general len() is not supported; see len_hH() below for dynamic arrays. NONE is a shortcut for 0 - 1 = 4294967295.
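
The value of NONE and the array sizes can be reproduced in plain Python as sketched below. The mask constant and the list stand-ins are assumptions for illustration; in a real input file the template's own definitions are used instead.

```python
MASK32 = 0xFFFFFFFF          # hypothetical helper: keeps values within 32 bits
NONE = (0 - 1) & MASK32      # 0 - 1 wraps to the largest 32-bit value

hh = 2                       # assumed header values
hm = 4
hH = [0] * (2 ** hh)         # stand-in for H (32-bit entries) in hcomp
hM = [0] * (2 ** hm)         # stand-in for M (8-bit entries) in hcomp
```

Plain Python integers do not wrap, so the explicit mask is what makes the subtraction agree with 32-bit arithmetic.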

      Other functions       |                   Descrip
