Compressjs

Pure JavaScript de/compression (bzip2, etc) for node.js, volo, and the browser.

Generate Convert Improve

Install / Use

/learn @cscott/Compressjs

About this skill

Quality Score

0/100

README

compressjs

[![NPM][NPM1]][NPM2]

[![Build Status][1]][2] [![dependency status][3]][4] [![dev dependency status][5]][6]

compressjs contains fast pure-JavaScript implementations of various de/compression algorithms, including bzip2, Charles Bloom's LZP3, a modified LZJB, PPM-D, and an implementation of Dynamic Markov Compression. compressjs is written by C. Scott Ananian. The Range Coder used is a JavaScript port of Michael Schindler's C range coder. Bits also also borrowed from Yuta Mori's SAIS implementation; Eli Skeggs, Kevin Kwok, Rob Landley, James Taylor, and Matthew Francis for Bzip2 compression and decompression code. "Bear" wrote the original JavaScript LZJB; the version here is based on the node lzjb module.

Compression benchmarks

Here are some representative speeds and sizes for the various algorithms implemented in this package. Times are with node 0.8.22 on my laptop, but they should be valid for inter-algorithm comparisons.

test/sample5.ref

This is the Taoism article from the Simple English wikipedia, in HTML format as generated by the Wikipedia Parsoid project.

|Type|Level|Size (bytes)|Compress time (s)|Decompress time (s)| |----|:---:|-----------:|----------------:|------------------:| |bwtc |9| 272997|13.10| 1.85| |bzip2 |9| 275087|22.57| 1.21| |lzp3 |-| 292978| 1.73| 1.74| |ppm |-| 297220|42.05|44.04| |bzip2 |1| 341615|22.63| 1.40| |bwtc |1| 345764|12.34| 0.80| |dmc |-| 434182| 6.97| 9.00| |lzjbr |9| 491476| 3.19| 1.92| |lzjbr |1| 523780| 2.76| 2.02| |lzjb |9| 706210| 1.02| 0.30| |lzjb |1| 758467| 0.66| 0.29| |context1|-| 939098| 5.20| 4.69| |fenwick |-|1440645| 3.06| 3.72| |mtf |-|1441763| 1.92| 3.86| |huffman |-|1452055| 7.15| 6.56| |simple |-|1479143| 0.72| 2.42| |defsum |-|1491107| 3.19| 1.46| |no |-|2130648| 0.80| 0.92| |- |-|2130640|- |- |

enwik8

This test data is the first 10<sup>8</sup> bytes of the English Wikipedia XML dump on March 3, 2006. This is the data set used for the Large Text Compression Benchmark. It can be downloaded from that site.

|Type|Level|Size (bytes)|Compress time (s)|Decompress time (s)| |----|:---:|-----------:|----------------:|------------------:| |ppm |-| 26560169|2615.82|2279.17| |bzip2 |9| 28995650|1068.51| 66.95| |bwtc |9| 29403626| 618.63| 112.00| |bzip2 |1| 33525893|1035.29| 66.98| |lzp3 |-| 34305420| 123.69| 167.77| |bwtc |1| 34533422| 618.61| 43.52| |lzjbr |9| 43594841| 242.60| 141.51| |lzjbr |1| 44879071| 207.38| 147.14| |context1|-| 48480225| 253.48| 223.30| |huffman |-| 62702157| 301.50| 267.31| |fenwick |-| 62024449| 143.49| 164.15| |mtf |-| 62090746| 83.62| 168.03| |simple |-| 63463479| 27.79| 92.84| |defsum |-| 64197615| 75.48| 32.05| |lzjb |9| 64992459| 63.75| 5.90| |lzjb |1| 67828511| 29.26| 5.89| |no |-| 100000008| 26.29| 31.98| |- |-| 100000000| -| -|

Algorithm descriptions

compressjs.Bzip2 (-t bzip2) is the bzip2 algorithm we all have come to know and love. It has a block size between 100k and 900k.
compressjs.BWTC (-t bwtc) is substantially the same, but with a few simplifications/improvements which make it faster, smaller, and not binary-compatible. In particular, the unnecessary initial RLE step of bzip2 is omitted, and we use a range coder with an adaptive context-0 model after the MTF/RLE2 step, instead of the static huffman codes of bzip2.
compressjs.PPM (-t ppm) is a naive/simple implementation of the PPMD algorithm with a 256k sliding window.
compressjs.Lzp3 (-t lzp3) is an algorithm similar to Charles Bloom's LZP3 algorithm. It uses a 1M sliding window, a context-4 model, and a range coder.
compressjs.Dmc (-t dmc) is a partial implementation of Dynamic Markov Compression. Unlike most DMC implementations, our implementation is bytewise (not bitwise). There is currently no provision for shrinking the Markov model (or throwing it out when it grows too large), so be careful with large inputs! I may return to twiddle with this some more; see the source for details.
compressjs.Lzjb (-t lzjb) is a straight copy of the fast LZJB algorithm from https://github.com/cscott/lzjb.
compressjs.LzjbR (-t lzjbr) is a hacked version of LZJB which uses a range coder and a bit of modeling instead of the fixed 9-bit literal / 17-bit match format of the original.

The remaining algorithms are self-tests for various bits of compression code, not real compressors. Context1Model is a simple adaptive context-1 model using a range coder. Huffman is an adaptive Huffman coder using Vitter's algorithm. MTFModel, FenwickModel, and DefSumModel are simple adaptive context-0 models with escapes, implementing using a move-to-front list, a Fenwick tree, and Charles Bloom's deferred summation algorithm, respectively. Simple is a static context-0 model for the range coder. NoModel encodes the input bits directly; it shows the basic I/O overhead, as well as the few bytes of overhead due to the file magic and a variable-length encoding of the uncompressed size of the file.

How to install

npm install compressjs

volo add cscott/compressjs

This package uses Typed Arrays if available, which are present in node.js >= 0.5.5 and many modern browsers. Full browser compatibility table is available at caniuse.com; briefly: IE 10, Firefox 4, Chrome 7, or Safari 5.1.

Testing

npm install
npm test

Usage

There is a binary available in bin:

$ bin/compressjs --help
$ echo "Test me" | bin/compressjs -t lzp3 -z > test.lzp3
$ bin/compressjs -t lzp3 -d test.lzp3
Test me

The -t argument can take a number of different strings to specify the various compression algorithms available. Use --help to see the various options.

From JavaScript:

var compressjs = require('compressjs');
var algorithm = compressjs.Lzp3;
var data = new Buffer('Example data', 'utf8');
var compressed = algorithm.compressFile(data);
var decompressed = algorithm.decompressFile(compressed);
// convert from array back to string
var data2 = new Buffer(decompressed).toString('utf8');
console.log(data2);

There is a streaming interface as well. Use Uint8Array or normal JavaScript arrays when running in a browser.

See the tests in the tests/ directory for further usage examples.

Documentation

require('compressjs') returns a compressjs object. Its fields correspond to the various algorithms implemented, which export one of two different interfaces, depending on whether it is a "compression method" or a "model/coder".

Compression Methods

Compression methods (like compressjs.Lzp3) export two methods. The first is a function accepting one, two or three parameters:

cmp.compressFile = function(input, [output], [Number compressionLevel] or [props])

The input argument can be a "stream" object (which must implement the readByte method), or a Uint8Array, Buffer, or array.

If you omit the second argument, compressFile will return a JavaScript array containing the byte values of the compressed data. If you pass a second argument, it must be a "stream" object (which must implement the writeByte method).

The third argument may be omitted, or a number between 1 and 9 indicating a compression level (1 being largest/fastest compression and 9 being smallest/slowest compression). Some algorithms also permit passing an object for finer-grained control of various compression properties.

The second exported method is a function accepting one or two parameters:

cmp.decompressFile = function(input, [output])

The input parameter is as above.

If you omit the second argument, decompressFile will return a Uint8Array, Buffer or JavaScript array with the decompressed data, depending on what your platform supports. For most modern platforms (modern browsers, recent node.js releases) the returned value will be a Uint8Array.

If you provide the second argument, it must be a "stream", implementing the writeByte method.

Models and coders

The second type of object implemented is a model/coder. Huffman and RangeCoder share the same interface as the simple context-0 probability models MTFModel, FenwickModel, LogDistanceModel, and DeflateDistanceModel.

model.factory = function(parameters)

This method returns a function which can be invoked with a size argument to create a new instance of this model with the given parameters (which usually include the input/output stream or coder).

model.encode = function(symbol, [optional context])

Thi

Related Skills

node-connect

344.4k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

99.2k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

344.4k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

344.4k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。