lrzip - Long Range ZIP or LZMA RZIP

A compression utility that excels at compressing large files (usually > 10-50 MB). Larger files and/or more free RAM mean that the utility can compress your files more effectively (i.e. faster and/or to a smaller size), especially if the file size exceeds 100 MB. You can choose to optimise for either speed (fast compression / decompression) or size, but not both.

haneefmubarak's TL;DR for the long explanation:

Just change the word directory to the name of the directory you wish to compress.

Compression:

lrzdir=directory; tar cvf "$lrzdir.tar" "$lrzdir"; lrzip -Ubvvp "$(nproc)" -S .bzip2-lrz -L 9 "$lrzdir.tar"; rm -fv "$lrzdir.tar"; unset lrzdir

tars the directory, then maxes out all of the system's processor cores along with sliding window RAM to give the best BZIP2 compression while being as fast as possible, enables max verbosity output, attaches the extension .bzip2-lrz, and finally gets rid of the temporary tarfile. Uses a tempvar lrzdir which is unset automatically.

Decompression for the kind of file from above:

lrzdir=directory; lrunzip -cdivvp "$(nproc)" -o "$lrzdir.tar" "$lrzdir.tar.bzip2-lrz"; tar xvf "$lrzdir.tar"; rm -vf "$lrzdir.tar"

Checks integrity, then decompresses the directory using all of the processor cores for max speed, enables max verbosity output, unarchives the resulting tarfile, and finally gets rid of the temporary tarfile. Uses the same kind of tempvar.
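If you use the two one-liners above often, they can be wrapped in shell functions. The following is only a sketch: the names run, lrz_threads, lrz_pack and lrz_unpack and the DRY_RUN variable are made up for this example and are not part of lrzip. The lrzip and lrunzip flags are copied unchanged from the one-liners. Unlike the one-liners, the steps are chained with &&, so the temporary tarfile is only removed if the previous step succeeded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Number of threads to use; falls back to 1 where nproc is not available.
lrz_threads() { nproc 2>/dev/null || echo 1; }

# tar the directory, compress it with the flags from the one-liner above,
# then remove the temporary tarfile.
lrz_pack() {
  local dir=$1
  run tar cvf "$dir.tar" "$dir" &&
  run lrzip -Ubvvp "$(lrz_threads)" -S .bzip2-lrz -L 9 "$dir.tar" &&
  run rm -fv "$dir.tar"
}

# Reverse of lrz_pack: decompress, unarchive, remove the temporary tarfile.
lrz_unpack() {
  local dir=$1
  run lrunzip -cdivvp "$(lrz_threads)" -o "$dir.tar" "$dir.tar.bzip2-lrz" &&
  run tar xvf "$dir.tar" &&
  run rm -vf "$dir.tar"
}
```

Usage: lrz_pack "my directory" to compress, lrz_unpack "my directory" to restore. Prefix a call with DRY_RUN=1 to see the commands without running them.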

lrzip build/install guide:

A quick guide on building and installing.

What you will need

gcc
bash or zsh
pthreads
tar
libc
libm
libz-dev
libbz2-dev
liblzo2-dev
liblz4-dev
coreutils
nasm (optional)
git if you want a repo-fresh copy
an OS with the usual *nix headers and libraries
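Before building, it can save time to check which of the command-line tools from the list above are actually on your PATH. This is a rough sketch: missing_tools is a made-up helper name, and it only checks executables (gcc, tar, and the optional nasm and git), not the libraries or headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every named tool that is NOT found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Required command-line tools from the list above.
missing=$(missing_tools gcc tar | tr '\n' ' ')
if [ -n "$missing" ]; then
  echo "Missing required tools: $missing" >&2
else
  echo "Required tools found."
fi

# nasm and git are optional, so only report them.
optional=$(missing_tools nasm git | tr '\n' ' ')
if [ -n "$optional" ]; then
  echo "Optional tools not found: $optional"
fi
```

The development libraries (libz-dev, libbz2-dev, liblzo2-dev, liblz4-dev) still need to be checked through your distribution's package manager.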

Obtaining the source

Two different ways of doing this:

Stable: Packaged tarball that is known to work:

Go to https://github.com/ckolivas/lrzip/releases and download the tar.gz file from the top. cd to the directory you downloaded it to, and use tar xvzf lrzip-X.X.tar.gz to extract the files (don't forget to replace X.X with the correct version). Finally, cd into the directory you just extracted.

Latest: git clone -v https://github.com/ckolivas/lrzip.git; cd lrzip

Build

./autogen.sh
./configure
make -j `nproc` # maxes out all cores

Install

Simple 'n Easy™: sudo make install

lrzip 101:

|Command|Result|
|------|------|
|lrztar directory|An archive directory.tar.lrz compressed with LZMA.|
|lrzuntar directory.tar.lrz|A directory extracted from a lrztar archive.|
|lrzip filename|An archive filename.lrz compressed with LZMA, meaning slow compression and fast decompression.|
|lrzip -z filename|An archive "filename.lrz" compressed with ZPAQ that can give extreme compression, but takes a bit longer than forever to compress and decompress.|
|lrzip -l filename|An archive lightly compressed with LZO, meaning really, really fast compression and decompression.|
|lrunzip filename.lrz|Decompress filename.lrz to filename.|
|lrz filename|As per lrzip above but with gzip compatible semantics (i.e. will be quiet and delete original file)|
|lrz -d filename.lrz|As per lrunzip above but with gzip compatible semantics (i.e. will be quiet and delete original file)|

lrzip internals

lrzip uses an extended version of rzip which does a first pass long distance redundancy reduction. lrzip's modifications allow it to scale to accommodate various memory sizes.

Then, one of the following scenarios occurs:

Compressed
(default) LZMA gives excellent compression @ ~2x the speed of bzip2
ZPAQ gives extreme compression while taking forever
LZO gives insanely fast compression that can actually be faster than simply copying a large file
GZIP gives compression almost as fast as LZO but with better compression
BZIP2 is a de facto Linux standard and hacker favorite which usually gives quite good compression (ZPAQ>LZMA>BZIP2>GZIP>LZO) while staying fairly fast (LZO>GZIP>BZIP2>LZMA>ZPAQ); in other words, a good middle-ground and a good choice overall
Uncompressed, in the words of the software's original author:

Leaving it uncompressed and rzip prepared. This form improves substantially any compression performed on the resulting file in both size and speed (due to the nature of rzip preparation merging similar compressible blocks of data and creating a smaller file). By "improving" I mean it will either speed up the very slow compressors with minor detriment to compression, or greatly increase the compression of simple compression algorithms.

(Con Kolivas, from the original lrzip README)

The only real disadvantages:

The main program, lrzip, only works on single files, and therefore requires the use of an lrztar wrapper to fake a complete archiver.
lrzip requires quite a bit of memory along with a modern processor to get the best performance in reasonable time. This usually means that it is somewhat unusable with less than 256 MB of RAM. However, decompression usually requires less RAM and can work on less powerful machines with much less RAM. On machines with less RAM, it may be a good idea to enable swap if you want to keep your operating system happy.
Piping output to and/or from STDIN and/or STDOUT works fine with both compression and decompression, but larger files compressed this way will likely end up being compressed less efficiently. Decompression doesn't really have any issues with piping, though.
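As a rough illustration of the piping point, a directory can be streamed through tar into lrzip without a temporary tarfile. This sketch assumes that lrzip reads STDIN and writes STDOUT when no filename is given, that -q quiets its output, and that -d decompresses the same way; check the man page for your version. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a directory through tar into lrzip.
# Assumes lrzip uses STDIN/STDOUT when no filename is given.
lrz_stream_pack() {
  local dir=$1 out=$2
  tar cf - "$dir" | lrzip -q > "$out"
}

# Hypothetical helper: decompress from STDIN and unpack the tar stream.
lrz_stream_unpack() {
  local archive=$1
  lrzip -d -q < "$archive" | tar xf -
}
```

Usage: lrz_stream_pack mydir mydir.tar.lrz, then lrz_stream_unpack mydir.tar.lrz. For large inputs, expect a worse ratio than the file-based one-liners, for the reason given in the list above.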

One of lrzip's more distinctive features is that it will try to use all of the available RAM as best it can at all times to provide maximum benefit. This is the default operating method, where it will create and use the single largest memory window that will still fit in available memory without freezing up the system. It does this by mmapping the small portions of the file that it is working on. However, it also has a "sliding mmap" feature, which allows it to use compression windows that far exceed the size of your RAM if the file you are compressing is large. It does this by using one large mmap along with a smaller moving mmap buffer to track the part of the file that is currently being examined. From a higher level, this can be seen as simply emulating a single, large mmap buffer. The downside of this feature is that it can become extremely slow; the upside is that it will usually give a better compression factor.

The file doc/README.benchmarks has some performance examples to show what kind of data lrzip is good with.

FAQ

Q: What kind of encryption does lrzip use?

A: lrzip uses SHA2-512 repetitive hashing of the password along with a salt to provide a key which is used by AES-128 to do block encryption. Each block has more random salts added to the block key. The amount of initial hashing increases as the timestamp goes forward, in direct relation to Moore's law, which means that the amount of time required to encrypt/decrypt the file stays the same on a contemporary computer. It is virtually guaranteed that the same file encrypted with the same password will never be the same twice. The weakest link in this encryption mode by far is the password chosen by the user. There is currently no known attack or backdoor for this encryption mechanism, and there is absolutely no way of retrieving your password should you forget it.

Q: How do I make a static build?

A: ./configure --enable-static-bin

Q: I want the absolute maximum compression I can possibly get, what do I do?

A: Try the command line options "-Uzp 1 -L 9". This uses all available RAM and ZPAQ compression, and even uses a compression window larger than the RAM you have. The -p 1 option disables multithreading, which improves compression at the expense of speed. Expect it to take many times longer.

Q: I want the absolute fastest decent compression I can possibly get.

A: Try the command line option -l. This will use the LZO backend compression, and level 7 compression (1 isn't much faster).
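The two answers above can be kept handy as named presets. This is a small sketch only: lrz_preset_flags is a made-up helper name, and the flags it prints are exactly the ones quoted in the two answers (with -Uzp 1 written out as separate options).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a goal to the lrzip options quoted above.
lrz_preset_flags() {
  case "$1" in
    max)  echo "-U -z -p 1 -L 9" ;;  # same options as -Uzp 1 -L 9
    fast) echo "-l" ;;               # LZO backend
    *)    echo "usage: lrz_preset_flags max|fast" >&2; return 1 ;;
  esac
}
```

Usage: lrzip $(lrz_preset_flags max) filename. The substitution is left unquoted on purpose so that the shell splits it into separate options.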

Q: How much slower is the unlimited mode?

A: It depends on 2 things. First, just how much larger than your RAM the file is, as the bigger the difference, the slower it will be. The second is how much redundant data there is. The more there is, the slower, but ultimately the better the compression. Why isn't it on by default? If the compression window is a LOT larger than RAM, with a lot of redundant information it can be drastically slower. I may revisit this possibility in the future if I can make it any faster.

Q: Can I use your tool for even more compression than lzma offers?

A: Yes, the rzip preparation of files makes them more compressible by most other compression techniques I have tried. Using the -n option will generate a .lrz file smaller than the original which should be more compressible, and since it is smaller it will compress faster than it otherwise would have.
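A minimal sketch of that two-step workflow follows. The helper names run and lrz_prepare_then and the DRY_RUN variable are made up for this example, and xz in the usage line is just one possible second-stage compressor, chosen as an assumption rather than a recommendation from lrzip itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Do the rzip-only pass (-n), then hand the resulting .lrz file
# to whatever compressor command is given as the remaining arguments.
lrz_prepare_then() {
  local file=$1
  shift
  run lrzip -n "$file" &&
  run "$@" "$file.lrz"
}
```

Usage: lrz_prepare_then big.tar xz -9, which should leave big.tar.lrz.xz. To restore, reverse the steps: decompress with the second tool first, then run lrunzip on the .lrz file.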

Q: 32bit?

A: 32bit machines are limited to compression windows of 2GB due to userspace limitations on mmap and malloc, so even if you have much more RAM you will not be able to use compression windows larger than 2GB. You may also be unable to decompress files that were compressed on 64bit machines using windows larger than 2GB.

Q: How about 64bit?

A: 64bit machines, with their ability to address massive amounts of RAM, will excel with lrzip because they can use compression windows limited in size only by the amount of physical RAM.

Q: Other operating systems?

A: The code is POSIXy with GNU extensions. Patches are welcome. Version 0.43+ should build on Mac OS X 10.5+.

Q: Does it work on stdin/stdout?

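A: Yes it does. Compression and decompression work well to/from STDIN/STDOUT, although, as noted under the disadvantages above, larger files compressed this way will likely end up being compressed less efficiently.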
