Zbackup
ZBackup, a versatile deduplicating backup tool
Introduction
zbackup is a globally-deduplicating backup tool, based on the ideas found in rsync. Feed a large .tar into it, and it will store duplicate regions of it only once, then compress and optionally encrypt the result. Feed another .tar file, and it will also re-use any data found in any previous backups. This way only new changes are stored, and as long as the files are not very different, the amount of storage required is very low. Any of the backup files stored previously can be read back in full at any time. The program is format-agnostic, so you can feed virtually any files to it (any types of archives, proprietary formats, even raw disk images -- but see Caveats).
This is achieved by sliding a window with a rolling hash over the input at byte granularity and checking whether the block in focus has been seen before. If the rolling hash matches, a full cryptographic hash is additionally calculated to confirm that the block is indeed the same. Only then is the block deduplicated.
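The following is a minimal Python sketch of the idea described above, not zbackup's actual implementation. The window size, the polynomial hash, and the names `ChunkStore`, `backup` and `restore` are all assumptions made for illustration. A cheap rolling hash is updated one byte at a time, and SHA-256 is computed only when the rolling hash matches a known chunk.

```python
import hashlib

WINDOW = 16           # toy chunk size, chosen for illustration only
BASE = 257
MOD = (1 << 61) - 1   # modulus of the illustrative polynomial rolling hash


def poly_hash(block):
    """Polynomial hash of a whole block (used to seed the rolling hash)."""
    h = 0
    for byte in block:
        h = (h * BASE + byte) % MOD
    return h


class ChunkStore:
    """Hypothetical in-memory store; not zbackup's on-disk format."""

    def __init__(self):
        self.chunks = {}    # SHA-256 digest -> chunk bytes
        self.rolling = {}   # rolling hash -> digests of full-size chunks

    def add(self, chunk):
        digest = hashlib.sha256(chunk).digest()
        if digest not in self.chunks:
            self.chunks[digest] = chunk
            if len(chunk) == WINDOW:
                self.rolling.setdefault(poly_hash(chunk), set()).add(digest)
        return digest

    def find(self, rolling_hash, window):
        candidates = self.rolling.get(rolling_hash)
        if not candidates:
            return None     # cheap check failed, no SHA-256 needed
        digest = hashlib.sha256(window).digest()
        return digest if digest in candidates else None

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


def backup(data, store):
    """Return a recipe: the ordered list of chunk digests that rebuild data."""
    recipe, pending = [], bytearray()
    top = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    pos, rolling = 0, None
    while pos + WINDOW <= len(data):
        window = data[pos:pos + WINDOW]
        if rolling is None:
            rolling = poly_hash(window)
        digest = store.find(rolling, window)
        if digest is not None:
            if pending:                 # store unmatched bytes seen so far
                recipe.append(store.add(bytes(pending)))
                pending.clear()
            recipe.append(digest)       # reuse the known chunk
            pos += WINDOW
            rolling = None
            continue
        pending.append(data[pos])       # no match: slide by one byte
        if pos + WINDOW < len(data):
            rolling = ((rolling - data[pos] * top) * BASE
                       + data[pos + WINDOW]) % MOD
        pos += 1
        if len(pending) == WINDOW:
            recipe.append(store.add(bytes(pending)))
            pending.clear()
    pending += data[pos:]               # tail shorter than one window
    for i in range(0, len(pending), WINDOW):
        recipe.append(store.add(bytes(pending[i:i + WINDOW])))
    return recipe


def restore(recipe, store):
    return b"".join(store.chunks[d] for d in recipe)
```

In this sketch, backing up the same data again with a few bytes prepended stores only those new bytes, because the byte-granular window realigns with the previously stored chunks.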
Features
The program has the following features:
- Parallel LZMA or LZO compression of the stored data
- Built-in AES encryption of the stored data
- Possibility to delete old backup data
- Use of a 64-bit rolling hash, keeping the amount of soft collisions to zero
- Repository consists of immutable files. No existing files are ever modified
- Written in C++ only with only modest library dependencies
- Safe to use in production (see below)
- Possibility to exchange data between repos without recompression
Build dependencies
- cmake >= 2.8.3 (though it should not be too hard to compile the sources by hand if needed)
- libssl-dev for all encryption, hashing and random numbers
- libprotobuf-dev and protobuf-compiler for data serialization
- liblzma-dev for compression
- liblzo2-dev for compression (optional)
- zlib1g-dev for adler32 calculation
Quickstart
To build and install:
cd zbackup
cmake .
make
sudo make install
# or just run as ./zbackup
zbackup is also packaged in Fedora/EPEL, Debian, Ubuntu, Arch Linux and FreeBSD.
To use:
zbackup init --non-encrypted /my/backup/repo
tar c /my/precious/data | zbackup backup /my/backup/repo/backups/backup-`date '+%Y-%m-%d'`
zbackup restore /my/backup/repo/backups/backup-`date '+%Y-%m-%d'` > /my/precious/backup-restored.tar
If you have a lot of RAM to spare, you can use it to speed up the restore process -- to use 512 MB more, pass --cache-size 512mb when restoring.
If encryption is wanted, create a file with your password:
# more secure to use an editor
echo mypassword > ~/.my_backup_password
chmod 600 ~/.my_backup_password
Then init the repo the following way:
zbackup init --password-file ~/.my_backup_password /my/backup/repo
And always pass the same argument afterwards:
tar c /my/precious/data | zbackup --password-file ~/.my_backup_password backup /my/backup/repo/backups/backup-`date '+%Y-%m-%d'`
zbackup --password-file ~/.my_backup_password restore /my/backup/repo/backups/backup-`date '+%Y-%m-%d'` > /my/precious/backup-restored.tar
If you have a 32-bit system and a lot of cores, consider lowering the number of compression threads by passing --threads 4 or --threads 2 if the program runs out of address space when backing up (see why below, item 2). There should be no problem on a 64-bit system.
Caveats
- While you can pipe any data into the program, the data should be uncompressed and unencrypted -- otherwise no deduplication could be performed on it. zbackup would compress and encrypt the data itself, so there's no need to do that yourself. So just run tar c and pipe it into zbackup directly. If backing up disk images employing encryption, pipe the unencrypted version (the one you normally mount). If you create .zip or .rar files, use no compression (-0 or -m0) and no encryption.
- Parallel LZMA compression uses a lot of RAM (several hundreds of megabytes, depending on the number of threads used), and ten times more virtual address space. The latter is only relevant on 32-bit architectures where it's limited to 2 or 3 GB. If you hit the ceiling, lower the number of threads with --threads.
- Since the data is deduplicated, there's naturally no redundancy in it. A loss of a single file can lead to a loss of virtually all data. Make sure you store it on a redundant storage (RAID1, a cloud provider etc).
- The encryption key, if used, is stored in the info file in the root of the repo. It is encrypted with your password. Technically thus you can change your password without re-encrypting any data, and as long as no one possesses the old info file and knows your old password, you would be safe (note that ability to change repo type between encrypted and non-encrypted is not implemented yet -- someone who needs this is welcome to create a pull request -- the possibility is all there). Also note that it is crucial you don't lose your info file, as otherwise the whole backup would be lost.
Limitations
- Right now the only modes supported are reading from standard input and writing to standard output. FUSE mounts and NBD servers may be added later if someone contributes the code.
- The program keeps all known blocks in an in-RAM hash table, which may create scalability problems for very large repos (see below).
- The only encryption mode currently implemented is AES-128 in CBC mode with PKCS#7 padding. If you believe that this is not secure enough, patches are welcome. Before you jump to conclusions however, read this article.
- It's only possible to fully restore the backup in order to get to a required file, without any option to quickly pick it out. tar would not allow to do it anyway, but e.g. for zip files it could have been possible. This is possible to implement though, e.g. by exposing the data over a FUSE filesystem.
Most of those limitations can be lifted by implementing the respective features.
Safety
Is it safe to use zbackup for production data? Being free software, the program comes with no warranty of any kind. That said, it's perfectly safe for production, and here's why. When performing a backup, the program never modifies or deletes any existing files -- only new ones are created. It specifically checks for that, and the code paths involved are short and easy to inspect. Furthermore, each backup is protected by its SHA256 sum, which is calculated before piping the data into the deduplication logic. The code path doing that is also short and easy to inspect. When a backup is being restored, its SHA256 is calculated again and compared against the stored one. The program would fail on a mismatch. Therefore, to ensure safety it is enough to restore each backup to /dev/null immediately after creating it. If it restores fine, it will restore fine ever after.
To add some statistics, the author of the program has been using an older version of zbackup internally for over a year. The SHA256 check never ever failed. Again, even if it does, you would know immediately, so no work would be lost. Therefore you are welcome to try the program in production, and if you like it, stick with it.
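The end-to-end check described in this section can be sketched in Python as follows. This is an illustrative model only: `make_backup` and `restore_backup` are hypothetical names, and the fixed-size chunk list is a stand-in for the real deduplication, compression and encryption stages. What it shows is that the checksum is taken before any processing and recomputed after reassembly.

```python
import hashlib


def make_backup(data, chunk_size=4):
    # The checksum is computed on the raw input, before any other processing.
    checksum = hashlib.sha256(data).hexdigest()
    # Stand-in for the deduplication logic: split the input into chunks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {"sha256": checksum, "chunks": chunks}


def restore_backup(backup):
    data = b"".join(backup["chunks"])
    # Recompute the checksum on the reassembled data and fail on a mismatch.
    if hashlib.sha256(data).hexdigest() != backup["sha256"]:
        raise ValueError("SHA-256 mismatch: restored data differs from the original")
    return data
```

Because the checksum covers the whole pipeline, any defect in the stages between the two hash computations surfaces as a restore failure instead of silently corrupted output.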
Usage notes
The repository has the following directory structure:
/repo
    backups/
    bundles/
        00/
        01/
        02/
        ...
    index/
    info
- The backups directory contains your backups. Those are very small files which are needed for restoration. They are encrypted if encryption is enabled. The names can be arbitrary. It is possible to arrange files in subdirectories, too. Free renaming is also allowed.
- The bundles directory contains the bulk of data. Each bundle internally contains multiple small chunks, compressed together and encrypted. Together all those chunks account for all deduplicated data stored.
- The index directory contains the full index of all chunks in the repository, together with their bundle names. A separate index file is created for each backup session. Technically those files are redundant, as all information is contained in the bundles themselves. However, having a separate index is nice for two reasons: 1) it's faster to read as it incurs fewer seeks, and 2) it allows making backups while storing bundles elsewhere. Bundles are only needed when restoring -- otherwise it's sufficient to only have index. One could then move all newly created bundles to another machine after each backup.
- info is a very important file which contains all global repository metadata, such as chunk and bundle sizes, and an encryption key encrypted with the user password. It is paramount not to lose it, so backing it up separately somewhere might be a good idea. On the other hand, if you absolutely don't trust your remote storage provider, you might consider not storing it with the rest of the data. It would then be impossible to decrypt the data at all, even if your password becomes known later.
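The relationship between index and bundles can be modelled with a small Python sketch. It is a toy in-memory model under stated assumptions: the bundle names, chunk IDs and the functions `build_index`, `is_known` and `read_chunk` are hypothetical and do not reflect zbackup's on-disk format. It illustrates why the index is redundant yet sufficient for making backups, while restoring needs the bundles themselves.

```python
# Toy model: real bundles are compressed (and optionally encrypted) files.
bundles = {
    "bundle-00": {"chunk-a": b"hello ", "chunk-b": b"world"},
    "bundle-01": {"chunk-c": b"!"},
}


def build_index(bundles):
    """Rebuild the chunk -> bundle mapping from the bundles alone."""
    return {chunk_id: name
            for name, chunks in bundles.items()
            for chunk_id in chunks}


def is_known(index, chunk_id):
    """Backup-time question: answered from the index, no bundle access."""
    return chunk_id in index


def read_chunk(index, available_bundles, chunk_id):
    """Restore-time operation: needs the bundle that holds the chunk."""
    bundle_name = index[chunk_id]
    if bundle_name not in available_bundles:
        raise FileNotFoundError(f"bundle {bundle_name} is not available locally")
    return available_bundles[bundle_name][chunk_id]
```

With the bundles moved elsewhere (modelled here by passing an empty dictionary to `read_chunk`), `is_known` still works for deduplicating a new backup, while reading a chunk fails until the bundle is brought back.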
The program does not have any facilities for sending your backup over the network. You can rsync the repo to another computer or use any kind of cloud storage capable of storing files. Since zbackup never modifies any existing files, the latter is especially easy -- just tell the upload tool you use not to upload any files which already exist on the remote si
