# gdelta
A fast delta compression algorithm for similar data chunks, implemented in pure Rust.
## Overview
gdelta is a Rust implementation of the GDelta algorithm by Haoliang Tan. It provides efficient delta encoding for
similar data chunks (typically 4KB - 64KB) commonly found in deduplication systems.
Key Features:
- Fast delta encoding and decoding with optional SIMD optimization
- Memory-safe implementation in pure Rust
- Simple, ergonomic API with full-featured CLI tool
- No unsafe code
- Thoroughly tested with comprehensive benchmarking suite
Performance:
Synthetic benchmarks (2,097 tests):
- Encoding: 496 MB/s average throughput
  - Small chunks (16KB): up to 1.08 GiB/s
  - Large data (256KB+): 370-400 MiB/s sustained
- Decoding: 4.4 GB/s average (9x faster than encoding)
  - Peak: 10.6 GiB/s on binary data
- Compression: 68% space saved (raw delta)
  - With zstd: 75% space saved, 305 MB/s
  - With lz4: 71% space saved, 430 MB/s
Real-world git repositories (1.36M comparisons):
- mdn/content: 97.0% space saved, 4µs encode, 0µs decode
- tokio: 94.5% space saved, 5µs encode, 0µs decode
Benchmarked on an AMD Ryzen 7 7800X3D (8 cores / 16 threads) running Fedora Linux 42. See PERFORMANCE.md for detailed analysis, including comparisons with xpatch, vcdiff, qbsdiff, and zstd_dict.
## Installation

### As a Library

Add this to your `Cargo.toml`:

```toml
[dependencies]
gdelta = "0.2"
```
### As a CLI Tool

Install using cargo:

```bash
cargo install gdelta --features cli
```

Or build from source:

```bash
git clone https://github.com/ImGajeed76/gdelta
cd gdelta
cargo build --release --features cli
# Binary at: target/release/gdelta
```
## Usage

### Library API

```rust
use gdelta::{encode, decode};

// Create base data and modified version
let base_data = b"Hello, World! This is some base data.";
let new_data = b"Hello, Rust! This is some modified data.";

// Encode the delta
let delta = encode(new_data, base_data)?;
println!("Delta size: {} bytes", delta.len());

// Decode to recover the new data
let recovered = decode(&delta, base_data)?;
assert_eq!(recovered, new_data);
```
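Deltas can also be chained across successive versions, storing one full base plus a series of small patches. The sketch below assumes the same `encode`/`decode` API shown above; the version contents are illustrative.

```rust
use gdelta::{encode, decode};

fn main() {
    // Three successive versions of the same small document.
    let v1 = b"config: version=1\nfeature_x = off\n".to_vec();
    let v2 = b"config: version=2\nfeature_x = on\n".to_vec();
    let v3 = b"config: version=3\nfeature_x = on\nfeature_y = on\n".to_vec();

    // Store only v1 plus two small deltas, each encoded against the
    // previous version.
    let d12 = encode(&v2, &v1).expect("encode v1 -> v2");
    let d23 = encode(&v3, &v2).expect("encode v2 -> v3");

    // Reconstruct v3 by applying the deltas in order.
    let r2 = decode(&d12, &v1).expect("decode v2");
    let r3 = decode(&d23, &r2).expect("decode v3");
    assert_eq!(r3, v3);
    println!("delta sizes: {} and {} bytes", d12.len(), d23.len());
}
```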
### CLI Tool

Use `gdelta help` to see the most up-to-date options and descriptions.
The CLI provides a simple interface for creating and applying delta patches:
Create a delta patch:

```bash
# Basic usage
gdelta encode old_file.bin new_file.bin -o patch.delta

# With compression (recommended)
gdelta encode old_file.bin new_file.bin -o patch.delta -c zstd
gdelta encode old_file.bin new_file.bin -o patch.delta -c lz4

# With verification
gdelta encode old_file.bin new_file.bin -o patch.delta --verify
```

Apply a delta patch:

```bash
# Auto-detects compression format
gdelta decode old_file.bin patch.delta -o new_file.bin

# Force specific format (if magic bytes conflict)
gdelta decode old_file.bin patch.delta -o new_file.bin --format zstd
```
Options:

- `-c, --compress <FORMAT>` - Compression: none, zstd, lz4 (default: none)
- `-v, --verify` - Verify delta after creation (encode only)
- `-y, --yes` - Skip memory warning prompts
- `-f, --force` - Overwrite existing files
- `-q, --quiet` - Suppress output except errors
Example workflow:

```bash
# Create compressed delta
gdelta encode database-v1.db database-v2.db -o update.delta -c zstd

# Later, apply the update
gdelta decode database-v1.db update.delta -o database-v2-restored.db

# Verify the result
diff database-v2.db database-v2-restored.db
```
Memory Management:
The CLI monitors memory usage and warns when operations might use >80% of available RAM. This is important because gdelta loads entire files into memory.
```text
# For large files, the tool will prompt:
⚠ Memory warning: This operation requires ~12.4 GB
  Available: 8.2 GB free (16 GB total)
  Continue? [y/N]:
```

Use `-y` to skip prompts in automated scripts.
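As a rough illustration of why the check matters: both inputs (and the output being produced) are held in RAM, so the required memory scales with the file sizes. The snippet below is a hypothetical pre-flight estimate, not the CLI's actual logic; the file names and sizing heuristic are assumptions.

```rust
use std::fs;

// Hypothetical pre-flight estimate: base + new held in memory, plus room
// for the delta itself (worst case on the order of the new file's size).
fn estimated_bytes_needed(base: &str, new: &str) -> std::io::Result<u64> {
    let base_len = fs::metadata(base)?.len();
    let new_len = fs::metadata(new)?.len();
    Ok(base_len + new_len + new_len)
}

fn main() -> std::io::Result<()> {
    let needed = estimated_bytes_needed("old_file.bin", "new_file.bin")?;
    println!("This operation requires roughly {:.1} GB", needed as f64 / 1e9);
    Ok(())
}
```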
## How It Works
GDelta uses:
- GEAR Rolling Hash - Fast fingerprinting for chunk boundaries
- Variable-Length Integer Encoding - Efficient space utilization
- Copy/Literal Instructions - Minimal delta representation
- Prefix/Suffix Matching - Optimized for common data patterns
The algorithm identifies matching regions between base and new data, then encodes only the differences as a series of copy and literal instructions.
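For intuition, here is a simplified sketch of a copy/literal instruction stream and how applying one works. This illustrates the idea only; it is not gdelta's actual wire format, which uses variable-length integers and a more compact instruction layout.

```rust
// Simplified copy/literal delta, for illustration only.
enum Instruction {
    /// Copy `len` bytes from the base buffer starting at `offset`.
    Copy { offset: usize, len: usize },
    /// Append bytes that have no match in the base.
    Literal(Vec<u8>),
}

// Applying a delta is a single pass over the instruction stream.
fn apply(base: &[u8], delta: &[Instruction]) -> Vec<u8> {
    let mut out = Vec::new();
    for ins in delta {
        match ins {
            Instruction::Copy { offset, len } => {
                out.extend_from_slice(&base[*offset..*offset + *len]);
            }
            Instruction::Literal(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}

fn main() {
    let base = b"Hello, World! This is some base data.";
    // Matching regions are copied from the base; only "Rust" and
    // "modified data." are stored as literals.
    let delta = vec![
        Instruction::Copy { offset: 0, len: 7 },   // "Hello, "
        Instruction::Literal(b"Rust".to_vec()),
        Instruction::Copy { offset: 12, len: 15 }, // "! This is some "
        Instruction::Literal(b"modified data.".to_vec()),
    ];
    assert_eq!(
        apply(base, &delta),
        b"Hello, Rust! This is some modified data.".to_vec()
    );
}
```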
## Algorithm Parameters
The implementation uses optimized default parameters:
- Chunk size: 300 KB
- Word size: 8 bytes
- Base sample rate: 3
- Features: Skip optimization, reverse matching
These parameters are tuned for typical deduplication workloads.
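For intuition about the GEAR rolling hash mentioned above, the sketch below shows how such a hash advances one byte at a time with a shift and a table lookup, and how a bit mask selects sparse candidate match positions. The table, seed, and mask are illustrative stand-ins, not gdelta's actual constants or its sample-rate logic.

```rust
// Illustrative GEAR-style rolling hash, not gdelta's actual implementation.
// The hash is updated per byte: shift left, then add a per-byte value from
// a 256-entry table. Low bits of the hash pick candidate positions.
fn gear_table() -> [u64; 256] {
    // A real implementation ships a precomputed random table; a tiny
    // xorshift-style generator stands in for it here.
    let mut table = [0u64; 256];
    let mut state = 0x9E37_79B9_7F4A_7C15u64; // arbitrary seed
    for entry in table.iter_mut() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

fn main() {
    let table = gear_table();
    let data = b"Hello, World! This is some base data.";
    let mask = 0x3Fu64; // illustrative sampling mask (~1 candidate per 64 bytes)

    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        if (hash & mask) == 0 {
            println!("candidate fingerprint at offset {}", i);
        }
    }
}
```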
## Comparison with Other Algorithms

### Synthetic Workloads (2,097 tests)

| Algorithm   | Speed    | Compression | Memory | Use Case                      |
|-------------|----------|-------------|--------|-------------------------------|
| gdelta      | 496 MB/s | 68%         | Low    | Maximum speed                 |
| gdelta+lz4  | 430 MB/s | 71%         | Low    | Fast with compression         |
| gdelta+zstd | 305 MB/s | 75%         | Low    | Balanced speed/compression    |
| xpatch      | 306 MB/s | 75%         | Low    | Automatic algorithm selection |
| vcdiff      | 94 MB/s  | 64%         | Medium | Standard delta format         |
| qbsdiff     | 22 MB/s  | 84%         | Medium | Maximum compression           |
| zstd_dict   | 14 MB/s  | 55%         | Medium | Dictionary-based compression  |
### Git Repository Workloads (1.36M comparisons)
mdn/content repository (306K comparisons):
| Algorithm     | Compression | Encode Time | Notes            |
|---------------|-------------|-------------|------------------|
| xpatch (tags) | 97.5% saved | 319 µs      | Best compression |
| xpatch (seq)  | 97.5% saved | 14 µs       | Fast alternative |
| gdelta        | 97.0% saved | 4 µs        | Fastest          |
| vcdiff        | 95.6% saved | 49 µs       | Standard format  |
tokio repository (33K comparisons):
| Algorithm     | Compression | Encode Time | Notes            |
|---------------|-------------|-------------|------------------|
| xpatch (tags) | 97.9% saved | 306 µs      | Best compression |
| xpatch (seq)  | 95.6% saved | 24 µs       | Balanced         |
| gdelta        | 94.5% saved | 5 µs        | Fastest          |
| vcdiff        | 93.1% saved | 28 µs       | Standard format  |
Git benchmark data from xpatch test results. See PERFORMANCE.md for detailed analysis.
### Key Observations
Synthetic benchmarks:
- qbsdiff: best compression (84%), slowest speed (22 MB/s)
- gdelta: fastest speed (496 MB/s), competitive compression (68%)
- xpatch/gdelta+zstd: balanced (75% compression, 305 MB/s)
- Algorithm choice impacts compression by 29 percentage points
Git repositories:
- All algorithms achieve 93-98% compression on real file evolution
- xpatch with tag optimization: best compression (97.5-97.9%)
- gdelta: fastest (4-5µs), competitive compression (94.5-97.0%)
- Differences measured in microseconds, all are fast
- For versioned text with micro edits (docs, configs, source code), xpatch is generally the better fit — tag optimization finds better base versions across history, yielding up to 97.9% compression with acceptable overhead (~20µs per comparison)
Workload matters: Performance characteristics vary significantly between synthetic edits and real file evolution patterns.
## Use Cases
High-Speed Applications:
- Deduplication systems: gdelta (496 MB/s)
- Real-time processing: gdelta (4.4 GB/s decode)
- Database replication: gdelta or gdelta+lz4
Version Control Systems:
- Recommended for versioned text: xpatch with tags (97.5-97.9% on real repos)
- Speed-critical: gdelta (4-5µs per operation, 94.5-97.0% on real repos)
- Balanced: xpatch sequential mode or gdelta+lz4
Backup & Storage:
- Space-critical: qbsdiff (84% saved)
- Balanced: gdelta+zstd (75% saved, 305 MB/s)
- Fast backups: gdelta or gdelta+lz4
Network Synchronization:
- Low bandwidth: qbsdiff or xpatch
- Low latency: gdelta (4.4 GB/s decode)
- Balanced: gdelta+zstd
Standards Compliance:
- RFC 3284 (VCDIFF): vcdiff implementation
## Development

### Running Tests

```bash
# Run unit tests
cargo test

# Run integration tests
cargo test --test '*'

# Run CLI test suite
./test_gdelta.sh

# Run simple benchmarks (quick verification)
cargo bench --bench simple

# Run comprehensive benchmarks (15-30 min)
cargo bench --bench comprehensive

# Run comprehensive benchmarks with custom filters
BENCH_FORMATS=json,csv BENCH_ALGOS=gdelta,xpatch cargo bench --bench comprehensive
```
### Benchmark Modes
The comprehensive benchmark supports two modes:
```bash
# Quick mode (default): smaller sample size, faster
BENCH_MODE=quick cargo bench --bench comprehensive

# Full mode: larger sample size, more accurate
BENCH_MODE=full cargo bench --bench comprehensive
```

Results are saved to `target/benchmark_report_<timestamp>.md` and `.json`.
## Credits
This is a Rust implementation of the GDelta algorithm by Haoliang Tan.
Original repositories/resources:
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️ by Oliver Seifert | [Blog Post](https://oseifert.ch/blog/devto-i-got-obsessed-with-2-byt