Concryptor
A multi-threaded AEAD encryption engine built in Rust. Encrypts and decrypts files at gigabyte-per-second throughput using a triple-buffered io_uring pipeline, parallel chunk processing via Rayon, and assembly-optimized ciphers via ring.
⚠️ DISCLAIMER: EXPERIMENTAL SOFTWARE ⚠️
This project is extremely new and currently NOT recommended for production or mission-critical use. While the cryptographic primitives (AES-256-GCM, ChaCha20-Poly1305 via ring) and format design are sound, the codebase has not undergone formal security audits or extensive real-world testing. Use at your own risk. For protecting sensitive data, consider using battle-tested tools like GnuPG, age, or OpenSSL until this project matures.
Features
- Dual cipher support: AES-256-GCM (hardware AES-NI) and ChaCha20-Poly1305 via ring (assembly-optimized)
- Parallel encryption: Rayon-based multi-threaded chunk processing across all CPU cores
- Triple-buffered io_uring pipeline: Overlaps kernel I/O and CPU-side crypto using three rotating buffer pools — while one batch's writes are in-flight, the next batch is being encrypted by Rayon, and the third batch's reads are in-flight. No syscall-per-chunk overhead, no mmap limitations (no SIGBUS, no virtual address space exhaustion)
- Argon2id key derivation: Industry-standard password-to-key stretching (default 256 MiB memory, 3 iterations, configurable via --memory)
- Self-describing KDF parameters: Memory cost, iterations, and parallelism are stored in the encrypted file header so decryption uses exactly the parameters that were chosen at encryption time. Legacy files (all-zero sentinel) are handled transparently with the old 64 MiB defaults
- Chunk-indexed nonces: TLS 1.3-style XOR nonce derivation prevents chunk reordering attacks
- Header-authenticated AAD: The full 4 KiB aligned header is included in every chunk's AAD, authenticating all header fields (core, KDF parameters, and reserved bytes) and preventing truncation, header-field manipulation, and reserved-byte smuggling attacks
- STREAM-style final chunk: A final-chunk flag in the AAD prevents truncation and append attacks (inspired by the STREAM construction)
- Fresh randomness per file: Cryptographically random 16-byte salt and 12-byte base nonce are generated for every encryption, stored in the header
- In-place encryption: seal_in_place_separate_tag/open_in_place via ring minimizes allocation in the hot loop
- Password zeroization: Keys and passwords are securely wiped from memory after use
- O_DIRECT + sector-aligned format: 4 KiB-aligned header and chunk slots enable O_DIRECT I/O, bypassing the kernel page cache for DMA-speed reads/writes on NVMe. Buffer pools use std::alloc with 4096-byte alignment
- Directory encryption: Encrypt entire directories as a single encrypted archive. Tar-based packing preserves file names, permissions, timestamps, and directory structure inside the ciphertext. Extraction validates against path traversal and symlink escape attacks
- Self-describing file format: Header stores cipher, chunk size, original file size, salt, base nonce, and Argon2id KDF parameters
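The chunk-indexed nonce scheme above can be sketched as follows. This is a minimal illustration of TLS 1.3-style XOR derivation (RFC 8446 §5.3), not Concryptor's actual code; the exact byte offsets used by the format are an assumption here.

```rust
// Sketch of TLS 1.3-style XOR nonce derivation: the 64-bit chunk index
// (big-endian, an assumed layout) is XORed into the low 8 bytes of the
// random per-file base nonce, binding every chunk to its position.
fn chunk_nonce(base: &[u8; 12], chunk_index: u64) -> [u8; 12] {
    let mut nonce = *base;
    for (n, i) in nonce[4..].iter_mut().zip(chunk_index.to_be_bytes()) {
        *n ^= i;
    }
    nonce
}

fn main() {
    let base = [0x42u8; 12];
    // Distinct chunks get distinct nonces, so a reordered ciphertext
    // chunk fails AEAD verification under the wrong index.
    assert_ne!(chunk_nonce(&base, 0), chunk_nonce(&base, 1));
    println!("{:02x?}", chunk_nonce(&base, 1));
}
```

Because derivation is deterministic from the header's base nonce, the decryptor recomputes each chunk's nonce independently with no per-chunk nonce storage.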
Performance
Benchmarked with cargo bench (Criterion, 10 samples per measurement). Key derivation is excluded; the numbers reflect pure crypto throughput only.
Hardware:
- CPU: AMD Ryzen 5 5600X (6c/12t @ 3.7 GHz base)
- RAM: 2x 8 GiB DDR4-2666 (dual channel, 16 GiB total)
- OS: Linux
Note on I/O: Criterion writes temporary files to /tmp, which on this system is tmpfs (RAM-backed). With O_DIRECT, the kernel cannot use real asynchronous DMA on tmpfs, so these numbers reflect cipher throughput + io_uring overhead without the DMA bypass benefit. On a real Gen4 NVMe drive, O_DIRECT eliminates page-cache double-buffering and enables DMA straight into the aligned buffer pools, which should yield significantly higher throughput.
| File Size | AES-256-GCM Encrypt | ChaCha20 Encrypt | AES-256-GCM Decrypt | ChaCha20 Decrypt |
|-----------|--------------------:|-----------------:|--------------------:|-----------------:|
| 64 KiB    | 244 MiB/s           | 233 MiB/s        | 233 MiB/s           | 234 MiB/s        |
| 1 MiB     | 1.08 GiB/s          | 882 MiB/s        | 1010 MiB/s          | 876 MiB/s        |
| 16 MiB    | 1.10 GiB/s          | 923 MiB/s        | 1.06 GiB/s          | 988 MiB/s        |
| 64 MiB    | 984 MiB/s           | 935 MiB/s        | 988 MiB/s           | 973 MiB/s        |
| 256 MiB   | 1.00 GiB/s          | 1015 MiB/s       | 1.01 GiB/s          | 1.02 GiB/s       |
Chunk size sweep (AES-256-GCM, 64 MiB file):
| Chunk Size | Throughput |
|------------|-----------:|
| 64 KiB     | 1.01 GiB/s |
| 256 KiB    | 1.05 GiB/s |
| 1 MiB      | 1.07 GiB/s |
| 4 MiB      | 988 MiB/s  |
| 8 MiB      | 988 MiB/s  |
| 16 MiB     | 1.00 GiB/s |
Performance characteristics
The engine uses ring (assembly-optimized AES-NI / NEON / ARMv8-CE) for cipher operations and a triple-buffered io_uring pipeline for I/O. Three pre-allocated buffer pools rotate through the pipeline: while pool A's writes are completing in the kernel, pool B is being encrypted by Rayon on the CPU, and pool C's reads are being submitted to the kernel. This overlaps I/O latency with crypto computation.
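The rotation described above can be modeled with a small scheduling sketch. This is illustrative only: the real engine drives in-flight io_uring batches and a Rayon parallel pass, while this toy maps out which pool is in which stage at each step.

```rust
// Toy model of the triple-buffered rotation: batch b occupies pool
// (b % 3); it is read at step b, encrypted at step b + 1, and written
// at step b + 2, so at steady state all three stages are busy on
// different pools simultaneously.
fn stage(step: usize, batch: usize) -> Option<&'static str> {
    match step.checked_sub(batch) {
        Some(0) => Some("read"),
        Some(1) => Some("encrypt"),
        Some(2) => Some("write"),
        _ => None,
    }
}

fn main() {
    let n_batches = 5;
    // Two extra steps drain the pipeline after the last read.
    for step in 0..n_batches + 2 {
        let active: Vec<String> = (0..n_batches)
            .filter_map(|b| stage(step, b).map(|s| format!("batch {b}: {s} (pool {})", b % 3)))
            .collect();
        println!("step {step}: {}", active.join(", "));
    }
}
```

At step 2 and beyond, one batch is being written, the next encrypted, and a third read in, which is exactly the overlap the prose describes.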
Why AES-256-GCM is faster than ChaCha20-Poly1305 on small files:
ring's AES-GCM backend exploits AES-NI + CLMUL hardware instructions available on x86-64, giving it a hardware advantage over ChaCha20 (which is a software cipher). At larger sizes both ciphers converge to ~1.0 GiB/s, indicating the bottleneck shifts from cipher throughput to I/O submission overhead.
Why peak throughput is at 1-16 MiB, not 256 MiB: Small files (1-16 MiB) have few chunks, so Rayon parallelism is efficient and the working set fits in cache. At 64-256 MiB, the io_uring pipeline is fully active (three batches in flight), but the per-SQE submission and CQE completion overhead scales with chunk count. The triple-buffer design ensures I/O and crypto overlap, partially hiding this cost.
Why ~1.0 GiB/s and not 10+ GiB/s: Modern AES-NI can push 2-4 GiB/s per core. With 12 threads, raw cipher throughput could exceed 10 GiB/s. Three factors explain the gap:
- io_uring per-SQE overhead: Each chunk requires a read SQE and a write SQE. With 256 chunks for a 256 MiB file, that's 512 SQEs submitted and 512 CQEs reaped. While io_uring avoids the per-syscall kernel transition cost of pread/pwrite, it still has ring-buffer and memory-barrier overhead per SQE.
- Pipeline depth: With PIPELINE_DEPTH=3, only three batches rotate through the pipeline at any time. True steady-state overlap requires at least three batches; files that fit in one or two batches don't benefit from pipelining.
- Cache hierarchy effects: The 5600X has 512 KiB L2 per core and 32 MiB shared L3. The default 4 MiB chunk exceeds L2, and a batch of ~21 chunks (84 MiB active working set) far exceeds L3. Smaller chunk sizes (64-256 KiB) show better throughput in the chunk sweep because more of the working set stays in cache.
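The counts quoted in the bullets above follow from quick arithmetic, sketched here; the ~21-chunk batch size is the approximate figure from the cache bullet, not a measured constant.

```rust
// Back-of-envelope check of the SQE and working-set numbers above.
fn sqe_count(file_bytes: u64, chunk_bytes: u64) -> u64 {
    // One read SQE plus one write SQE per chunk.
    2 * (file_bytes / chunk_bytes)
}

fn main() {
    let mib = 1u64 << 20;
    // A 256 MiB file in 1 MiB chunks: 256 chunks, 512 SQEs submitted
    // and 512 CQEs reaped.
    println!("SQEs: {}", sqe_count(256 * mib, mib));
    // A batch of ~21 default 4 MiB chunks is an 84 MiB active working
    // set, far exceeding the 5600X's 32 MiB shared L3.
    println!("batch working set: {} MiB", (21 * 4 * mib) / mib);
}
```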
Buffer lifecycle and safety:
Buffer pools are allocated once via std::alloc::alloc_zeroed with Layout::from_size_align(size, 4096) before the io_uring ring is created, and are reused across all pipeline iterations without reallocation. Each encrypted chunk is zero-padded to sector alignment before the O_DIRECT write. The ring is explicitly dropped before the buffer pools, ensuring the kernel never references freed memory (no UAF).
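The lifecycle above can be sketched roughly as follows. `AlignedPool` is an illustrative wrapper, not Concryptor's actual type; it only shows the aligned allocation and the drop-order reasoning.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Minimal sketch of a 4096-byte-aligned buffer pool, as required for
// O_DIRECT I/O (both the buffer address and length must be aligned).
struct AlignedPool {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedPool {
    fn new(size: usize) -> Self {
        let layout = Layout::from_size_align(size, 4096).expect("invalid layout");
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        AlignedPool { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedPool {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    // Pools are allocated once, up front, and reused every iteration.
    let mut pool = AlignedPool::new(4 * 1024 * 1024);
    assert_eq!(pool.ptr as usize % 4096, 0);
    pool.as_mut_slice()[..8].fill(0xAA);
    // Rust drops locals in reverse declaration order, so declaring the
    // io_uring ring *after* the pools guarantees the ring is torn down
    // first and the kernel never touches freed memory.
}
```

The reverse-declaration-order drop guarantee is what makes "ring dropped before pools" enforceable structurally rather than by convention.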
Installation
git clone https://github.com/frogsnot/concryptor.git
cd concryptor
cargo build --release
The binary will be at target/release/concryptor.
Usage
Encrypt
# AES-256-GCM (default), output to myfile.dat.enc
concryptor encrypt myfile.dat
# ChaCha20-Poly1305, custom output path
concryptor encrypt myfile.dat --cipher chacha -o encrypted.enc
# Custom chunk size (in MiB)
concryptor encrypt largefile.iso --chunk-size 8
# Stronger KDF (512 MiB memory cost)
concryptor encrypt secrets.tar --memory 512
# Non-interactive (skips password prompt)
concryptor encrypt myfile.dat -p "password"
Security note:
--password/-p passes the password as a CLI argument, which is visible in ps output and shell history. For interactive use, omit it to get the secure hidden prompt. For scripting, prefer clearing history afterward or using a wrapper that reads from a file descriptor.
Encrypt a directory
# Encrypt a directory (auto-detects, produces mydir.tar.enc)
concryptor encrypt mydir/
# With custom cipher and output
concryptor encrypt mydir/ --cipher chacha -o secrets.enc
Directory encryption creates a temporary tar archive (.concryptor-*.tar, 0600 permissions, CSPRNG-named), encrypts it, then auto-deletes the temp file. File names, directory structure, permissions, and timestamps are all inside the encrypted payload.
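The extraction-time validation mentioned above can be sketched like this. `is_safe_entry` is illustrative, not the project's actual function; it shows the standard check that an archive entry is only joined under the output directory if it is relative and free of `..` components.

```rust
use std::path::{Component, Path};

// Reject archive entries that could escape the extraction directory:
// absolute paths, `..` components, and other non-normal components.
fn is_safe_entry(entry: &Path) -> bool {
    !entry.is_absolute()
        && entry
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}

fn main() {
    assert!(is_safe_entry(Path::new("docs/readme.txt")));
    assert!(!is_safe_entry(Path::new("../escape.txt")));
    assert!(!is_safe_entry(Path::new("/etc/passwd")));
    println!("all entries validated as expected");
}
```

Symlink-escape protection needs an additional step (resolving each entry's parent and checking it stays inside the output root), which this sketch omits.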
Decrypt
# Auto-strips .enc extension
concryptor decrypt myfile.dat.enc
# Custom output path
concryptor decrypt encrypted.enc -o restored.dat
# Non-interactive
concryptor decrypt myfile.dat.enc -p "password"
Decrypt and extract a directory
# Decrypt and extract in one step (auto-strips .tar.enc -> directory name)
concryptor decrypt mydir.tar.enc --extract
# Short flag, custom output directory
concryptor decrypt mydir.tar.enc -x -o restored_dir/
Without --extract, decrypting a directory archive produces the intermediate .tar file, which you can inspect or extract manually.
Help
concryptor --help
concryptor encrypt --help
concryptor decrypt --help
File Format
All values are little-endian. The header occupies a full 4 KiB sector; each encrypted chunk slot is padded to the next 4 KiB boundary.
