
Concryptor

A gigabyte-per-second, multi-threaded file encryption engine. Achieves extreme throughput using a lock-free, triple-buffered io_uring pipeline, Rayon parallel chunking, and assembly-optimized AEADs (AES-256-GCM / ChaCha20-Poly1305).

Install / Use

/learn @FrogSnot/Concryptor

README

Concryptor


A multi-threaded AEAD encryption engine built in Rust. Encrypts and decrypts files at gigabyte-per-second throughput using a triple-buffered io_uring pipeline, parallel chunk processing via Rayon, and assembly-optimized ciphers via ring.

⚠️ DISCLAIMER: EXPERIMENTAL SOFTWARE ⚠️

This project is extremely new and currently NOT recommended for production or mission-critical use. While the cryptographic primitives (AES-256-GCM, ChaCha20-Poly1305 via ring) and format design are sound, the codebase has not undergone formal security audits or extensive real-world testing. Use at your own risk. For protecting sensitive data, consider using battle-tested tools like GnuPG, age, or OpenSSL until this project matures.

Features

  • Dual cipher support: AES-256-GCM (hardware AES-NI) and ChaCha20-Poly1305 via ring (assembly-optimized)
  • Parallel encryption: Rayon-based multi-threaded chunk processing across all CPU cores
  • Triple-buffered io_uring pipeline: Overlaps kernel I/O and CPU-side crypto using three rotating buffer pools — while one batch's writes are in-flight, the next batch is being encrypted by Rayon, and the third batch's reads are in-flight. No syscall-per-chunk overhead, no mmap limitations (no SIGBUS, no virtual address space exhaustion)
  • Argon2id key derivation: Industry-standard password-to-key stretching (default 256 MiB memory, 3 iterations, configurable via --memory)
  • Self-describing KDF parameters: Memory cost, iterations, and parallelism are stored in the encrypted file header so decryption uses exactly the parameters that were chosen at encryption time. Legacy files (all-zero sentinel) are handled transparently with the old 64 MiB defaults
  • Chunk-indexed nonces: TLS 1.3-style XOR nonce derivation prevents chunk reordering attacks
  • Header-authenticated AAD: The full 4 KiB aligned header is included in every chunk's AAD, authenticating all header fields (core, KDF parameters, and reserved bytes) and preventing truncation, header-field manipulation, and reserved-byte smuggling attacks
  • STREAM-style final chunk: A final-chunk flag in the AAD prevents truncation and append attacks (inspired by the STREAM construction)
  • Fresh randomness per file: Cryptographically random 16-byte salt and 12-byte base nonce are generated for every encryption, stored in the header
  • In-place encryption: seal_in_place_separate_tag / open_in_place via ring minimizes allocation in the hot loop
  • Password zeroization: Keys and passwords are securely wiped from memory after use
  • O_DIRECT + sector-aligned format: 4 KiB-aligned header and chunk slots enable O_DIRECT I/O, bypassing the kernel page cache for DMA-speed reads/writes on NVMe. Buffer pools use std::alloc with 4096-byte alignment
  • Directory encryption: Encrypt entire directories as a single encrypted archive. Tar-based packing preserves file names, permissions, timestamps, and directory structure inside the ciphertext. Extraction validates against path traversal and symlink escape attacks
  • Self-describing file format: Header stores cipher, chunk size, original file size, salt, base nonce, and Argon2id KDF parameters
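The chunk-indexed nonce scheme above can be sketched as follows. This is an illustrative reimplementation modeled on TLS 1.3's per-record nonce construction, not Concryptor's actual code; the `derive_nonce` helper and the big-endian placement of the counter are assumptions.

```rust
/// Illustrative TLS 1.3-style nonce derivation: XOR the 64-bit chunk
/// index into the rightmost 8 bytes of the 12-byte base nonce. Moving
/// a chunk to a different position changes its expected nonce, so the
/// AEAD tag check fails on reordered chunks.
/// (Hypothetical helper; not Concryptor's real code.)
fn derive_nonce(base: &[u8; 12], chunk_index: u64) -> [u8; 12] {
    let mut nonce = *base;
    for (i, b) in chunk_index.to_be_bytes().iter().enumerate() {
        nonce[4 + i] ^= b;
    }
    nonce
}

fn main() {
    let base = [0xAB; 12];
    // Chunk 0 XORs in all-zero bytes, leaving the base nonce unchanged.
    assert_eq!(derive_nonce(&base, 0), base);
    // Distinct chunk indices yield distinct nonces under one key.
    assert_ne!(derive_nonce(&base, 1), derive_nonce(&base, 2));
    println!("nonce for chunk 1: {:02x?}", derive_nonce(&base, 1));
}
```

Because the counter only ever XORs into the low 8 bytes, nonces stay unique per chunk as long as the base nonce is fresh per file, which the header guarantees.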

Performance

Benchmarked with cargo bench (Criterion, 10 samples per measurement). Key derivation is excluded; the numbers reflect pure crypto throughput only.

Hardware:

  • CPU: AMD Ryzen 5 5600X (6c/12t @ 3.7 GHz base)
  • RAM: 2x 8 GiB DDR4-2666 (dual channel, 16 GiB total)
  • OS: Linux

Note on I/O: Criterion writes temporary files to /tmp, which on this system is tmpfs (RAM-backed). With O_DIRECT, the kernel cannot use real asynchronous DMA on tmpfs, so these numbers reflect cipher throughput + io_uring overhead without the DMA bypass benefit. On a real Gen4 NVMe drive, O_DIRECT eliminates page-cache double-buffering and enables DMA straight into the aligned buffer pools, which should yield significantly higher throughput.

| File Size | AES-256-GCM Encrypt | ChaCha20 Encrypt | AES-256-GCM Decrypt | ChaCha20 Decrypt |
|-----------|--------------------:|-----------------:|--------------------:|-----------------:|
| 64 KiB    | 244 MiB/s           | 233 MiB/s        | 233 MiB/s           | 234 MiB/s        |
| 1 MiB     | 1.08 GiB/s          | 882 MiB/s        | 1010 MiB/s          | 876 MiB/s        |
| 16 MiB    | 1.10 GiB/s          | 923 MiB/s        | 1.06 GiB/s          | 988 MiB/s        |
| 64 MiB    | 984 MiB/s           | 935 MiB/s        | 988 MiB/s           | 973 MiB/s        |
| 256 MiB   | 1.00 GiB/s          | 1015 MiB/s       | 1.01 GiB/s          | 1.02 GiB/s       |

Chunk size sweep (AES-256-GCM, 64 MiB file):

| Chunk Size | Throughput |
|------------|-----------:|
| 64 KiB     | 1.01 GiB/s |
| 256 KiB    | 1.05 GiB/s |
| 1 MiB      | 1.07 GiB/s |
| 4 MiB      | 988 MiB/s  |
| 8 MiB      | 988 MiB/s  |
| 16 MiB     | 1.00 GiB/s |

Performance characteristics

The engine uses ring (assembly-optimized AES-NI / NEON / ARMv8-CE) for cipher operations and a triple-buffered io_uring pipeline for I/O. Three pre-allocated buffer pools rotate through the pipeline: while pool A's writes are completing in the kernel, pool B is being encrypted by Rayon on the CPU, and pool C's reads are being submitted to the kernel. This overlaps I/O latency with crypto computation.
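The rotation described above can be sketched with plain std Rust. The real engine overlaps io_uring reads and writes with Rayon-side encryption; here the stages are just labels so the rotation pattern itself is visible. This is illustrative only, not Concryptor's code.

```rust
// Std-only sketch of the triple-buffer rotation: at each pipeline
// step, one pool has reads in flight, one is being encrypted on the
// CPU, and one has writes in flight. Roles advance by one pool per
// step. (Illustrative; the role arithmetic is an assumption.)
const PIPELINE_DEPTH: usize = 3;

/// For a given pipeline step, which pool holds each role?
fn roles(step: usize) -> (usize, usize, usize) {
    let reading = step % PIPELINE_DEPTH;                           // reads in flight
    let encrypting = (step + PIPELINE_DEPTH - 1) % PIPELINE_DEPTH; // Rayon crypto
    let writing = (step + PIPELINE_DEPTH - 2) % PIPELINE_DEPTH;    // writes in flight
    (reading, encrypting, writing)
}

fn main() {
    for step in 0..6 {
        let (r, e, w) = roles(step);
        // Every pool holds exactly one role per step, so no buffer is
        // ever touched by the kernel and the CPU at the same time.
        assert!(r != e && e != w && r != w);
        println!("step {step}: read=pool{r} encrypt=pool{e} write=pool{w}");
    }
}
```

The invariant checked in the loop is the safety property that makes the design lock-free: role rotation, not locking, keeps kernel DMA and CPU crypto off the same buffer.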

Why AES-256-GCM is faster than ChaCha20-Poly1305 on small files: ring's AES-GCM backend exploits AES-NI + CLMUL hardware instructions available on x86-64, giving it a hardware advantage over ChaCha20 (which is a software cipher). At larger sizes both ciphers converge to ~1.0 GiB/s, indicating the bottleneck shifts from cipher throughput to I/O submission overhead.

Why peak throughput is at 1-16 MiB, not 256 MiB: Small files (1-16 MiB) have few chunks, so Rayon parallelism is efficient and the working set fits in cache. At 64-256 MiB, the io_uring pipeline is fully active (three batches in flight), but the per-SQE submission and CQE completion overhead scales with chunk count. The triple-buffer design ensures I/O and crypto overlap, partially hiding this cost.

Why ~1.0 GiB/s and not 10+ GiB/s: Modern AES-NI can push 2-4 GiB/s per core. With 12 threads, raw cipher throughput could exceed 10 GiB/s. Three factors explain the gap:

  1. io_uring per-SQE overhead: Each chunk requires a read SQE and a write SQE. With 256 chunks for a 256 MiB file, that's 512 SQEs submitted and 512 CQEs reaped. While io_uring avoids the per-syscall kernel transition cost of pread/pwrite, it still has ring-buffer and memory-barrier overhead per SQE.
  2. Pipeline depth: With PIPELINE_DEPTH=3, only three batches rotate through the pipeline at any time. True steady-state overlap requires at least three batches; files that fit in one or two batches don't benefit from pipelining.
  3. Cache hierarchy effects: The 5600X has 512 KiB L2 per core and 32 MiB shared L3. The default 4 MiB chunk exceeds L2, and a batch of ~21 chunks (84 MiB active working set) far exceeds L3. Smaller chunk sizes (64-256 KiB) show better throughput in the chunk sweep because more of the working set stays in cache.

Buffer lifecycle and safety: Buffer pools are allocated once via std::alloc::alloc_zeroed with Layout::from_size_align(size, 4096) before the io_uring ring is created, and are reused across all pipeline iterations without reallocation. Each encrypted chunk is zero-padded to sector alignment before the O_DIRECT write. The ring is explicitly dropped before the buffer pools, ensuring the kernel never references freed memory (no UAF).
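The aligned allocation described above can be reproduced in a few lines; this is a minimal sketch of the same std::alloc pattern, with an assumed 4 MiB pool size, not the engine's actual allocation code.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Sketch of a 4096-byte-aligned buffer pool allocation as required
// for O_DIRECT I/O. (Illustrative; the pool size is an assumption.)
fn main() {
    let pool_size = 4 * 1024 * 1024; // e.g. one 4 MiB chunk slot
    let layout = Layout::from_size_align(pool_size, 4096).expect("bad layout");
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    // O_DIRECT requires the buffer address (and I/O lengths/offsets)
    // to be aligned to the device's logical sector size.
    assert_eq!(ptr as usize % 4096, 0);
    // In the real engine the pools outlive the io_uring ring; here we
    // simply free the buffer with the same layout when done.
    unsafe { dealloc(ptr, layout) };
    println!("allocated {pool_size} bytes at 4096-byte alignment");
}
```

Allocating once up front and reusing the pool is what keeps the hot loop allocation-free; the matching `Layout` must be passed to `dealloc`, which is why it is kept alongside the pointer.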

Installation

git clone https://github.com/frogsnot/concryptor.git
cd concryptor
cargo build --release

The binary will be at target/release/concryptor.

Usage

Encrypt

# AES-256-GCM (default), output to myfile.dat.enc
concryptor encrypt myfile.dat

# ChaCha20-Poly1305, custom output path
concryptor encrypt myfile.dat --cipher chacha -o encrypted.enc

# Custom chunk size (in MiB)
concryptor encrypt largefile.iso --chunk-size 8

# Stronger KDF (512 MiB memory cost)
concryptor encrypt secrets.tar --memory 512

# Non-interactive (skips password prompt)
concryptor encrypt myfile.dat -p "password"    # see security note below

Security note: --password / -p passes the password as a CLI argument, which is visible in ps output and shell history. For interactive use, omit it to get the secure hidden prompt. For scripting, prefer clearing history afterward or using a wrapper that reads from a file descriptor.

Encrypt a directory

# Encrypt a directory (auto-detects, produces mydir.tar.enc)
concryptor encrypt mydir/

# With custom cipher and output
concryptor encrypt mydir/ --cipher chacha -o secrets.enc

Directory encryption creates a temporary tar archive (.concryptor-*.tar, 0600 permissions, CSPRNG-named), encrypts it, then auto-deletes the temp file. File names, directory structure, permissions, and timestamps are all inside the encrypted payload.

Decrypt

# Auto-strips .enc extension
concryptor decrypt myfile.dat.enc

# Custom output path
concryptor decrypt encrypted.enc -o restored.dat

# Non-interactive
concryptor decrypt myfile.dat.enc -p "password"

Decrypt and extract a directory

# Decrypt and extract in one step (auto-strips .tar.enc -> directory name)
concryptor decrypt mydir.tar.enc --extract

# Short flag, custom output directory
concryptor decrypt mydir.tar.enc -x -o restored_dir/

Without --extract, decrypting a directory archive produces the intermediate .tar file, which you can inspect or extract manually.
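For example, the intermediate archive can be inspected or unpacked with standard tar tooling (assuming a mydir.tar produced by the decrypt step above):

```shell
# List the contents of the decrypted intermediate archive without
# extracting anything.
tar -tvf mydir.tar

# Extract manually into a chosen directory.
mkdir -p restored_dir
tar -xf mydir.tar -C restored_dir
```

Note that manual extraction with plain tar skips Concryptor's own path-traversal and symlink-escape validation, so prefer --extract for archives from untrusted sources.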

Help

concryptor --help
concryptor encrypt --help
concryptor decrypt --help

File Format

All values are little-endian. The header occupies a full 4 KiB sector; each encrypted chunk slot is padded to the next 4 KiB boundary.
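Assembled from the fields enumerated under Features, a header along these lines is plausible. The field order, integer widths, magic bytes, and reserved-byte count below are hypothetical, not the actual on-disk layout.

```rust
// Hypothetical header sketch built from the fields the README lists
// (cipher, chunk size, original size, salt, base nonce, Argon2id
// parameters). Field order and widths are assumptions, NOT the real
// Concryptor on-disk format.
#[repr(C)]
struct Header {
    magic: [u8; 8],       // format identifier (hypothetical)
    cipher_id: u8,        // AES-256-GCM vs ChaCha20-Poly1305
    chunk_size: u64,      // plaintext bytes per chunk
    original_size: u64,   // exact plaintext length
    salt: [u8; 16],       // fresh per file, Argon2id input
    base_nonce: [u8; 12], // fresh per file, XORed with chunk index
    kdf_memory_kib: u32,  // Argon2id memory cost
    kdf_iterations: u32,  // Argon2id time cost
    kdf_parallelism: u32, // Argon2id lanes
}

fn main() {
    // Whatever the real layout, everything must fit in the single
    // 4 KiB header sector; the remainder is reserved bytes, which the
    // header-wide AAD also authenticates.
    assert!(std::mem::size_of::<Header>() <= 4096);
    println!("sketch header uses {} of 4096 bytes", std::mem::size_of::<Header>());
}
```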
