Prvhash
PRVHASH - Pseudo-Random-Value Hash. Hash functions, PRNG with unlimited period, randomness extractor, and a glimpse into abyss. (header-only C/C++) (Codename Gradilac/Градилак)
Install / Use
/learn @avaneev/PrvhashREADME
PRVHASH - Pseudo-Random-Value Hash (in C/C++)
Introduction
PRVHASH is a hash function that generates a uniform pseudo-random number
sequence
derived from the message. PRVHASH is conceptually similar (in the sense of
using a pseudo-random number sequence as a hash) to keccak
and RadioGatun
schemes, but is a completely different implementation of such concept.
PRVHASH acts as both a "randomness extractor"
and an "extendable-output function" (XOF).
PRVHASH can generate 64- to unlimited-bit hashes, yielding hashes of approximately equal quality independent of the chosen hash length. PRVHASH is based on 64-bit math, which is scalar, portable, cross-platform, inlinable, C++ compatible (fairly efficient on 32-bit systems as well). The function can easily be used beyond 1024-bit hashes, but such cases should be statistically tested (until a rigorous formal extrapolation is available). For example, any 32-bit element extracted from 2048-, or 4096-bit resulting hash is as collision resistant as just a 32-bit hash. It is a constant-time hash function, meaning its execution time depends only on the message's length. A streamed, higher-security, hashing implementation is available.
PRVHASH is solely based on the butterfly effect, inspired by LCG pseudo-random number generators. The generated hashes exhibit strong avalanche properties. For best security, a random seed should be supplied to the hash function, but this is not a requirement.
64-, 128-, 192-, 256-, 512-, and 1024-bit PRVHASH hashes pass all SMHasher tests. Other hash lengths were not thoroughly tested, but extrapolations can be made. The author makes no cryptographic claims (neither positive nor negative) about PRVHASH-based constructs.
PRVHASH core function can be used as a PRNG with an arbitrarily-chosen (practically unlimited) period, depending on the number of hashwords in the system.
PRVHASH64
Please see the prvhash64.h file for the details of the basic hash function
implementation (the prvhash.h, prvhash4.h, prvhash42.h, and versions
below 4.3 are outdated versions). While this hash function is likely
irreversible, according to SAT solver-based testing, it does not provide
second preimage resistance. This function should not be used in open systems
without a secret seed. Note that 64 refers to core function's variable size.
The default prvhash64.h-based 64-bit hash of the string The cat is out of the bag is 6ac39f7ac0c94d63.
A proposed short name for hashes created with prvhash64.h is PRH64-N,
where N is the hash length in bits (e.g., PRH64-256).
PRVHASH64_64M
This is a minimized implementation of the prvhash64 hash function. It is
one of very few smallest hash functions which produces 64-bit hashes of
this quality level. While this function does not provide a bulk throughput
that can be considered "fast", due to its statistical properties it is
fast for short-key hash-maps and hash-tables.
Streamed Hashing
The file prvhash64s.h includes a relatively fast streamed hashing function
which uses a "fused" PRVHASH arrangement. Please take a look at the
prvhash64s_oneshot() function for usage example. The prvhash64s offers
both an increased security and hashing speed.
This function has an increased preimage resistance compared to the basic hash function implementation. Preimage resistance cannot currently be estimated exactly, but the hash length affects it exponentially. Also, second preimage attack usually boils down to exchange of forged symbols to "trash" symbols (at any place of the data stream); substitutions usually result in random values, possibly damaging to any compressed or otherwise structured file. Which means that data compression software and libraries should always check any left-over, "unused", data beyond the valid compressed stream, for security reasons.
The time complexity of a preimage attack is highly variable and appears to follow a random-logarithmic distribution.
Even though a formal proof is not yet available, the author believes this hash function can compete with widely-used SHA2 and SHA3 families of hash functions while at the same time offering a considerably higher performance and scalability. When working in open systems, supplying a secret seed is not a requirement for this hash function.
The performance (expressed in cycles/byte) of this hash function on various platforms can be evaluated at the ECRYPT/eBASH project.
The default prvhash64s.h-based 64-bit hash of the string The cat is out of the bag is 2043ccf52ae2ca6f.
The default prvhash64s.h-based 256-bit hash of the string
Only a toilet bowl does not leak is
b13683799b840002689a1a42d93c826c25cc2d1f1bc1e48dcd005aa566a47ad8.
The default prvhash64s.h-based 256-bit hash of the string
Only a toilet bowl does not leaj is
d4534a922fd4f15ae8c6cc637006d1f33f655b06d60007a226d350e87e866250.
This demonstrates the Avalanche effect. On a set of 216553 English words, pair-wise hash comparisons give a 50.0% average difference in resulting hash bits, which fully satisfies the strict avalanche criterion.
This streamed hash function produces hash values that are different to the
prvhash64 hash function. It is incorrect to use both of these hash function
implementations on the same data set. While prvhash64 can be used as a hash
for hash-tables and in-memory data blocks, prvhash64s can be used to create
hashes of large data blocks like files, in streamed mode.
A proposed short name for hashes created with prvhash64s.h is PRH64S-N,
where N is the hash length in bits (e.g., PRH64S-256). Or simply, SH4-N,
Secure Hash 4.
Minimal PRNG for Everyday Use
The core function can be easily integrated into your applications, to be used
as an effective PRNG. The period of this minimal PRNG is at least
2<sup>159</sup>. The initial parameters can be varied at will, and will not
"break" the PRNG. Setting only the Seed value guarantees a random start
point within the whole PRNG period, with at least 2<sup>64</sup> spacing.
It is generally recommended to initialize lcg and Hash variables to 0.
Since the PRNG is based on multiplication with recurrence, it can be
classified as "chaotic". An extremely small number (just sys_size_bits
which is 192 out of 2<sup>64</sup> for this minimal PRNG) of its possible
PRNG cycles have reduced periods, but they are never shorter than
2<sup>hash_bits-hash_count</sup>, which depends on the number of hashwords,
as hashwords are linear accumulators similar to Weyl sequences
(2<sup>63</sup> for this minimal PRNG).
The example code follows:
#include "prvhash_core.h"
#include <stdio.h>
int main()
{
uint64_t Seed = 0;
uint64_t lcg = 0;
uint64_t Hash = 0;
uint64_t v = 0;
uint64_t i;
for( i = 0; i < ( (uint64_t) 1 << 28 ); i++ )
{
v = prvhash_core64( &Seed, &lcg, &Hash );
}
printf( "%llu\n", v );
}
For implementation assurance, here are the first 16 output values in hex (starting from the all-zeroes state):
0x5555555555555555 0x00000000DB6DB6DB 0x2492492192492492 0x75D75DA0AAAAAA79
0x93064E905C127FE5 0xE2585C9CA95671A3 0x28A44B31D428179E 0x11B0B6A8D4BA3A73
0x195C6A4C23EE71AD 0x5AA47859226BA23E 0xA7D42121695056D4 0x142D7CD5D83342F2
0x3D42E83328C09C8F 0x7E691C66BAC23222 0x82E1032F441F23A5 0xA4BDE5C4A05E6256
Note that such minimal 1-hashword PRNG is definitely not cryptographically secure: its state can be solved by a SAT solver pretty fast; this applies to other arrangements ("fused", "parallel", multiple hashwords; with daisy-chaining being harder to solve). The known way to make PRNG considerably harder to solve for a SAT solver, with complexity corresponding to system's size, is to combine two adjacent PRNG outputs via XOR operation; this obviously has a speed impact and produces output with more than 1 solution (most probably, 2). This, however, does not measurably increase the probability of PRNG output overlap, which stays below 1/2<sup>sys_size_bits</sup>; in tests, practically undetectable.
So, the basic PRNG with some, currently not formally proven, security is as follows (XOR two adjacent outputs to produce a single "compressed" PRNG output):
v = prvhash_core64( &Seed, &lcg, &Hash );
v ^= prvhash_core64( &Seed, &lcg, &Hash );
A similar approach is to simply skip the next generated random number, but it is slightly less secure. It is likely that PRVHASH's k-equidistribution of separate outputs is implicitly secure. The reason is that skipping or XORing creates uncertainty or entanglement of current output with system's state "hashword array length" of outputs back. 3 XORs are needed to provide preimage resistance, or resistance against selection of entropy input that leads to a desired output. The security becomes effective only after system's initialization: initial "conditioning" rounds and a full hashword array pass.
TPDF Dithering
The core function can be used to implement a "statistically-good" and "neutrally-sounding" dithering noise for audio signals; for both floating-point to fixed-point, and bit-depth conversions.
uint64_t rv = prvhash_core64( &Seed, &lcg, &Hash );
double tpdf = ( (int64_t) (uint32_t) rv - (int64_t) ( rv >> 32 )) * 0x1p-32;
Floating-Point PRNG
The following expression can be used to convert 64-bit unsigned value to full-mantissa floating-point value, without a truncation bias:
uint64_t rv = prvhash_core64( &Seed, &lcg, &Hash );
double v = ( rv >> ( 64 - 53 )) * 0x1p-53;
Gradilac PRNG (C++)
The gradilac.h file includes the
