SkillAgentSearch skills...

Sha3

Compile-time Evaluable SHA3 in C++: Permutation-based Cryptographic Hashing

Install / Use

/learn @itzmeanjan/Sha3

README

sha3

constexpr SHA3: Keccak Permutation-based Hash and Extendable-Output Functions.

[!NOTE] constexpr? Yes, you can evaluate message digest (MD) of any message, during program compilation time itself, given that the message is known at compile-time. This can be useful if you have to embed a message digest in program as a constant. Instead of hardcoding the MD as constant, the compiler can compute it for you. And you can run a static_assert against the computed value.

Overview

SHA3 standard i.e. NIST FIPS 202, specifies four permutation-based hash functions and two eXtendable Output Functions (XOF), which are built on top of 24-rounds keccak-p[1600, 24] permutation. In IETF RFC 9861, two additional XOFs are defined based on 12-rounds keccak-p[1600, 12] permutation. The round-reduced keccak permutation almost doubles the performance of TurboSHAKE compared to the original SHAKE XOFs.

These hash functions and extendable output functions are commonly used in various post-quantum cryptography algorithms (i.e. those used for public key encryption, key establishment mechanism and digital signature). Some of which are already standardized (e.g. ML-KEM, ML-DSA, SLH-DSA etc.) by NIST, some are waiting to be standardized (e.g. FN-DSA) or some are still competing. We implement SHA3 specification as a header-only fully constexpr C++ library, so that we can use it as a modular CMake dependency in libraries, where we implement various PQC schemes. We follow NIST FIPS 202 @ https://dx.doi.org/10.6028/NIST.FIPS.202 and RFC 9861 @ https://datatracker.ietf.org/doc/rfc9861.

Following algorithms (with flexible interfaces) are implemented in sha3 library.

Algorithm | Input | Output | Behaviour | Namespace + Header --- | :-: | :-: | :-: | --: SHA3-224 | N ( >=0 ) -bytes message | 28 -bytes digest | Given N -bytes input message, this routine computes 28 -bytes sha3-224 digest, while (incrementally) consuming message into Keccak[448] sponge. | sha3_224::sha3_224_t SHA3-256 | N ( >=0 ) -bytes message | 32 -bytes digest | Given N -bytes input message, this routine computes 32 -bytes sha3-256 digest, while (incrementally) consuming message into Keccak[512] sponge. | sha3_256::sha3_256_t SHA3-384 | N ( >=0 ) -bytes message | 48 -bytes digest | Given N -bytes input message, this routine computes 48 -bytes sha3-384 digest, while (incrementally) consuming message into Keccak[768] sponge. | sha3_384::sha3_384_t SHA3-512 | N ( >=0 ) -bytes message | 64 -bytes digest | Given N -bytes input message, this routine computes 64 -bytes sha3-512 digest, while (incrementally) consuming message into Keccak[1024] sponge. | sha3_512::sha3_512_t SHAKE128 | N ( >=0 ) -bytes message | M ( >=0 ) -bytes output | Given N -bytes input message, this routine squeezes arbitrary ( = M ) number of output bytes from Keccak[256] sponge, which has already (incrementally) absorbed input bytes. | shake128::shake128_t SHAKE256 | N ( >=0 ) -bytes message | M ( >=0 ) -bytes digest | Given N -bytes input message, this routine squeezes arbitrary ( = M ) number of output bytes from Keccak[512] sponge, which has already (incrementally) absorbed input bytes. | shake256::shake256_t TurboSHAKE128 | N ( >=0 ) -bytes message | M ( >=0 ) -bytes output | Given N -bytes input message, this routine squeezes arbitrary ( = M ) number of output bytes from Keccak[256] sponge, which has already (incrementally) absorbed input bytes. It is faster than SHAKE128, because it is powered by 12-rounds keccak permutation. | turboshake128::turboshake128_t TurboSHAKE256 | N ( >=0 ) -bytes message | M ( >=0 ) -bytes output | Given N -bytes input message, this routine squeezes arbitrary ( = M ) number of output bytes from Keccak[512] sponge, which has already (incrementally) absorbed input bytes. It is faster than SHAKE256, because it is powered by 12-rounds keccak permutation. | turboshake256::turboshake256_t

XKCP is the state-of-the-art C library implementation, for all common constructions based on keccak permutation. It is available @ https://github.com/XKCP/XKCP. Following screen capture, shows a performance comparison of generic and portable TurboSHAKE128 XOF, implemented in this library, against the baseline of XKCP's generic64 (plain 64-bit C, no platform-specific optimizations) TurboSHAKE128 implementation. To compare performance of TurboSHAKE128, for both short and long messages, we absorb messages of variable length, starting from 32B to 1GB, with a multiplicative jump factor of 32, while squeezing a fixed length output of 64B.

Benchmark comparison shows a clear and consistent trend of this portable implementation taking ~(12-17)% more time compared to XKCP, for messages >= 1KB where the keccak permutation dominates.

[!IMPORTANT] This performance comparison was carried out on a 12th Gen Intel(R) Core(TM) i7-1260P machine, running Linux 6.17.0-6-generic kernel. The benchmark executable was compiled using GCC-15.2.0, with optimization options -O3 -march=native -flto. For XKCP, we built the generic64 target (plain 64-bit C implementation). We used google-benchmark's benchmark comparison tool @ https://github.com/google/benchmark/blob/v1.9.4/docs/tools.md for easily comparing JSON dump.

benchcmp_with_xkcp_turboshake128_on_x86_64.png

You can reproduce this benchmark comparison on your own machine (x86-64 or ARM64) by running the self-contained script in benchcmp_xkcp/. It clones XKCP, builds its generic64 library, compiles both benchmarks into a single executable, runs them, and produces a comparison table.

cd benchcmp_xkcp
bash run_benchmarks.sh

[!NOTE] The script requires git, cmake, make, xsltproc, python3 (with python3-venv) and a C/C++ compiler with C++20 support. Python dependencies (scipy, numpy) are automatically installed in a virtual environment by the script. You should disable CPU frequency scaling during benchmarking, following this guide.

Ideally, one should just go ahead with XKCP. Though, one edge this portable implementation of sha3 has over XKCP - it is fully constexpr. In following code snippet, we compute SHA3-256 message digest (MD) of a compile-time known string, in program compilation time itself. The MD is checked using a static assertion. This header-only lightweight library allows you to embed SHA3-256 MD as a constant, which is computed by the compiler. It's not hard-coded.

#include "sha3/sha3_256.hpp"
#include <array>
#include <cstdint>
#include <cstdlib>
#include <string_view>

// Compile-time hex character to nibble conversion.
constexpr uint8_t
hex_digit(char chr)
{
  if (chr >= '0' && chr <= '9') return static_cast<uint8_t>(chr - '0');
  if (chr >= 'a' && chr <= 'f') return static_cast<uint8_t>(chr - 'a' + 10);
  if (chr >= 'A' && chr <= 'F') return static_cast<uint8_t>(chr - 'A' + 10);
  return 0;
}

// Given a hex encoded string of length 2*L, parse it as a byte array of length L.
template<size_t L>
constexpr std::array<uint8_t, L>
from_hex(std::string_view hex)
{
  std::array<uint8_t, L> res{};
  for (size_t i = 0; i < L; i++) {
    res[i] = static_cast<uint8_t>((hex_digit(hex[2 * i]) << 4) | hex_digit(hex[(2 * i) + 1]));
  }
  return res;
}

// Given a string_view, interpret its raw bytes as a uint8_t array.
template<size_t L>
constexpr std::array<uint8_t, L>
string_to_bytes(std::string_view sv)
{
  std::array<uint8_t, L> arr{};
  for (size_t i = 0; i < sv.size(); i++) {
    arr[i] = static_cast<uint8_t>(sv[i]);
  }
  return arr;
}

constexpr std::string_view MSG = "keccak permutation rocks!";
constexpr auto MSG_bytes = string_to_bytes<MSG.size()>(MSG);
constexpr auto MD = sha3_256::sha3_256_t::hash(MSG_bytes);

int
main()
{
  constexpr auto expected_md = from_hex<sha3_256::DIGEST_LEN>("4edc60a9ffe739ce44252716483a529e8a859a5a75cbf69d494037e914bac16b");
  static_assert(MD == expected_md, "Must compute SHA3-256 message digest in program compile-time!");

  return EXIT_SUCCESS;
}

Prerequisites

  • A C++ compiler such as g++/ clang++, with support for C++20 standard library.
  • CMake (>= 3.28).
  • google-test and google-benchmark can either be installed system-wide, or fetched automatically by CMake (pass -DSHA3_FETCH_DEPS=ON).

[!NOTE] If you are on a machine running GNU/Linux kernel and you want to obtain CPU cycles or Cycles/ byte or instruction/ cycle etc., when benchmarking SHA3 algorithms, you should consider building google-benchmark library yourself with libPFM support, following the step-by-step guide @ https://gist.github.com/itzmeanjan/05dc3e946f635d00c5e0b21aae6203a7. Find more about libPFM @ https://perfmon2.sourceforge.net.

Testing

For ensuring that SHA3 hash function and extendable output function implementations are correct & conformant to the NIST FIPS 202, we make use of K(nown) A(nswer) T(ests), generated following the gist @ https://gist.github.com/itzmeanjan/448f97f9c49d781a5eb3ddd6ea6e7364. For TurboSHAKE, we use test vectors defined in IETF RFC 9861.

We also test correctness of

  • Incremental message absorption property of SHA3 hash functions and XOFs.
  • Incremental output squeezing property of SHA3 XOFs.

Some compile-time executed tests ( using static_assert ) are also implemented, which ensure that all SHA3 hash functions and XOFs are constexpr - meaning they can be evaluated during compilation-time for any statically defined input message.

Issue following commands for running all the test cases.

cmake -B build -
View on GitHub
GitHub Stars16
CategoryDevelopment
Updated24d ago
Forks2

Languages

C++

Security Score

95/100

Audited on Mar 5, 2026

No findings