# SMHasher3

Mirror of the SMHasher3 repo from the official GitLab site. Submit all issues and PRs there, not on GitHub!
```
  _____ __  __ _    _           _             ____
 / ____|  \/  | |  | |         | |           |___ \
| (___ | \  / | |__| | __ _ ___| |__   ___ _ __ __) |
 \___ \| |\/| |  __  |/ _` / __| '_ \ / _ \ '__|__ <
 ____) | |  | | |  | | (_| \__ \ | | |  __/ |  ___) |
|_____/|_|  |_|_|  |_|\__,_|___/_| |_|\___|_| |____/
=======================================================
```
## Test Results
If you are interested in the latest hash test results (currently from the latest SMHasher3 release), they are in the `results/` directory.
## Summary
SMHasher3 is a tool for testing the quality of hash functions in terms of their distribution, collision, and performance properties. It constructs sets of hash keys, passes them through the hash function to test, and analyzes their outputs in numerous ways. It also does some performance testing of the hash function.
SMHasher3 is based on the SMHasher fork maintained by Reini Urban, which is in turn based on the original SMHasher by Austin Appleby. The commit history of both of those codebases up to their respective fork points is contained in this repository.
The major differences from rurban's fork are:
- Fixes for several critical bugs
- Several new tests and test methods
- Significant performance increases
- P-value reporting for all supported tests
- Detailed reporting on hashes when test failures occur
- Better statistical foundations for some tests
- Overhauled hash implementations for greater consistency
Additional significant changes include:
- Many fixes to threaded testing and hashing
- More consistent testing across systems and configurations
- More consistent and human-friendlier reporting formats
- Common framework code explicitly sharable across all hashes
- Flexible metadata system for both hashes and their implementations
- Major progress towards full big-endian support
- Support of more hash seed methods (64-bit seeds and ctx pointers)
- Ability to supply a global seed value for testing
- Test of varying alignments and buffer tail sizes during speed tests
- Refactored code to improve maintainability and rebuild times
- Reorganized code layout to improve readability
- Compilation-based platform probing and configuration
- Consistent code formatting
- More explicit license handling
- Fully C++11-based implementation
## Current status
As of 2025-10-16, I consider SMHasher3 to have been fully released.
From this point, the plan is to have two branches: "main" and "dev". The main branch will have new hashes and updated hashes added to it as I am able. The dev branch will have those changes added to it also. Feature development will happen only on the dev branch, and those changes will occasionally get added to main, when some chunk of functionality is complete.
There won't be explicit release versioning. Instead, the version string has been updated to include the commit date of the last commit.
This code is regularly compiled and run successfully on Linux x64, ARM, and PowerPC using both gcc and clang. Importantly, I do not have the ability to test on Mac or Windows environments. It has been compiled successfully with MSVC and clang-cl in the past, and efforts are made to ensure this remains the case, but some things may slip through. The goal is to support all of the above, and while the CMake files Should Just Work(tm), MSVC in particular has its own ideas about some corners of the various specs. Reports of success or failure are therefore appreciated, as are patches to make things work.
## How to build
```
mkdir build
cd build
cmake ..
```

or

```
CC=mycc CXX=mycxx CXXFLAGS="-foo bar" cmake ..
```

as needed for your system, followed by

```
make -j4
```

or

```
make -j4 all test
```
## How to use
- `./SMHasher3 --tests` will show all available test suites
- `./SMHasher3 --list` will show all available hashes and their descriptions
- `./SMHasher3 <hashname>` will test the given hash with the default set of test suites (which is called "All" and is most, but not literally all, of them)
- `./SMHasher3 <hashname> --extra --notest=Speed,Hashmap` will test the given hash with the default set of test suites excluding the Speed and Hashmap tests, with each test suite that runs using an extended set of tests
- `./SMHasher3 <hashname> --ncpu=1` will test the given hash with the default set of test suites, using only a single thread
- `./SMHasher3 --help` will show many other usage options
Note that a hashname specified on the command-line is looked up via case-insensitive search, so you do not have to precisely match the names given from the list of available hashes. Even fuzzier name matching is planned for future releases.
If SMHasher3 found a usable threading implementation during the build, then
the default is to assume --ncpu=4, which uses up to 4 threads to speed up
testing. Not all test suites use threading. While all included hashes are
thread-safe as of this writing, if a non-thread safe hash is detected then
threading will be disabled and a warning will be given. If no usable
threading library was found, then a warning will be given if a --ncpu=
value above 1 was used.
## Adding a new hash
To add a new hash function to be tested, either add the implementation to an existing
file in hashes/ (if related hashes are already there), or copy hashes/EXAMPLE.cpp
to a new filename and then add it to the list of files in hashes/Hashsrc.cmake.
Many more details can be found in hashes/README.addinghashes.md.
## P-value reporting
This section has been placed near the front of the README because it is the most important and most visible new feature for existing SMHasher users.
The tests in the base SMHasher code had a variety of metrics for reporting results, and those metrics often did not take the test parameters into account well (or at all), leading to results that were questionable and hard to interpret. For example, the Avalanche test reports on the worst result over all test buckets, but tests with longer inputs have more buckets. This was not part of the result calculation, and so longer inputs naturally get higher percentage biases (on average) even with truly random hashes. In other words, a bias of "0.75%" on a 32-bit input was not the same as a bias of "0.75%" on a 1024-bit input. This is not to call out the Avalanche test specifically; many tests exhibited some variation of this problem.
To address these issues, SMHasher3's tests compare aspects of the distribution of hash values from the hash function under test against those from a hypothetical true random number generator, and summarize the result in the form of a p-value.
P-values are probabilities: they are numbers between 0 and 1. Their values are approximately the probability of a true RNG producing a test result that was at least as bad as the observed result from the hash function. Smaller p-values would indicate worse hash results.
However, these p-values quite often end up being very small values near zero, even in cases of good results. Reporting them in their decimal form, or even in scientific notation, would probably not be very useful, and could be very difficult to compare or interpret just by looking at them.
In SMHasher3, these p-values are reported by a caret symbol (^) followed by the p-value expressed in negative powers of two. For example, if it is determined that a true RNG would be expected to produce the same or a worse result with a probability of 0.075, then SMHasher3 would compute that the p-value is about 2^-3.737. It would then round the exponent towards zero, discard the sign (since probabilities are never greater than 1, the exponent is always negative), and finally report the p-value as "^ 3".
Therefore, smaller p-values (which indicate worse test results) result in larger numbers when reported using caret notation. You can think of the values in caret notation as indicating how improbable, and thus worse, the test result was. For example, "^50" could be interpreted as "there is, at best, only a 1 in 2^50 chance that an RNG would have produced a result as bad as the hash did".
The p-value computations only care about the likelihood of bad results (e.g. more collisions than an RNG would produce). Test results that are better than a typical RNG result but would still be outliers from a purely statistical point-of-view, such as seeing no or very few collisions when at least some would be expected, do not produce extreme p-values. In statistics terms, the p-values are one-tailed when appropriate, instead of always being two-tailed.
The p-value computations also take into account how many tests are being summarized, which can lead to unintuitive results. As an example, here are some lines from a single batch of test keys:
```
Keyset 'Sparse' - 256-bit keys with up to 3 bits set - 2796417 keys
Testing all collisions (high 32-bit) - Expected 910.2, actual 989 (1.087x) (^ 7)
Testing all collisions (high 20..38 bits) - Worst is 32 bits: 989/910 (1.087x) (^ 3)
```
The middle line reports ^7 for seeing 989 collisions when 910 were expected, and the last line reports ^3 for what seems like the same result. This is due to the fact that the middle line is reporting that as the result of a single test, and the last line is reporting that as the worst result over 19 tests. It's much more likely to see a result at least that bad if you have 19 tries to get it than if you just had 1 try, and so the improbability is much lower. Indeed, 19 is around 2^4, and the first reported result is about 4 powers of 2 worse than the second (7 - 3), as expected.
A true RNG would generally have about twice as many ^4 results as ^5 results, twice as many ^3 results as ^4 results, and so on.