StableHashTraits.jl
Compute hashes over any Julia object simply and reproducibly
StableHashTraits
The aim of StableHashTraits is to make it easy to compute a stable hash of any Julia value with minimal boilerplate using trait-based dispatch; here, "stable" means the value will not change across Julia versions (or between Julia sessions).
For example:
<!--The START_ and STOP_ comments are used to extract content that is also repeated in the documentation-->
<!--START_EXAMPLE-->
using StableHashTraits
using StableHashTraits: Transformer
using Dates
struct MyType
    data::Vector{UInt8}
    metadata::Dict{Symbol, Any}
end
# ignore `metadata`, `data` will be hashed using fallbacks for `AbstractArray` type
StableHashTraits.transformer(::Type{<:MyType}) = Transformer(pick_fields(:data))
# NOTE: `pick_fields` is a helper function implemented by `StableHashTraits`
# it creates a named tuple with the given object fields; in the above code it is used
# in its curried form e.g. `pick_fields(:data)` is the same as `x -> pick_fields(x, :data)`
a = MyType(read("myfile.txt"), Dict{Symbol, Any}(:read => Dates.now()))
b = MyType(read("myfile.txt"), Dict{Symbol, Any}(:read => Dates.now()))
stable_hash(a; version=4) == stable_hash(b; version=4) # true
<!--END_EXAMPLE-->
In many cases, users can simply call stable_hash(x; version=4) on the type they want to hash. However, a method of transformer can be used to customize how an object is hashed. It should dispatch on the type to be transformed, and return a function wrapped in Transformer. During hashing, this function is called and its result is the value that is actually hashed.
StableHashTraits aims to guarantee a stable hash so long as you only upgrade to non-breaking versions (e.g. StableHashTraits = "1" in [compat] of Project.toml); any changes in an object's hash in this case would be considered a bug.
<!--The START_ and STOP_ comments are used to extract content that is also repeated in the documentation-->
<!--START_OVERVIEW-->
[!NOTE] In versions prior to 2.0, StableHashTraits included hash versions 1-3. These have been removed in version 2.0 of StableHashTraits, and the existing version 4 has been left unchanged to avoid confusion: calling `stable_hash(x; version=4)` will yield the same result regardless of whether you are using StableHashTraits 1.3 or 2.0. Calls to the earlier hash versions will error in 2.0.
Use Case and Design Rationale
StableHashTraits is designed to be used in cases where there is an object you wish to serialize in a content-addressed cache. How and when objects pass the same input to a hashing algorithm is meant to be predictable and well defined, so that the user can reliably define methods of transformer to modify this behavior.
What gets hashed?
When you call `stable_hash(x; version=4)`, StableHashTraits hashes both the value `x` and its type `T`. In most cases the type `T` is not hashed directly; instead, `StructTypes.StructType(T)` from StructTypes.jl is hashed. For example, because the StructType of both `Float64` and `Float32` is `NumberType`, hashing a `Float64` or `Float32` value hashes the value together with `NumberType`. This provides a simple trait-based system that doesn't need to rely on internal details. See below for more details.
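To check which StructType a given type maps to, you can query StructTypes.jl directly. The snippet below is a minimal sketch that assumes StructTypes.jl is installed in the active environment; it only inspects traits and does not call StableHashTraits.

```julia
using StructTypes

# Both floating-point types share the same trait, so the type portion of
# their hash is derived from `NumberType` rather than from the concrete type.
StructTypes.StructType(Float64) isa StructTypes.NumberType
StructTypes.StructType(Float32) isa StructTypes.NumberType

# A few other traits that StructTypes.jl assigns to common Base types.
StructTypes.StructType(String) isa StructTypes.StringType
StructTypes.StructType(Vector{Int}) isa StructTypes.ArrayType
StructTypes.StructType(Dict{Symbol,Int}) isa StructTypes.DictType
```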
You can customize how the value is hashed using `StableHashTraits.transformer`, and how its type is hashed using `StableHashTraits.transform_type`.
If you need to customize either of these functions for a type that you don't own, you can use a @context to avoid type piracy.
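As a rough sketch of the context mechanism, the example below customizes hashing for `Dates.DateTime`, a type this code does not own. It rests on several assumptions that are not stated in the text above: that `@context` defines a wrapper type constructed around a parent context, that `StableHashTraits.HashVersion{4}()` is the root context for hash version 4, that `transformer` accepts the context as a second argument, and that `stable_hash` accepts a context as its second positional argument. The name `DayResolution` is hypothetical.

```julia
using StableHashTraits
using StableHashTraits: Transformer
using Dates

# `DayResolution` is a hypothetical context name chosen for this sketch.
StableHashTraits.@context DayResolution

# Within this context only, hash a `DateTime` by its calendar date.
# Dispatching on our own context type avoids type piracy on `DateTime`.
function StableHashTraits.transformer(::Type{<:DateTime}, ::DayResolution)
    return Transformer(dt -> Dates.format(dt, "yyyy-mm-dd"))
end

# Assumption: a context wraps a parent context, here the one for hash version 4.
context = DayResolution(StableHashTraits.HashVersion{4}())

morning = DateTime(2024, 1, 1, 8)
evening = DateTime(2024, 1, 1, 17)

# Both values fall on the same day, so they should hash identically in this context.
same_day_hash = stable_hash(morning, context) == stable_hash(evening, context)
```

Calls that do not pass `context` are unaffected by the method defined here, which is the point of using a context rather than overloading `transformer` for `DateTime` globally.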
StructTypes.DataType
`StructTypes.DataType` denotes a type that is some kind of "record"; i.e. its content is defined by the fields (`getfield(x, f) for f in fieldnames(T)`) of the type. Since it is the default, it is used to hash most types.
To hash the value, each field value (`getfield(x, f) for f in fieldnames(T)`) is hashed.
If `StructType(T)` is a `StructTypes.UnorderedStruct` (the default), the field values are first sorted by the lexicographic order of the field names.
The type of a data type is hashed using `string(nameof(T))`, `fieldnames(T)` (sorted for `UnorderedStruct`), and a hash of the type of each element of `fieldtypes(T)`, according to their StructType.
No type parameters are hashed by default. To hash these you need to specialize on StableHashTraits.transform_type for your struct. Note that because fieldtypes(T) is hashed, you don't need to do this unless your type parameters are not used in the specification of your field types.
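The case where a type parameter does not appear in any field type can be sketched as follows. `Tagged` is a hypothetical type, and this sketch assumes that `transform_type` may return a plain string describing the type and that the parameter is itself a named type (so that `nameof` applies).

```julia
using StableHashTraits

# `T` is a "phantom" parameter: it does not appear in `fieldtypes(Tagged{T})`,
# so by default it would not contribute to the hash.
struct Tagged{T}
    value::Int
end

# Fold the parameter's name into the value that is hashed for the type.
function StableHashTraits.transform_type(::Type{<:Tagged{T}}) where {T}
    return "Tagged{" * string(nameof(T)) * "}"
end

h_int = stable_hash(Tagged{Int}(1); version=4)
h_float = stable_hash(Tagged{Float64}(1); version=4)
```

With this method in place, `h_int` and `h_float` are computed from different type descriptions even though the field values are identical.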
StructTypes.ArrayType
ArrayType is used when hashing a sequence of values.
To hash the value, each element of an array type is hashed using iterate. If the object isa AbstractArray, the size(x) of the object is also hashed.
If `StableHashTraits.is_ordered` returns false, the elements are first sorted according to `StableHashTraits.hash_sort_by`.
To hash the type, the string "StructTypes.ArrayType" is hashed (meaning that the kind of array won't matter to the hash value), and the type of the `eltype` is hashed, according to its StructType. If the type `<: AbstractArray`, `ndims(T)` is also hashed.
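The effect of ordering can be illustrated with a small sketch. It assumes that `StableHashTraits.is_ordered` returns false for `Set` by default, which is not stated above, and true for `Vector`.

```julia
using StableHashTraits

# Sets are assumed to be unordered, so construction order should not matter.
a = Set([1, 2, 3])
b = Set([3, 2, 1])
sets_match = stable_hash(a; version=4) == stable_hash(b; version=4)

# Vectors are ordered sequences, so element order is part of the hash.
vectors_match = stable_hash([1, 2, 3]; version=4) == stable_hash([3, 2, 1]; version=4)
```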
StructTypes.DictType
To hash the value, each key-value pair of a dict type is hashed, as returned by `StructTypes.keyvaluepairs(x)`.
If `StableHashTraits.is_ordered` returns false (which is the default return value), the pairs are first sorted according to their keys using `StableHashTraits.hash_sort_by`.
To hash the type, the string "StructTypes.DictType" is hashed (meaning that the kind of dictionary won't matter), and the `keytype` and `valtype` are hashed, each according to its StructType.
AbstractRange
AbstractRange constitutes an exception to the rule that we use StructType: for efficient hashing, ranges are treated as another first-class container object, separate from array types.
The value is hashed as (first(x), step(x), last(x)).
The type is hashed as "Base.AbstractRange" along with the type of the eltype, according to its StructType. Thus, the type of range doesn't matter (just that it is a range).
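The triple described above can be inspected with plain Base functions. The helper name `range_triple` is hypothetical and is not part of StableHashTraits; the sketch only shows which values describe a range.

```julia
# Hypothetical helper: the three values that describe a range's content.
range_triple(r::AbstractRange) = (first(r), step(r), last(r))

r1 = 1:2:9
r2 = 1:2:10   # 10 is not reachable with a step of 2, so this range also ends at 9

range_triple(r1)   # (1, 2, 9)
range_triple(r2)   # (1, 2, 9)
range_triple(1:3)  # (1, 1, 3); a unit range has a step of 1
```

Working from these three values means the cost of describing a range does not grow with its length.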
StructTypes.{Number/String/Bool}Type
To hash the value, the result of calling `Base.write` on the object is hashed.
To hash the type, the value of `string("StructType.", nameof_string(StructType(T)))` is used (see `StableHashTraits.nameof_string` for details). Note that this means the type of the value itself is not hashed, but rather a string related to its struct type.
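To get a feel for what `Base.write` produces for these kinds of values, the following sketch collects the written bytes in an `IOBuffer`. The helper `written_bytes` is hypothetical; how StableHashTraits passes these bytes to the hashing algorithm is not shown here.

```julia
# Hypothetical helper: collect the bytes that `Base.write` emits for a value.
function written_bytes(x)
    io = IOBuffer()
    write(io, x)
    return take!(io)
end

length(written_bytes(1.0))    # 8 bytes for a Float64
length(written_bytes(1.0f0))  # 4 bytes for a Float32
written_bytes("abc")          # the UTF-8 bytes 0x61, 0x62, 0x63
written_bytes(true)           # a single byte, 0x01
```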
StructTypes.CustomStruct
For any `StructTypes.CustomStruct`, the object is first lowered using `StructTypes.lower`, and the result is hashed according to the StructType of the lowered value.
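A minimal sketch of a custom struct follows, assuming StructTypes.jl is installed alongside StableHashTraits. `Celsius` is a hypothetical type used only for illustration.

```julia
using StableHashTraits
using StructTypes

# Hypothetical wrapper type.
struct Celsius
    degrees::Float64
end

# Declare the type as a custom struct and lower it to its numeric content.
StructTypes.StructType(::Type{Celsius}) = StructTypes.CustomStruct()
StructTypes.lower(x::Celsius) = x.degrees

h1 = stable_hash(Celsius(21.5); version=4)
h2 = stable_hash(Celsius(21.5); version=4)
h3 = stable_hash(Celsius(22.0); version=4)
```

Here `h1` and `h2` agree because both objects lower to the same number, while `h3` is computed from a different lowered value.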
missing and nothing
No value is hashed for `missing` or `nothing`; the type is hashed as the string "Base.Missing" or "Base.Nothing", respectively. Note in particular that the string "Base.Missing" does not have the same hash as `missing`, since hashing the string would also hash its struct type.
StructTypes.{Null/Singleton}Type
Null and Singleton types are hashed solely according to their type (no value is hashed).
Their type is hashed using `StableHashTraits.nameof_string`.
This means the module of the type does not matter: the module of a type is often considered an implementation detail, so it is left out to avoid unexpected hash changes from non-breaking releases that change the module of a type.
[!NOTE] If you wish to disambiguate functions or types that have the same name but that come from different modules, you can overload `StableHashTraits.transform_type` for those functions or types. If you want to include the module name for a broad set of types, rather than explicitly specifying a module name for each type, you may want to consider calling `StableHashTraits.module_nameof_string` in the body of your `transform_type` method.
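The approach in the note above might look like the following sketch. The modules `Shapes` and `Icons` are hypothetical, and the sketch assumes that `StableHashTraits.module_nameof_string` accepts a type and returns a string that includes the type's module; check the function's docstring before relying on this signature.

```julia
using StableHashTraits

# Two hypothetical modules that each define a type named `Circle`.
module Shapes
struct Circle
    radius::Float64
end
end

module Icons
struct Circle
    radius::Float64
end
end

# Cover both types with a single method rather than naming each module by hand.
const AnyCircle = Union{Shapes.Circle,Icons.Circle}

# Assumption: `module_nameof_string(T)` returns a module-qualified name for `T`.
function StableHashTraits.transform_type(::Type{T}) where {T<:AnyCircle}
    return StableHashTraits.module_nameof_string(T)
end

h_shape = stable_hash(Shapes.Circle(1.0); version=4)
h_icon = stable_hash(Icons.Circle(1.0); version=4)
```

The trade-off is the one the preceding section describes: once the module is part of the hash, moving a type to a different module changes its hash.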