Jsondiff
Compute the diff between two JSON documents as a series of JSON Patch (RFC6902) operations
Usage
First, get the latest version of the library using the following command:
$ go get github.com/wI2L/jsondiff@latest
[!IMPORTANT] Requires Go 1.21+, due to the usage of the hash/maphash package and the any/min/max keyword/builtins.
Example use cases
Kubernetes Dynamic Admission Controller
The typical use case within an application is to compare two values of the same type that represent the source and desired target of a JSON document. A concrete application is generating the patch returned by a Kubernetes dynamic admission controller to mutate a resource. Instead of writing the operations by hand, copy the source, apply the required changes to the copy, and delegate the patch generation to the library.
For example, given the following corev1.Pod value that represents a Kubernetes demo pod containing a single container:
import corev1 "k8s.io/api/core/v1"
pod := corev1.Pod{
	Spec: corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "webserver",
			Image: "nginx:latest",
			VolumeMounts: []corev1.VolumeMount{{
				Name:      "shared-data",
				MountPath: "/usr/share/nginx/html",
			}},
		}},
		Volumes: []corev1.Volume{{
			Name: "shared-data",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{
					Medium: corev1.StorageMediumMemory,
				},
			},
		}},
	},
}
The first step is to copy the original pod value. The corev1.Pod type defines a DeepCopy method, which is handy. For other types, a shallow copy is discouraged; use a dedicated library instead, such as ulule/deepcopier. Alternatively, if you don't need to keep the original value, you can marshal it to JSON using json.Marshal to store a pre-encoded copy of the document, and then mutate the value.
newPod := pod.DeepCopy()
// or
podBytes, err := json.Marshal(pod)
if err != nil {
	// handle error
}
Secondly, make some changes to the pod spec. Here we modify the image and the storage medium used by the pod's volume shared-data.
// Update the image of the webserver container.
newPod.Spec.Containers[0].Image = "nginx:1.19.5-alpine"
// Switch storage medium from memory to default.
newPod.Spec.Volumes[0].EmptyDir.Medium = corev1.StorageMediumDefault
Finally, generate the patch that represents the changes relative to the original value. Note that when the Compare function is called, the source and target parameters are first marshaled using the encoding/json package (or a custom func) in order to obtain their final JSON representation, prior to comparing them.
import "github.com/wI2L/jsondiff"
patch, err := jsondiff.Compare(pod, newPod)
if err != nil {
	// handle error
}
b, err := json.MarshalIndent(patch, "", " ")
if err != nil {
	// handle error
}
os.Stdout.Write(b)
The output is similar to the following:
[{
    "op": "replace",
    "path": "/spec/containers/0/image",
    "value": "nginx:1.19.5-alpine"
}, {
    "op": "remove",
    "path": "/spec/volumes/0/emptyDir/medium"
}]
The JSON patch can then be used in the response payload of your Kubernetes webhook.
Optional fields gotcha
Note that the above example is simplified. In a real-world admission controller, you should create the diff from the raw bytes of the AdmissionReview.Request.Object.Raw field. As pointed out by user /u/terinjokes on Reddit, due to the nature of Go structs, the "hydrated" corev1.Pod object may contain "optional fields", resulting in a patch that states added or changed values that the Kubernetes API server doesn't know about. Below is a quote of the original comment:
Optional fields being ones that are a struct type, but are not pointers to those structs. These will exist when you unmarshal from JSON, because of how Go structs work, but are not in the original JSON. Comparing between the unmarshaled and copied versions can generate add and change patches below a path not in the original JSON, and the API server will reject your patch.
A realistic usage would be similar to the following snippet:
podBytes, err := json.Marshal(pod)
if err != nil {
	// handle error
}
// req is a k8s.io/api/admission/v1.AdmissionRequest value
patch, err := jsondiff.CompareJSON(req.Object.Raw, podBytes)
if err != nil {
	// handle error
}
Mutating the original pod object or a copy is up to you, as long as you use the raw bytes of the AdmissionReview object to generate the patch.
You can find a detailed description of that problem and its resolution in this GitHub issue.
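The optional-field behavior quoted above can be reproduced with encoding/json alone. The following is a minimal sketch, not taken from the library: the Resource and Status types and the roundTrip helper are hypothetical stand-ins for a Kubernetes API type with a non-pointer struct field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status and Resource are hypothetical types used for illustration only.
type Status struct {
	Phase string `json:"phase,omitempty"`
}

type Resource struct {
	Name string `json:"name"`
	// omitempty has no effect on struct values, so this field is always
	// emitted, even when it was absent from the decoded document.
	Status Status `json:"status,omitempty"`
}

// roundTrip decodes raw into a Resource and encodes it again.
func roundTrip(raw []byte) (string, error) {
	var r Resource
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	out, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip([]byte(`{"name":"demo"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"name":"demo","status":{}}
}
```

The re-encoded document gains a "status" key that the original JSON never had. A diff computed from the re-encoded value may therefore target paths that do not exist in the object held by the API server, which is why the raw request bytes are the safer source.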
Outdated package version
There's also one other downside to the above example. If your webhook does not have the latest version of the client-go package, or whichever package contains the types for the resource you're manipulating, all fields not known in that version will be deleted.
For example, if your webhook mutates Service resources, a user could set the field .spec.allocateLoadBalancerNodePorts in Kubernetes 1.20 to disable allocating a node port for services with Type=LoadBalancer. However, if the webhook is still using the v1.19.x version of the k8s.io/api/core/v1 package that defines the Service type, instead of simply ignoring this field, a remove operation will be generated for it.
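The field loss described above happens during decoding, before any comparison takes place. Here is a small sketch using only encoding/json, where OldServiceSpec is a hypothetical stand-in for a type from an outdated API package and reencode is an illustrative helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OldServiceSpec is a hypothetical type that predates the
// allocateLoadBalancerNodePorts field, like a type from an old API package.
type OldServiceSpec struct {
	Type string `json:"type,omitempty"`
}

// reencode decodes raw into the outdated type and encodes it again.
// Keys that the type does not declare are silently dropped by Unmarshal.
func reencode(raw []byte) (string, error) {
	var spec OldServiceSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return "", err
	}
	out, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	raw := []byte(`{"type":"LoadBalancer","allocateLoadBalancerNodePorts":false}`)
	out, err := reencode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"type":"LoadBalancer"}
}
```

Comparing the raw document with the re-encoded one then reports the unknown key as removed, even though the webhook never intended to touch it.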
Options
If more control over the diff behavior is required, you can pass a variadic list of functional options as the third argument of the Compare and CompareJSON functions.
Note that any combination of options can be used without issues, unless specified.
Table of contents
- Factorization
- Rationalization
- Invertible patch
- Equivalence
- LCS (array comparison)
- Ignores
- Marshal/Unmarshal functions
Operations factorization
By default, when computing the difference between two JSON documents, the package does not produce move or copy operations. To enable the factorization of value removals and additions as moves and copies, use the functional option Factorize(). Factorization reduces the number of operations generated, which in turn reduces the size of the patch once it is marshaled as JSON.
For instance, given the following document:
{
    "a": [ 1, 2, 3 ],
    "b": { "foo": "bar" }
}
In order to obtain this updated version:
{
    "a": [ 1, 2, 3 ],
    "c": [ 1, 2, 3 ],
    "d": { "foo": "bar" }
}
The package generates the following patch:
[
    { "op": "remove", "path": "/b" },
    { "op": "add", "path": "/c", "value": [ 1, 2, 3 ] },
    { "op": "add", "path": "/d", "value": { "foo": "bar" } }
]
If we take the previous example and generate the patch with factorization enabled, we then get a different patch, containing copy and move operations instead:
[
    { "op": "copy", "from": "/a", "path": "/c" },
    { "op": "move", "from": "/b", "path": "/d" }
]
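To check that the two patches in this section are equivalent, the sketch below applies each of them to the source document and prints the result. It does not use the library: the op type and the apply function are hypothetical helpers written for this example, and they deliberately support only top-level paths and the add, remove, copy and move operations.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// op is a minimal representation of a JSON Patch operation.
type op struct {
	Op    string          `json:"op"`
	From  string          `json:"from,omitempty"`
	Path  string          `json:"path"`
	Value json.RawMessage `json:"value,omitempty"`
}

// apply applies patch to doc and returns the resulting document.
// Simplification: only top-level paths such as "/a" are handled.
func apply(doc, patch string) (string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal([]byte(doc), &obj); err != nil {
		return "", err
	}
	var ops []op
	if err := json.Unmarshal([]byte(patch), &ops); err != nil {
		return "", err
	}
	for _, o := range ops {
		key := strings.TrimPrefix(o.Path, "/")
		from := strings.TrimPrefix(o.From, "/")
		switch o.Op {
		case "add":
			obj[key] = o.Value
		case "remove":
			delete(obj, key)
		case "copy", "move":
			v, ok := obj[from]
			if !ok {
				return "", fmt.Errorf("missing source %q", o.From)
			}
			if o.Op == "move" {
				delete(obj, from)
			}
			obj[key] = v
		default:
			return "", fmt.Errorf("unsupported operation %q", o.Op)
		}
	}
	out, err := json.Marshal(obj) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	const source = `{"a":[1,2,3],"b":{"foo":"bar"}}`
	const plain = `[{"op":"remove","path":"/b"},` +
		`{"op":"add","path":"/c","value":[1,2,3]},` +
		`{"op":"add","path":"/d","value":{"foo":"bar"}}]`
	const factorized = `[{"op":"copy","from":"/a","path":"/c"},` +
		`{"op":"move","from":"/b","path":"/d"}]`

	for _, p := range []string{plain, factorized} {
		out, err := apply(source, p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d patch bytes -> %s\n", len(p), out)
	}
}
```

Both patches produce the same target document, while the factorized one is shorter because it refers to values already present in the source instead of embedding them.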
Operations rationalization
The default method used to compare two JSON documents is a recursive comparison. This produces one or more operations for each difference found. In certain situations, however, it can be beneficial to replace a set of operations representing several changes inside a JSON node with a single replace operation targeting the parent node, in order to reduce the "size" of the patch (the length in bytes of the JSON representation of the patch).
For that purpose, you can use the Rationalize() option. It uses a simple weight function to decide which patch is best (it marshals both sets of operations to JSON and looks at the length of bytes to keep th
