Ujson
µjson - A fast and minimal JSON parser and transformer that works on unstructured JSON
Install / Use
/learn @iOliverNguyen/UjsonREADME
µjson
👉 Checkout the new library for iterating unstructured JSON with Go1.23 iterator syntax 👈
ezpkg.io/iter.json · A Powerful and Efficient Way to Iterate and Manipulate Unstructured JSON in Go
<a href="https://pkg.go.dev/ezpkg.io/iter.json" target="_blank"> <img alt="godoc" src="https://pkg.go.dev/badge/ezpkg.io/iter.json" style="height: 28px"></a> <a href="https://github.com/ezpkg/iter.json" target="_blank"> <img alt="source" src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white"> </a>
Read more about iter.json on the introduction article.
Example usage:
package main
import "ezpkg.io/iter.json"
func main() {
data := `{"id": 12345, "name": "foo", "numbers": ["one", "two"]}`
// 👇 iterate over the JSON with Go1.23 iterator syntax
for item, err := range iterjson.Parse([]byte(data)) {
if err != nil { panic(err) }
println(item.GetPathString(), item.Key, item.Token, item.Level)
}
}
µjson
(is now deprecated and replaced by iter.json)
A fast and minimal JSON parser and transformer that works on unstructured JSON. It works by parsing input and calling the given callback function when encountering each item.
Read more on the introduction article.
Motivation
Sometimes we just want to make some minimal changes to a JSON document, or do
some generic transformations without fully unmarshalling it. For example,
removing blacklist fields
from response JSON. Why spend all the cost on unmarshalling into a map[string]interface{}
just to immediate marshal it again.
The following code is taken from StackOverflow:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "solo",
"wt": "json"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{ "name": "foo" },
{ "name": "bar" }
]
}
}
With µjson, we can quickly write a simple transformation to remove "responseHeader" completely from all responses, once and forever:
func removeBlacklistFields(input []byte) []byte {
blacklistFields := [][]byte{
[]byte(`"responseHeader"`), // note the quotes
}
b := make([]byte, 0, 1024)
err := ujson.Walk(input, func(_ int, key, value []byte) bool {
if len(key) != 0 {
for _, blacklist := range blacklistFields {
if bytes.Equal(key, blacklist) {
// remove the key and value from the output
return false
}
}
}
// write to output
if len(b) != 0 && ujson.ShouldAddComma(value, b[len(b)-1]) {
b = append(b, ',')
}
if len(key) > 0 {
b = append(b, key...)
b = append(b, ':')
}
b = append(b, value...)
return true
})
if err != nil {
panic(err)
}
return b
}
The original scenario that leads me to write the package is because of int64. When working in Go and PostgreSQL, I use int64 (instead of string) for ids because it’s more effective and has enormous space for randomly generated ids. It’s not as big as UUID, 128 bits, but still big enough for production use. In PostgreSQL, those ids can be stored as bigint and being effectively indexed. But for JavaScript, it can only process integer up to 53 bits (JavaScript has BigInt but that’s a different story, and using it will make things even more complicated).
So we need to wrap those int64s into strings before sending them to JavaScript.
In Go and PostgreSQL, the JSON is {"order_id": 12345678} but JavaScript will
see it as {"order_id": "12345678"} (note that the value is quoted). In Go, we
can define a custom type and implement the
json.Marshaler interface. But
in PostgreSQL, that’s just not possible or too complicated. I wrote a service
that receives JSON from PostgreSQL and converts it to be consumable by
JavaScript. The service also removes some blacklisted keys or does some other
transformations (for example, change orderId to order_id).
Example use cases:
-
Walk through unstructured JSON:
- Print all keys and values
- Extract some values
-
Transform unstructured JSON:
without fully unmarshalling it into a map[string]interface{}.
See usage and examples on pkg.go.dev.
Important: Behaviour is undefined on invalid JSON. Use on trusted input
only. For untrusted input, you may want to run it through
json.Valid() first.
Usage
The single most important function is Walk(input, callback), which parses the
input JSON and call callback function for each key/value pair processed.
The callback function is called when an object key/value or an array key is
encountered. It receives 3 params in order: level, key and value.
-
levelis the indentation level of the JSON, if you format it properly. It starts from 0. It increases after entering an object or array and decreases after leaving. -
keyis the raw key of the current object or empty otherwise. It can be a double-quoted string or empty. -
valueis the raw value of the current item or a bracket. It can be a string, number, boolean, null, or one of the following brackets:{ } [ ].
It’s important to note that key and value are provided as raw. Strings are
always double-quoted. It’s there for keeping the library fast and ignoring
unnecessary operations. For example, when you only want to reformat the output
JSON properly; you don’t want to unquote those strings and then immediately
quote them again; you just need to output them unmodified. And there are
ujson.Unquote() and
ujson.AppendQuote()
when you need to get the original strings.
For valid JSON, values will never be empty. We can test the first byte of value
(value[0]) to get its type:
n: Null (null)f,t: Boolean (false,true)0-9,-: Number": String, seeUnquote()[,]: Array{,}: Object
When processing arrays and objects, first the open bracket ([, {) will be
provided as value, followed by its children, and finally the close bracket
(], }). When encountering open brackets, You can make the callback function
return false to skip the array/object entirely.
Examples
1. Print all keys and values in order
This example gives a quick idea about how µjson works.
input := []byte(`{
"id": 12345,
"name": "foo",
"numbers": ["one", "two"],
"tags": {"color": "red", "priority": "high"},
"active": true
}`)
ujson.Walk(input, func(level int, key, value []byte) bool {
fmt.Printf("%2v% 12s : %s\n", level, key, value)
return true
})
{
"id": 12345,
"name": "foo",
"numbers": ["one", "two"],
"tags": {"color": "red", "priority": "high"},
"active": true
}
Calling Walk() with the above input will produce:
| level | key | value |
|:-----:|:----------:|:-------:|
|0 | |{ |
|1 |"id" |12345 |
|1 |"name" |"foo" |
|1 |"numbers" |[ |
|2 | |"one" |
|2 | |"two" |
|1 | |] |
|1 |"tags" |{ |
|2 |"color" |"red" |
|2 |"priority"|"high" |
|1 | |} |
|1 |"active" |true |
|0 | |} |
0. The simplest examples
To easily get an idea on level, key and value, here are the simplest
examples:
input0 := []byte(`true`)
ujson.Walk(input0, func(level int, key, value []byte) bool {
fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
return true
})
// output:
// level=0 key= value=true
input1 := []byte(`{ "key": 42 }`)
ujson.Walk(input1, func(level int, key, value []byte) bool {
fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
return true
})
// output:
// level=0 key= value={
// level=1 key="key" value=42
// level=0 key= value=}
input2 := []byte(`[ true ]`)
ujson.Walk(input2, func(level int, key, value []byte) bool {
fmt.Printf("level=%v key=%s value=%s\n", level, key, value)
return true
})
// output:
// level=0 key= value=[
// level=1 key= value=true
// level=0
