Zebrapack
ZebraPack format is like gobs version 2: serialization in Go, *but* extremely fast and friendly to other languages. Use Go as your schema. Strong typing. Well documented (and msgpack2 compatible) format so other languages can be readily supported. See also https://github.com/glycerine/greenpack for a more recent alternative. Docs:
Install / Use
/learn @glycerine/ZebrapackREADME
ZebraPack: a data description language and serialization format. Like Gobs version 2.0.
ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding.
It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support. If your favorite language (after Go, of course) has a library that reads msgpack2, then it would be only a day's work to adapt the library to read zebrapack: the schema are in msgpack2, and then one simply keeps a hashmap to translate between small integer <-> field names/type.
Why start with msgpack2? Quite simple: msgpack2 is simple, fast, and extremely portable. It has an implementation in every language you've heard of, and some you haven't (some 50 libraries are available). It has a well defined and short spec. The mspack1 vs msgpack2 terminology is a distinction we make here: msgpack1 spec poorly distringuished between strings and raw binary bytes, but that was remedied in msgpack2. Importantly, msgpack2 is dynamic-language friendly because it is largely self-describing.
We find only two problems with msgpack2: weak support for data evolution, and insufficiently strong typing.
The ZebraPack format addresses these problems. Moreover, ZebraPack is actually still binary compatible with msgpack2 spec. It just adopts a new convention about how to encode the field names of structs. Structs are encoded in msgpack2 using maps, as usual. Hence all data is still encoded precisely in the msgpack2 format. The only difference in ZebraPack is this convention: maps that represent structs are now keyed by integers. Rather than have string keys -- the convention for most msgpack2 language bindings -- in ZebraPack we use integers as keys for those maps that are representing structs. These integers are associated with a field name and type in a (separable) schema. The schema is also defined and encoded in msgpack2.
The resulting binary encoding is very similar in style to protobufs/Thrift/Capn'Proto. However it is much more friendly to other (dynamic) languages. Also it is screaming fast (see benchmarks below).
Once we have a schema, we can be very strongly typed, and be very efficient. We borrow the idea of field deprecation from FlatBuffers. For conflicting update detection, we use CapnProto's field numbering discipline. We add support for the omitempty tag. In fact, in ZebraPack, all fields are omitempty. If they are empty they won't be serialized on the wire. Like FlatBuffers and Protobufs, this enables one to define a very large schema of possibilities, and then only transmit a very small (efficient) portion that is currently relevant over the wire.
Full credit: the code here descends from the fantastic msgpack2 code generator https://github.com/tinylib/msgp by Philip Hofer.
By default we generate ZebraPack format encoders and decoders when the zebrapack tool is run. Note that we continue to offer straight msgpack2 serialization and deserialization with the -msgp flag to zebrapack.
background and motivation
ZebraPack serialization. This one is all black and white. No gray areas.
ZebraPack is a data definition language and serialization format. It removes gray areas from msgpack2 serialized data, and provides for declared schemas, sane data evolution, and more compact encoding. It does all this while maintaining the possibility of easy compatibility with all the dynamic languages that already have msgpack2 support.
the main idea
//given this definition, defined in Go:
type A struct {
Name string `zid:"0"`
Bday time.Time `zid:"1"`
Phone string `zid:"2"`
Sibs int `zid:"3"`
GPA float64 `zid:"4" msg:",deprecated"` // a deprecated field.
Friend bool `zid:"5"`
}
then to serialize the following instance `a`, we would
print the schema information at the front of the file --
or detached completely and kept in a separate file --
in the form of a zebra.Schema (see https://github.com/glycerine/zebrapack/blob/master/zebra/zebra.go for the spec) structure. Then
the data follows as a map whose keys are now integers
instead of strings. A simple example:
original(msgpack2) -> schema(msgpack2) + each instance(msgpack2)
-------- -------------- -------------
a := A{ zebra.StructT{ map{
"Name": "Atlanta", 0: {"Name", String}, 0: "Atlanta",
"Bday": tm("1990-12-20"), 1: {"Bday", Timestamp}, 1: "1990-12-20",
"Phone": "650-555-1212", 2: {"Phone", String}, 2: "650-555-1212",
"Sibs": 3, 3: {"Sibs", Int64}, 3: 3,
"GPA" : 3.95, 4: {"GPA", Float64}, 4: 3.95,
"Friend":true, 5: {"Friend", Bool}, 5: true,
} } }
The central idea of ZebraPack: start with msgpack2, but when encoding a struct (in msgpack2 a struct is represented as a map), replace the key strings with small integers.
By having a small schema description (essentially a lookup table with int->string mappings and a type identifier) either separate or serialized at the front of the serialization stream/file, we get known schema types up-front, plus compression and the ability to evolve our data without crashes. If you've ever had your msgpack crash your server because you tried to change the type of a field but keep the same name, then you know how fragile msgpack can be.
By default, today, we serialize the schema to a separate file so that the wire encoding is as fast as possible. However it is trivial to add/pre-pend the encoded schema to any file when you need to. The zebrapack generated Go code incorporates knowledge of the schema, so if you are only working in Go there is no need to zebrapack -write-schema to generate an external schema desription file. In summary, by default we behave like protobufs/thrift/capnproto, but dynamic languages and runtime type discovery can be supported in full fidelity.
The second easy idea: use the Go language struct definition syntax as our serialization schema. Why invent another format? Serialization for Go developers should be almost trivially easy. While we are focused on a serialization format for Go, because other language can read msgpack2, they can also readily parse the schema. The schema is stored in msgpack2 struct convention (and optionally json), rather than the ZebraPack struct convention, for bootstrapping.
background
Starting point: msgpack2 is great. It is has an easy to read spec, it defines a compact serialization format, and it has wide language support from both dynamic and compiled languages.
Nonetheless, data update conflicts still happen and can be hard to resolve. Encoders could use the guidance of a schema to avoid signed versus unsigned integer encodings.
For instance, sadly the widely emulated C-encoder for msgpack chooses to encode signed positive integers as unsigned integers. This causes crashes in readers who were expected a signed integer, which they may have originated themselves in the original struct.
Astonishing, but true: the existing practice for msgpack2 language bindings allows the data types to change as they are read and re-serialized. Simple copying of a serialized struct can change the types of data from signed to unsigned. This is horrible. Now we have to guess whether an unsigned integer was really intended because of the integer's range, or if data will be silently truncated or lost when coercing a 64-bit integer to a 63-bit signed integer--assuming such coercing ever makes logical sense, which it may not.
This kind of tragedy happens because of a lack of shared communication across time and space between readers and writers. It is easily addressed with a shared schema. ZebraPack, in its essense, is the agreement to follow that schema when binding msgpack2 to a new language.
While not always necessary, a schema provides many benefits, both for coordinating between people and for machine performance.
-
Stronger typing: readers know what is expected, in both type and size of the data delivered. Writers know what they should be writing.
-
Performance and compression: replacing struct/map field names with numbers provides immediate space savings and compression.
-
Conflict resolution: the Cap'nProto numbering and update conflict resolution method is used here. This method originated in the ProtocolBuffers scheme, and was enhanced after experience in Cap'nProto. How it works: Additions are always made by incrementing by one the largest number available prior to the addition. No gaps in numbering are allowed, and no numbers are ever deleted. To get the effect of deletion, add the
deprecatedvalue inmsgtag. This is an effective tombstone. It allows the tools to help detect merge conflicts as soon as possible. If two people try to merge schemas where the same struct or field number is re-used, a schema compiler can automatically detect this update conflict, and flag the human to resolve the conflict before proceeding. -
All fields optional. Just as in msgpack2, Cap'nProto, Gobs, and Flatbuffers, all fields are optional. Most everyone, after experience and time with ProtocolBuffers, has come to the conclusion that required fields are a misfeature that hurt the ability to evolve data gracefully and maintain efficiency.
Design:
- Schema language: the schema language for defining structs is identical to the Go language. Go is expressive and yet easily parsed by the standard library packages included with Go itself. There are already high-performance msgpack2 libraries available for go, https://github.com/tinylib/msgp and https://github.com/ugorji/go which make s
Related Skills
openhue
344.4kControl Philips Hue lights and scenes via the OpenHue CLI.
sag
344.4kElevenLabs text-to-speech with mac-style say UX.
weather
344.4kGet current weather and forecasts via wttr.in or Open-Meteo
tweakcc
1.5kCustomize Claude Code's system prompts, create custom toolsets, input pattern highlighters, themes/thinking verbs/spinners, customize input box & user message styling, support AGENTS.md, unlock private/unreleased features, and much more. Supports both native/npm installs on all platforms.
