Bilrost
A compact, extensible, ergonomic binary encoding for the Rust language
BILROST!
Bilrost is an encoding format designed for storing and transmitting structured
data, such as in file formats or network protocols. The encoding is binary, and
unsuitable for reading directly by humans; however, it does have other
useful properties and advantages. This crate, bilrost, is its first
implementation and its first instantiation.
Bilrost is designed with the following goals in mind:
- A stable encoding format, simple to specify and relatively easy to implement even in other languages
- Durable encoded data, suitable to retain across many versions of the application that generated it or to transmit between applications that have very different versions[^extensions]
- Good performance, comparable to what is achievable in encodings with similar design
- Canonical encoding and distinguished decoding
- Unintrusive: implementations should be able to efficiently implement encoding and decoding on structs that are already in use in the program. At worst they should be extremely similar to the structs in use and easy to populate & move around the program, rather than forcing users to use structs code-generated by a tool that come with cumbersome access & modification APIs[^derive-codegen]
[^derive-codegen]: The bilrost Rust library implements encoding and decoding
for message types via #[derive] macros. Technically this is generated code,
but it makes use of the existing compiler infrastructure rather than tooling,
the resulting code never needs to be added to version control, and the
definition of the type itself is always unaffected.
[^extensions]: Bilrost's design, like protobuf's, is oriented towards versioning by introducing new fields to the encoding (and possibly deprecating old ones) in a way that can still be mutually intelligible by both the old and new versions of the application.
Non-goals include[^lol]:
- A self-describing format
- The most compact or compressible format[^octet-aligned]
- The fastest format[^memcpy-fast]
[^lol]: Also a non-goal, not listed here: a small readme :)
[^octet-aligned]: Bilrost is octet-aligned and does not try to save bytes by stuffing or commingling data between field keys and their values, a practice which can save space but increases complexity and makes distinguished decoding harder and more prone to mistakes in implementation.
[^memcpy-fast]: Many of the decisions made in Bilrost in order to achieve stable representation across versions of an evolving schema, extensibility, and general simplicity sacrifice opportunities for extreme performance. These are deliberate tradeoffs that often preclude the fast, branchless encoding seen in some other encodings, which are often more similar to directly copying the memory of a struct than to distinctly encoding the value of each field. In exchange, schemas are simpler to describe and more portable, and the encoded data is more durable.
Bilrost at the encoding level is based upon Protocol Buffers (protobuf) and shares many of its traits, but is incompatible. It is in some ways simpler and less rigid in its specification, and is designed to improve on some of protobuf's deficiencies. In doing so it breaks wire-compatibility with protobuf.
Bilrost (as a specification) strives to provide a superset of the capabilities
of protocol buffers while reducing some of the surface area for mistakes and
surprises; bilrost (the implementing library) strives to provide access to
all of those capabilities with maximum convenience.
bilrost is implemented for the Rust Language. It is a direct fork of
prost, and shares many of its performance characteristics. (It is not the
fastest possible encoding library, but it is still pretty fast and comes with
unique advantages.) Like prost, bilrost can enable writing simple, idiomatic
Rust code with derive macros that serialize and deserialize structs from
binary data. Unlike prost, bilrost is free from most of the constraints of
the protobuf ecosystem and required semantics of protobuf message types.
Bilrost (the specification) and this library allow much wider compatibility with
existing struct types and their normal semantics. Rather than relying on
producing generated code from a protobuf .proto schema definition, bilrost
is designed to be easily used "by hand," as a pure enhancement to types the user
would already have written rather than as a system that railroads the user into
using opinionated and specialized struct types designed only for encoding and
decoding.
🌈
Contents
- Quick start
- Differences from prost
- Differences from Protobuf
- Compared to other encodings, distinguished and not
- Why use Bilrost?
- Why not use Bilrost?
- How does it work?
- License & copyright
This readme is the result of a lot of work, and we want it to be good! If anything is unclear or could be improved, please feel free to submit issues or pull requests!
Conceptual overview
Bilrost is an encoding scheme for converting in-memory data structs into plain byte strings and vice versa. It's generally suitable for both network transport and data retained over the long-term. Its encoded data is not human-readable, but it is encoded quite simply. It supports integral and floating point numbers, strings and byte strings, nested messages, and recursively nested messages. All of the above are supported as optional values, repeated values, sets of unique values, and key/value mappings where sensible. With appropriate choices of encodings (which determine the representation), most of these constructs can be nested almost arbitrarily.
Encoded Bilrost data does not include the names of its fields; they are instead assigned numbers agreed upon in advance by the message schema that specifies it. This can make the data much more compact than "schemaless" encodings like JSON, CBOR, etc., without sacrificing its extensibility: new fields can be added, and old fields removed, without necessarily breaking backwards compatibility with older versions of the encoding program. In the typical "relaxed" decoding mode, any field not in the message schema is ignored when decoding, so if fields are added or removed over time the fields that remain in common will still be mutually intelligible between the two versions of the schema. In this way, Bilrost is very similar to protobuf. See also: Design philosophy, Comparisons to other encodings, and the Encoding specification.
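The "relaxed" decoding behavior described above can be sketched with a toy format. This is a minimal illustration using only the standard library, and it is not Bilrost's actual wire format: the `[field number][length][payload]` layout and the names `encode_v2` and `decode_v1` are assumptions made up for this example. It shows a decoder built against an older schema skipping a field number it does not know while still reading the field both versions share.

```rust
// Toy illustration only: this is NOT Bilrost's actual wire format.
// Each field is written as: [field number: u8][length: u8][payload bytes].

/// Hypothetical "version 2" schema: field 1 (name) and field 2 (nickname).
fn encode_v2(name: &str, nickname: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for (number, value) in [(1u8, name), (2u8, nickname)] {
        out.push(number);
        out.push(value.len() as u8); // toy format: payloads under 256 bytes
        out.extend_from_slice(value.as_bytes());
    }
    out
}

/// Hypothetical "version 1" decoder: it only knows field 1 and skips
/// every other field number instead of treating it as an error.
fn decode_v1(mut data: &[u8]) -> Option<String> {
    let mut name = None;
    while let [number, len, rest @ ..] = data {
        let len = *len as usize;
        if rest.len() < len {
            return None; // payload is truncated
        }
        let (payload, tail) = rest.split_at(len);
        if *number == 1 {
            name = Some(String::from_utf8(payload.to_vec()).ok()?);
        }
        // Unknown field numbers fall through here and are ignored.
        data = tail;
    }
    if !data.is_empty() {
        return None; // a dangling byte that is not a complete field header
    }
    name
}

fn main() {
    let bytes = encode_v2("Ada", "countess");
    // The older decoder has never heard of field 2, but still finds field 1.
    assert_eq!(decode_v1(&bytes), Some("Ada".to_string()));
    println!("{} bytes, name = {:?}", bytes.len(), decode_v1(&bytes));
}
```

Because each field carries its own number and length, the older decoder can step over data it does not understand, which is the property that lets two schema versions stay mutually intelligible.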
Bilrost also has the ability to encode and decode data that is guaranteed to be canonically represented: see the section on distinguished decoding.
Design philosophy
Bilrost is designed to be an encoding format that is simple to specify, simple to implement, simple to port across languages and machines, and easy to use correctly.
Schema-ful encoding
It is designed as a data model that has a schema, though it can of course also be used to encode representations of "schemaless" data. There are advantages and disadvantages to this form. The encoded data is significantly smaller, since repetitive names of fields are replaced with surrogate numbers. At the same time, it may be less clear what the data means because the inherent documentation of the fields' names is missing. Schemaless encodings like JSON can be decoded and accessed dynamically as pure data with far simpler, unified decoder implementations, whereas encodings like Bilrost and protobuf require a schema to even be sure of the values.
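The size tradeoff can be made concrete with a small comparison. The sketch below is a toy under stated assumptions: neither function produces real JSON or real Bilrost, and `encode_named` and `encode_numbered` are hypothetical helpers invented for this example. It only shows how replacing repeated field names with numbers agreed upon in advance shrinks each record.

```rust
// Toy comparison only; neither function produces real JSON or real Bilrost.

/// Self-describing style: every record repeats its field names as text.
/// Each name and each value is written as [length: u8][bytes].
fn encode_named(fields: &[(&str, &str)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, value) in fields {
        for part in [name, value] {
            out.push(part.len() as u8);
            out.extend_from_slice(part.as_bytes());
        }
    }
    out
}

/// Schema style: each name is replaced by a small number agreed in advance.
/// Each field is written as [field number: u8][length: u8][bytes].
fn encode_numbered(fields: &[(u8, &str)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (number, value) in fields {
        out.push(*number);
        out.push(value.len() as u8);
        out.extend_from_slice(value.as_bytes());
    }
    out
}

fn main() {
    let named = encode_named(&[("display_name", "Ada"), ("favorite_color", "teal")]);
    let numbered = encode_numbered(&[(1, "Ada"), (2, "teal")]);
    assert!(numbered.len() < named.len());
    println!("named: {} bytes, numbered: {} bytes", named.len(), numbered.len());
}
```

The numbered form is smaller, but its bytes say nothing about what fields 1 and 2 mean without the schema, which is the disadvantage noted above.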
One argument is that even if fields' names are all specified in the encoding, they are merely low-information documentation that aids guessing or reverse-engineering. They can help diagnose where lost data belongs, or what mystery data means by lightly self-documenting, but the meaning of the data is still determined by the code that emitted it. Data has meaning based on where it is found, and the documentation of that meaning cannot be fully replaced by simply inc
