Bilrost

A compact, extensible, ergonomic binary encoding for the Rust language

BILROST!

Bilrost is an encoding format designed for storing and transmitting structured data, such as in file formats or network protocols. The encoding is binary, and unsuitable for reading directly by humans; however, it does have other useful properties and advantages. This crate, bilrost, is its first implementation and its first instantiation.

Bilrost is designed with the following goals in mind:

- A stable encoding format, simple to specify and relatively easy to implement even in other languages
- Durable encoded data, suitable to retain across many versions of the application that generated it or to transmit between applications that have very different versions[^extensions]
- Good performance, comparable to what is achievable in encodings with similar design
- Canonical encoding and distinguished decoding
- Unintrusive: implementations should be able to efficiently implement encoding and decoding on structs that are already in use in the program. At worst they should be extremely similar to the structs in use and easy to populate & move around the program, rather than forcing users to use structs code-generated by a tool that come with cumbersome access & modification APIs[^derive-codegen]

[^derive-codegen]: The bilrost Rust library implements encoding and decoding for message types via #[derive] macros. Technically this is generated code, but it makes use of the existing compiler infrastructure rather than tooling, the resulting code never needs to be added to version control, and the definition of the type itself is always unaffected.

[^extensions]: Bilrost's design, like protobuf's, is oriented towards versioning by introducing new fields to the encoding (and possibly deprecating old ones) in a way that can still be mutually intelligible by both the old and new versions of the application.

Non-goals include[^lol]:

- A self-describing format
- The most compact or compressible format[^octet-aligned]
- The fastest format[^memcpy-fast]

[^lol]: Also a non-goal, not listed here: a small readme :)

[^octet-aligned]: Bilrost is octet-aligned and does not try to save bytes by stuffing or commingling data between field keys and their values, a practice which can save space but increases complexity and makes distinguished decoding harder and more prone to mistakes in implementation.

[^memcpy-fast]: Many of the decisions made in Bilrost in order to achieve stable representation across versions of an evolving schema, extensibility, and general simplicity sacrifice opportunities for extreme performance. These are deliberate tradeoffs that often preclude the ability to perform fast & branchless encoding similar to what is seen in some other encodings, which are often more similar to directly copying the memory of a struct than to distinctly encoding the value of each field. In exchange schemas are simpler to describe and more portable and the encoded data is more durable.

Bilrost at the encoding level is based upon Protocol Buffers (protobuf) and shares many of its traits, but is incompatible. It is in some ways simpler and less rigid in its specification, and is designed to improve on some of protobuf's deficiencies. In doing so it breaks wire-compatibility with protobuf.

Bilrost (as a specification) strives to provide a superset of the capabilities of protocol buffers while reducing some of the surface area for mistakes and surprises; bilrost (the implementing library) strives to provide access to all of those capabilities with maximum convenience.

bilrost is implemented for the Rust Language. It is a direct fork of prost, and shares many of its performance characteristics. (It is not the fastest possible encoding library, but it is still pretty fast and comes with unique advantages.) Like prost, bilrost can enable writing simple, idiomatic Rust code with derive macros that serialize and deserialize structs from binary data. Unlike prost, bilrost is free from most of the constraints of the protobuf ecosystem and required semantics of protobuf message types. Bilrost (the specification) and this library allow much wider compatibility with existing struct types and their normal semantics. Rather than relying on producing generated code from a protobuf .proto schema definition, bilrost is designed to be easily used "by hand," as a pure enhancement to types the user would already have written rather than as a system that railroads the user into using opinionated and specialized struct types designed only for encoding and decoding.

🌈

- Quick start
- Differences from prost
- Differences from Protobuf
  - Distinguished representation of data and how this is achieved
- Compared to other encodings, distinguished and not
- Why use Bilrost?
- Why not use Bilrost?
- How does it work?
  - How exactly does it work?
- License & copyright

This readme is the result of a lot of work, and we want it to be good! If anything is unclear or could be improved, please feel free to submit issues or pull requests!

Conceptual overview

Bilrost is an encoding scheme for converting in-memory data structs into plain byte strings and vice versa. It's generally suitable for both network transport and data retained over the long term. Its encoded data is not human-readable, but it is encoded quite simply. It supports integral and floating point numbers, strings and byte strings, nested messages, and recursively nested messages. All of the above are supported as optional values, repeated values, sets of unique values, and key/value mappings where sensible. With appropriate choices of encodings (which determine the representation), most of these constructs can be nested almost arbitrarily.

Encoded Bilrost data does not include the names of its fields; they are instead assigned numbers agreed upon in advance by the message schema that specifies it. This can make the data much more compact than "schemaless" encodings like JSON, CBOR, etc., without sacrificing its extensibility: new fields can be added, and old fields removed, without necessarily breaking backwards compatibility with older versions of the encoding program. In the typical "relaxed" decoding mode, any field not in the message schema is ignored when decoding, so if fields are added or removed over time the fields that remain in common will still be mutually intelligible between the two versions of the schema. In this way, Bilrost is very similar to protobuf. See also: Design philosophy, Comparisons to other encodings, and the Encoding specification.
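The idea of numbered fields and "relaxed" decoding can be illustrated with a minimal sketch. This is not Bilrost's actual wire format or the bilrost crate's API: the `[field number][length][value]` layout, the `UserV1`/`UserV2` structs, and every function name here are hypothetical, chosen only to show how a decoder built against an older schema can skip fields it does not know.

```rust
// Toy layout assumed for this sketch only (NOT the Bilrost wire format):
// each field is [field number: u8][length: u8][value bytes].
fn encode_field(tag: u8, value: &[u8], out: &mut Vec<u8>) {
    assert!(value.len() <= u8::MAX as usize, "toy format caps values at 255 bytes");
    out.push(tag);
    out.push(value.len() as u8);
    out.extend_from_slice(value);
}

// Older schema: only field 1 (name) exists.
#[derive(Debug, Default, PartialEq)]
struct UserV1 {
    name: String,
}

// Newer schema: field 2 (email) was added later.
#[derive(Debug, Default, PartialEq)]
struct UserV2 {
    name: String,
    email: String,
}

// Encode with the newer schema; field names never appear in the output.
fn encode_v2(user: &UserV2) -> Vec<u8> {
    let mut out = Vec::new();
    encode_field(1, user.name.as_bytes(), &mut out);
    encode_field(2, user.email.as_bytes(), &mut out);
    out
}

// Decode with the older schema in a "relaxed" style:
// unknown field numbers are skipped rather than treated as errors.
fn decode_v1(mut buf: &[u8]) -> Option<UserV1> {
    let mut user = UserV1::default();
    while !buf.is_empty() {
        if buf.len() < 2 {
            return None; // truncated field header
        }
        let (tag, len) = (buf[0], buf[1] as usize);
        let rest = &buf[2..];
        if rest.len() < len {
            return None; // truncated field value
        }
        let (value, tail) = rest.split_at(len);
        match tag {
            1 => user.name = String::from_utf8(value.to_vec()).ok()?,
            _ => {} // field not in this schema: ignore it
        }
        buf = tail;
    }
    Some(user)
}
```

Under these assumptions, bytes produced by `encode_v2` still decode with `decode_v1`: the name in field 1 is recovered and the unknown field 2 is passed over, which is the sense in which the fields the two schema versions share stay mutually intelligible.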

Bilrost also has the ability to encode and decode data that is guaranteed to be canonically represented: see the section on distinguished decoding.

Design philosophy

Bilrost is designed to be an encoding format that is simple to specify, simple to implement, simple to port across languages and machines, and easy to use correctly.

Schema-ful encoding

It is designed as a data model that has a schema, though it can of course also be used to encode representations of "schemaless" data. There are advantages and disadvantages to this form. The encoded data is significantly smaller, since repetitive names of fields are replaced with surrogate numbers. At the same time, it may be less clear what the data means because the inherent documentation of the fields' names is missing. Schemaless encodings like JSON can be decoded and accessed dynamically as pure data with far simpler, unified decoder implementations, whereas encodings like Bilrost and protobuf require a schema to even be sure of the values.
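The size argument can be made concrete with a rough comparison. The sketch below is only an illustration: the record shape (`id`, `label`), the hand-built JSON text, and the one-byte field number plus one-byte length layout are all assumptions for the example and do not reflect Bilrost's real encoding.

```rust
// Size of one record written as JSON text, where the field names
// "id" and "label" are repeated in every record.
fn json_len(id: u8, label: &str) -> usize {
    format!(r#"{{"id":{},"label":"{}"}}"#, id, label).len()
}

// Size of the same record in a hypothetical tagged layout:
// [field number: u8][length: u8][value bytes] per field,
// with the id stored as a single byte.
fn tagged_len(label: &str) -> usize {
    let id_field = 2 + 1; // header + one-byte id
    let label_field = 2 + label.len(); // header + label bytes
    id_field + label_field
}
```

For a record with `id` 7 and `label` "ok", `json_len(7, "ok")` is 21 bytes while `tagged_len("ok")` is 7 bytes; the difference is almost entirely field names and punctuation, and it repeats for every record encoded.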

One argument is that even if fields' names are all specified in the encoding, they are merely low-information documentation that aids guessing or reverse-engineering. They can help diagnose where lost data belongs, or what mystery data means by lightly self-documenting, but the meaning of the data is still determined by the code that emitted it. Data has meaning based on where it is found, and the documentation of that meaning cannot be fully replaced by simply inc
