Ron
(dated, see the site) Replicated Object Notation, a distributed live data format, golang/ragel lib
This code is ancient. See RDX instead.
The RON project was renamed RDX. See the up-to-date codebases at https://github.com/gritzko/librdx (C) and https://github.com/drpcorg/chotki/rdx (Go).
Swarm Replicated Object Notation 2.0.1
Swarm Replicated Object Notation is a format for distributed live data. RON's focus is on continuous data synchronization. Every RON object may naturally have an unlimited number of replicas that synchronize incrementally, mostly in real-time. RON data always merges correctly and deterministically.
RON is information-centric: it aims to liberate the data from its location, storage, application or transport. There is no "master" replica, no "source of truth". Every event has an origin, but every replica is as good as the other one. Every single object, event or data type is uniquely identified and globally referenceable. RON metadata makes objects completely independent of the context. A program may read RON object versions and/or updates from the network, filesystem, database, message bus and/or local cache, in any order, and merge them correctly.
Consider JSON. It expresses relations by element positioning:
{
  "foo": {
    "bar": 1
  }
}
RON may express that state as:
*lww #1TUAQ+gritzko @` :bar = 1;
#(R @` :foo > (Q;
Those are two RON ops:
- some last-write-wins object is created with a field `bar` set to `1` (on 2017-10-31 10:26:00 UTC, by gritzko),
- another object is created with a field `foo` pointing to the first object (10:27:00, by gritzko).
Each op is a tuple of four globally-unique UUIDs for its data type, object, event and location, plus some number of value atoms. You may not see any UUIDs in the above example, initially. The notation does a lot to compress that metadata away.
These are the key features of RON:
- RON's basic unit is an immutable op. Every change to the data is an event; every event produces an op. An op may flow from a replica to a replica, from a database to a database, while fully intact and maintaining its original identity.
- Each RON op is context-independent. Nothing is implied by the context, everything is specified explicitly and unambiguously in the op itself. An op has four globally unique UUIDs for its data type, object, event and location.
- An object can be referenced by its UUID (e.g. `> 1TUAQ+gritzko`), thus RON can express object graph structures beyond simple nesting. Overall, RON relates pieces of data by their UUIDs. Thanks to that, RON data can be cached locally, updated incrementally and edited while offline.
- An object's state is a reduction of its ops. A data type is a reducer function: `lww(state,change) = new_state`. Reducers tolerate partial order of updates. Hence, all ops are applied immediately, without any linearization by a central server.
- There is no sharp border between a state snapshot and a state update. State is change and change is state (state-change duality). A transactional unit of data storage/transmission is a frame. A frame can contain a single op, a complete object graph or anything in between: object state, stale state, patch, or otherwise a piece of an object.
- The RON model implies no special "source of truth". The event's origin is the source of truth, not a server in the cloud. Every event/object is marked with its origin (e.g. `gritzko` in `1TUAQ+gritzko`).
- A RON frame is not a "message": it has an origin but it has no "destination". RON speaks in terms of data updates and subscriptions. Once you subscribe to an object, you receive the state and all the future updates, till you unsubscribe.
- RON is information-centric. Consider git: once you clone a repo, your copy is as good as the original one. Same with RON.
- RON is a hypermedia format, as data pieces can reference each other globally (imagine a RON-based real-time World-Wide-Web-of-Data). For that to work, though, both replica ids and data routing must work at global scale (federated, etc).
- RON is not optimized for human consumption. It is a machine-to-machine language mostly. "Human" APIs are produced by mappers (see below).
- RON employs compression for its metadata. The RON UUID syntax is specifically fine-tuned for easy compression.
Consider the above frame uncompressed:
*lww #1TUAQ+gritzko @1TUAQ+gritzko :bar = 1;
*lww #1TUAR+gritzko @1TUAR+gritzko :foo > 1TUAQ+gritzko;
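The uncompressed form maps directly onto the four-UUID tuple. Here is a minimal Go sketch, assuming simplified text-only types. `UUID`, `Op` and `Uncompressed` are illustrative names, not the API of the RON library, and real UUIDs are 128-bit values rather than strings.

```go
package main

import (
	"fmt"
	"strings"
)

// UUID is a text-only stand-in for a RON UUID (hypothetical type).
type UUID struct {
	Value  string // e.g. "1TUAQ" or "lww"
	Sep    string // "+" for an event timestamp; unused for a global name
	Origin string // e.g. "gritzko"; empty for a global name
}

// String renders the UUID the way it appears in the text format.
func (u UUID) String() string {
	if u.Origin == "" {
		return u.Value
	}
	return u.Value + u.Sep + u.Origin
}

// Op holds the four key UUIDs plus the value atoms, already rendered as text.
type Op struct {
	Type, Object, Event, Location UUID
	Atoms                         []string // e.g. "= 1" or "> 1TUAQ+gritzko"
}

// Uncompressed writes the op with every UUID spelled out in full.
func (o Op) Uncompressed() string {
	atoms := strings.Join(o.Atoms, " ")
	return fmt.Sprintf("*%s #%s @%s :%s %s;",
		o.Type, o.Object, o.Event, o.Location, atoms)
}

func main() {
	id := UUID{Value: "1TUAQ", Sep: "+", Origin: "gritzko"}
	op := Op{
		Type:     UUID{Value: "lww"},
		Object:   id,
		Event:    id,
		Location: UUID{Value: "bar"},
		Atoms:    []string{"= 1"},
	}
	fmt.Println(op.Uncompressed()) // *lww #1TUAQ+gritzko @1TUAQ+gritzko :bar = 1;
}
```

The sketch only goes from the tuple to the uncompressed text. Compression, which abbreviates UUIDs that repeat, is deliberately left out.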
One may say, what metadata solves is [naming things and cache invalidation][2problems]. What RON solves is compressing that metadata.
RON makes no strong assumptions about consistency guarantees: linearized, causal-order or gossip environments are all fine (certain restrictions apply, see below). Once all the object's ops are propagated to all the object's replicas, replicas converge to the same state. RON formal model makes this process correct. RON wire format makes this process efficient.
Formal model
Swarm RON formal model has five key components:
- A UUID is a globally unique 128-bit identifier. A UUID consists of two 60-bit parts: value and origin. 4+4 bits are reserved for flags. There are four UUID types:
  - an event timestamp: logical/hybrid timestamp, e.g. `1TUAQ+gritzko`, value is a monotonic counter `1TUAQ`, origin is a replica id `gritzko`, roughly corresponds to RFC4122 v1 UUIDs,
  - a derived timestamp: same as event timestamp, but refers to some derived calculation, not the original event (e.g. `1TUAQ-gritzko`),
  - a name, either global or scoped to a replica, e.g. `foo`, `lww`, `bar` (global), `MyVariable$gritzko` (scoped),
  - a hash (e.g. `4Js8lam4LB%kj529sMEsl`, both parts are hash sum bits).
- An op is an immutable atomic unit of data change. An op is a tuple of four or more atoms. The first four atoms of an op are UUIDs forming the op's key. These UUIDs are:
  - data type UUID, e.g. `lww`, a last-write-wins object,
  - object UUID `1TUAQ+gritzko`,
  - event UUID `1TUAQ+gritzko` and
  - location/reference UUID, e.g. `bar`.

  Other atoms (any number, any type) form the op's value. Op atom types are:
  - UUID,
  - integer,
  - string, or
  - float.

  Importantly, an op goes under one of four terms:
  - raw ops (a single op, before being processed by a reducer),
  - reduced ops (an op in a frame, processed by a reducer),
  - frame headers (first op of a frame, planted by a reducer),
  - queries (part of connection/subscription state machines).
- A frame is an ordered collection of ops, a transactional unit of data
  - an object's state is a frame
  - a "patch" (aka "delta", "diff") is also a frame
  - in general, data is seen as a [partially ordered][po] log of frames or chunks
  - frame may contain any number of reduced chunks and raw ops in any order; a chunk consists of a header or a query header op followed by reduced ops belonging to the chunk; raw ops form their own one-op chunk.
- A reducer is a RON term for a "data type"; reducers define how object state is changed by new ops
  - a reducer is a pure function: `f(state_frame, change_frame) -> new_state_frame`, where frames are either empty frames or single ops or products of past reductions by the same reducer,
  - reducers are:
    - associative, e.g. `f( f(state, op1), op2 ) == f( state, patch )` where `patch == f(op1,op2)`
    - commutative for concurrent ops (can tolerate causally consistent partial orders), e.g. `f(f(state,a),b) == f(f(state,b),a)`, assuming `a` and `b` originated concurrently at different replicas,
    - idempotent, e.g. `f(state, op1) == f(f(state, op1), op1) == f(state, f(op1, op1))`, etc.
  - optionally, reducers may have stronger guarantees, e.g. full commutativity (tolerates causality violations),
  - a frame could be an op, a patch or a complete state. Hence, a baseline reducer can "switch gears" from pure op-based CRDT mode to state-based CRDT to delta-based, e.g.
    - `f(state, op)` is op-based
    - `f(state1, state2)` is state-based
    - `f(state, patch)` is delta-based
- A mapper translates a replicated object's state frame into other formats
  - mappers turn RON objects into JSON or XML documents, C++, JavaScript or other objects
  - mappers are one-way: RON metadata may be lost in conversion
  - mappers can be pipelined, e.g. one can build a full RON->JSON->HTML [MVC][mvc] app using just mappers.
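As a rough illustration of a mapper, the sketch below turns already-reduced last-write-wins state into a JSON document. All names (`Ref`, `Field`, `State`, `ToJSON`) are hypothetical, and the state is simplified to one surviving value per field. This is not the mapper API of any RON library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ref is a reference to another object, given as the text form of its UUID.
type Ref string

// Field is one reduced LWW entry (hypothetical type): the winning value
// at a location. The event UUID is no longer present at this point.
type Field struct {
	Location string // field name, e.g. "bar"
	Value    any    // int, float64, string or Ref
}

// State maps an object UUID to that object's reduced fields.
type State map[string][]Field

// toValue converts one object into plain Go maps, following references.
// The path set guards against reference cycles.
func (s State) toValue(id string, path map[string]bool) any {
	if path[id] {
		return nil // cycle: emit null instead of recursing forever
	}
	path[id] = true
	defer delete(path, id)
	out := map[string]any{}
	for _, f := range s[id] {
		if r, ok := f.Value.(Ref); ok {
			out[f.Location] = s.toValue(string(r), path)
		} else {
			out[f.Location] = f.Value
		}
	}
	return out
}

// ToJSON renders the object graph rooted at id as a JSON document.
func (s State) ToJSON(id string) (string, error) {
	b, err := json.Marshal(s.toValue(id, map[string]bool{}))
	return string(b), err
}

func main() {
	s := State{
		"1TUAQ+gritzko": {{Location: "bar", Value: 1}},
		"1TUAR+gritzko": {{Location: "foo", Value: Ref("1TUAQ+gritzko")}},
	}
	out, err := s.ToJSON("1TUAR+gritzko")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"foo":{"bar":1}}
}
```

The output carries no UUIDs, timestamps or origins, which is why such a mapping is one-way: the JSON alone cannot be merged back as RON.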
Single ops assume [causally consistent][causal] delivery. RON implies causal consistency by default, although nothing prevents it from running in a linearized [ACIDic][peterb] or gossip environment. That only relaxes (or restricts) the choice of reducers.
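The reducer properties from the formal model can be checked on a toy last-write-wins reducer. This is a sketch under assumptions: `Op`, `Reduce` and `sameFrame` are made-up names, event timestamps are reduced to an integer counter plus a replica id, and ties are broken by comparing replica ids, which is one possible rule rather than a documented one. The replica `alice` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a simplified LWW op: a field, an event timestamp and a value.
type Op struct {
	Location string // field name, e.g. "bar"
	Time     uint64 // logical counter (stand-in for the UUID value part)
	Origin   string // replica id (stand-in for the UUID origin part)
	Value    int
}

// newer reports whether a wins over b: later time first, then origin.
func newer(a, b Op) bool {
	if a.Time != b.Time {
		return a.Time > b.Time
	}
	return a.Origin > b.Origin
}

// Reduce merges two frames into a new state frame. Either argument may
// be empty, a single op, a patch or a full state.
func Reduce(state, change []Op) []Op {
	best := map[string]Op{}
	for _, frame := range [][]Op{state, change} {
		for _, op := range frame {
			if cur, ok := best[op.Location]; !ok || newer(op, cur) {
				best[op.Location] = op
			}
		}
	}
	out := make([]Op, 0, len(best))
	for _, op := range best {
		out = append(out, op)
	}
	// Sort by field name so that equal states have equal frames.
	sort.Slice(out, func(i, j int) bool { return out[i].Location < out[j].Location })
	return out
}

// sameFrame compares two frames op by op.
func sameFrame(a, b []Op) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	s := Reduce(nil, []Op{{Location: "bar", Time: 1, Origin: "gritzko", Value: 1}})
	a := []Op{{Location: "bar", Time: 2, Origin: "alice", Value: 2}}
	b := []Op{{Location: "bar", Time: 2, Origin: "gritzko", Value: 3}} // concurrent with a

	assoc := sameFrame(Reduce(Reduce(s, a), b), Reduce(s, Reduce(a, b)))
	comm := sameFrame(Reduce(Reduce(s, a), b), Reduce(Reduce(s, b), a))
	idem := sameFrame(Reduce(s, a), Reduce(Reduce(s, a), a))
	fmt.Println(assoc, comm, idem) // true true true
}
```

Because this toy reducer always keeps the greatest (time, origin) pair, it is fully commutative, which is stronger than the baseline requirement. The same `Reduce` call covers the op-based, state-based and delta-based modes, since each argument is just a frame.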
Wire format (text)
The design goal for the RON wire format is to be reasonably readable and reasonably compact: no less human-readable than regular expressions, and no larger than (say) three times plain JSON (while at least three times more compact than JSON with comparable amounts of metadata).
The syntax outline:
- atoms follow very predictable conventions:
  - integers: `1`
  - e-notation floats: `3.1415`, `1.0e+6`
