


TMFRAME

TMFRAME, pronounced "time frame", is a simple and efficient binary standard for encoding time series data.

Starting with a 64-bit timestamp that counts nanoseconds since the Unix epoch, the idea is that the low 3 bits are really just random noise, because our clocks simply aren't calibrated that accurately.

So we replace those 3 bits with a useful data payload to get a highly compressed time-series format.
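A minimal Go sketch of that idea, assuming the timestamp comes from `time.Now().UnixNano()`: the low 3 bits are cleared and overwritten with a 3-bit payload type, and clearing them again recovers the timestamp, rounded down to a multiple of 8 ns. The helper names `encodePrimary` and `decodePrimary` are hypothetical and are not claimed to match any function in the reference implementation.

```go
package main

import (
	"fmt"
	"time"
)

// encodePrimary (hypothetical name) overwrites the low 3 bits of a nanosecond
// Unix timestamp with a 3-bit payload type indicator (PTI, 0..7).
func encodePrimary(ns int64, pti uint8) int64 {
	return ns&^7 | int64(pti&7) // &^ 7 clears bits 0..2; | then sets them to pti
}

// decodePrimary (hypothetical name) splits a primary word back into the
// timestamp, with the PTI bits zeroed, and the PTI itself.
func decodePrimary(w int64) (tmstamp int64, pti uint8) {
	return w &^ 7, uint8(w & 7)
}

func main() {
	ns := time.Now().UnixNano()
	w := encodePrimary(ns, 2)
	ts, pti := decodePrimary(w)
	// ts is ns rounded down to a multiple of 8, so ns-ts is between 0 and 7.
	fmt.Println(ts, pti, ns-ts)
}
```

Keeping the word as a signed `int64` mirrors the signed timestamp; because `&^ 7` rounds toward negative infinity, pre-1970 timestamps also land on 8 ns boundaries.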

specification

The TMFRAME format allows very compact expression of time series. For example, a simple time series can be encoded as nothing more than a sequence of 64-bit timestamps (whose resolution is limited to 8 nanoseconds, since the low 3 bits are given over to the payload type). However, the same format can also carry much longer event data when needed. The common case of a single float64 value per timepoint takes exactly two 64-bit words: one for the timestamp and one for the float64 payload.

overview of the format

A TMFRAME message always starts with a primary word.

Depending on the content of the low 3 bits of the primary word, the primary word may be the entire message. However, there may also be additional words and bytes following the primary word that complete the message.

TMFRAME messages are either 8 bytes long (primary word only), 16 bytes long, or longer than 16 bytes.

Frequently a TMFRAME message will consist of one primary word, one UDE word, and a variable length payload.

The primary word and the UDE word are each always 64 bits. The payload can be up to 2^43 - 1 bytes long, the largest count a 43-bit field can hold.

We illustrate the possible TMFRAME message lengths here:

a) primary word only

+---------------------------------------------------------------+
|      primary word (64-bits) with PTI={0, 4, 5, or 6}          |
+---------------------------------------------------------------+

b) primary word and UDE word only:

+---------------------------------------------------------------+
|                primary word (64-bits) with PTI=7              |
+---------------------------------------------------------------+
|            User-defined-encoding (UDE) descriptor             |
+---------------------------------------------------------------+

c) primary word + UDE word + variable byte-length message:

+---------------------------------------------------------------+
|                primary word (64-bits) with PTI=7              |
+---------------------------------------------------------------+
|            User-defined-encoding (UDE) descriptor             |
+---------------------------------------------------------------+
|               variable length                                 |
|                message here                          ----------
|     (the UDE supplies the exact byte-count)          |
+-------------------------------------------------------

There are also three special payload types that are not UDE based. They handle the common need to attach one or two 64-bit values to a timestamp.

d) primary word + one int64

+---------------------------------------------------------------+
|                primary word (64-bits) with PTI=1              |
+---------------------------------------------------------------+
|                     V1 (int64)                                |
+---------------------------------------------------------------+

e) primary word + one float64

+---------------------------------------------------------------+
|                primary word (64-bits) with PTI=2              |
+---------------------------------------------------------------+
|                     V0 (float64)                              |
+---------------------------------------------------------------+

f) primary word + one float64 + one int64

+---------------------------------------------------------------+
|                primary word (64-bits) with PTI=3              |
+---------------------------------------------------------------+
|                     V0 (float64)                              |
+---------------------------------------------------------------+
|                     V1 (int64)                                |
+---------------------------------------------------------------+
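The six layouts above reduce to a small lookup on PTI. The sketch below, using a hypothetical helper `messageLen`, computes a message's total size from its primary word and, for PTI=7, from the low 43 bits (UCOUNT) of the UDE word described in section 3. It assumes both words have already been read as little-endian 64-bit values.

```go
package main

import "fmt"

// ucountMask selects the low 43 bits of a UDE word, where UCOUNT lives.
const ucountMask = uint64(1)<<43 - 1

// messageLen (hypothetical) returns the total byte length of one TMFRAME
// message from its primary word and, when PTI == 7, its UDE word.
// For every other PTI the ude argument is ignored.
func messageLen(primary, ude uint64) uint64 {
	switch primary & 7 { // PTI occupies the low 3 bits
	case 0, 4, 5, 6:
		return 8 // layout (a): primary word only
	case 1, 2:
		return 16 // layouts (d) and (e): one int64 or one float64
	case 3:
		return 24 // layout (f): float64 then int64
	default: // PTI 7
		return 16 + ude&ucountMask // layouts (b) and (c): UDE word plus UCOUNT bytes
	}
}

func main() {
	fmt.Println(messageLen(0x100|2, 0)) // 16
	fmt.Println(messageLen(0x100|7, 5)) // 21
}
```

Masking the UDE word before adding it matters because the high 21 bits hold EVTNUM, which would otherwise inflate the length.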

1. number encoding rules

Integers and floating point numbers are used in the protocol that follows, so we fix our definitions of these.

  • Integers: Encoded in little-endian format. Signed integers use two’s complement. Integers are signed unless otherwise noted.
  • float64, also known as 64-bit floating-point numbers: Encoded in little-endian IEEE-754 format.
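In Go, these rules map directly onto the standard library: the `LittleEndian` byte order from `encoding/binary` plus `math.Float64bits` and `math.Float64frombits`. A minimal sketch; the four helper names are illustrative, not part of any TMFRAME API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// putFloat64 writes v into b[0:8] as little-endian IEEE-754.
func putFloat64(b []byte, v float64) {
	binary.LittleEndian.PutUint64(b, math.Float64bits(v))
}

// putInt64 writes v into b[0:8] as little-endian two's complement; the
// int64-to-uint64 conversion keeps the bit pattern unchanged.
func putInt64(b []byte, v int64) {
	binary.LittleEndian.PutUint64(b, uint64(v))
}

// getFloat64 and getInt64 reverse the two writers above.
func getFloat64(b []byte) float64 { return math.Float64frombits(binary.LittleEndian.Uint64(b)) }
func getInt64(b []byte) int64     { return int64(binary.LittleEndian.Uint64(b)) }

func main() {
	buf := make([]byte, 16)
	putFloat64(buf[0:8], 1.5) // 1.5 is 0x3FF8000000000000 in IEEE-754
	putInt64(buf[8:16], -2)
	fmt.Printf("% x\n", buf) // 00 00 00 00 00 00 f8 3f fe ff ff ff ff ff ff ff
	fmt.Println(getFloat64(buf[0:8]), getInt64(buf[8:16]))
}
```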

2. primary word encoding

A TMFRAME message always starts with a primary word.


msb                  primary word (64-bits)                   lsb
+-----------------------------------------------------------+---+
|                        TMSTAMP                            |PTI|
+-----------------------------------------------------------+---+

TMSTAMP (61 bits) =
     The primary word is generated by starting
     with a 64-bit signed little-endian integer, the number
     of nanoseconds since the Unix epoch; then truncating off
     the lowest 3 bits and overwriting them with the value of PTI.
     The resulting TMSTAMP value is the 61 most significant
     bits of the timestamp. It can be used directly as an
     integer timestamp by copying the full 64 bits of the
     primary word and then zeroing out the 3 PTI bits.
     
PTI (3 bits) = Payload type indicator, decoded as follows:

    0 => a zero value is indicated for this timestamp.
         (the zero value can also be encoded, albeit
         less efficiently, by a UDE word with bits all 0).
         
         Use the zero value for timestamp-only time series.

         The primary word is the only word in this message.
         The next word will be the primary word of the next
         message on the wire.

         By convention, the 0 value can indicate the
         payload false in a boolean series.

    1 => exactly one 64-bit int64 payload value follows.
         The message has exactly two 64-bit words.
         The payload is known as V1.

    2 => exactly one 64-bit float64 payload value follows.
         Mnemonic: The total number of 64-bit words in the message is 2.
         The payload is known as V0.

    3 => exactly two 64-bit payload values follow, one float64 and one int64.
         Mnemonic: The total number of 64-bit words in the message is 3.
         The payload components are known as V0 (the float64), and
         V1 (the int64).

    4 => NULL: the null-value, a known and intentionally null value. Written as NULL.

         NB: By convention, for a strictly boolean series, PTI=4 is the true value,
         while PTI=0 is the false value.

         The primary word is the only word in this message.

    5 => NA: not-available, an unintentionally missing value.
         In statistics, this indicates that *any* value could
         have been the correct payload here, but that the
         observation was not recorded. a.k.a. "Missing data". Written as NA.

         The primary word is the only word in this message.

    6 => NaN: not-a-number, IEEE-754 floating point NaN value.
         Obtained, for example, when dividing zero by zero.
         In Go, math.IsNaN() detects these.

         The primary word is the only word in this message.

    7 => user-defined-encoding (UDE) descriptor word follows.

3. User-defined-encoding descriptor

msb    user-defined-encoding (UDE) descriptor 64-bit word     lsb
+---------------------------------------------------------------+
| EVTNUM (21-bits)  |                UCOUNT (43-bits)           |
+---------------------------------------------------------------+
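Following the layout above, EVTNUM sits in the high 21 bits and UCOUNT in the low 43 bits. A hedged Go sketch of packing and unpacking the descriptor, plus a builder for the pre-defined EVTNUM -1 error message described below (UTF-8 text with the required trailing 0 byte). `packUDE`, `unpackUDE`, and `errorMessage` are hypothetical names.

```go
package main

import "fmt"

const (
	ucountBits = 43
	ucountMask = uint64(1)<<ucountBits - 1 // selects the low 43 bits (UCOUNT)
)

// packUDE (hypothetical) places a 21-bit signed EVTNUM in the high 21 bits
// and a 43-bit unsigned UCOUNT in the low 43 bits of one 64-bit word.
func packUDE(evtnum int32, ucount uint64) (uint64, error) {
	if evtnum < -(1<<20) || evtnum > 1<<20-1 {
		return 0, fmt.Errorf("EVTNUM %d outside 21-bit signed range", evtnum)
	}
	if ucount > ucountMask {
		return 0, fmt.Errorf("UCOUNT %d does not fit in 43 bits", ucount)
	}
	// Shifting left by 43 discards the sign-extension bits and keeps the low
	// 21 bits of EVTNUM's two's-complement pattern at the top of the word.
	return uint64(int64(evtnum))<<ucountBits | ucount, nil
}

// unpackUDE (hypothetical) reverses packUDE. The arithmetic right shift on a
// signed int64 sign-extends EVTNUM, so negative event numbers come back intact.
func unpackUDE(w uint64) (evtnum int32, ucount uint64) {
	return int32(int64(w) >> ucountBits), w & ucountMask
}

// errorMessage (hypothetical) builds the UDE word and payload for EVTNUM -1:
// UTF-8 text plus a trailing 0 byte that UCOUNT includes.
// An empty string yields UCOUNT 0 and no payload bytes.
func errorMessage(msg string) (uint64, []byte, error) {
	if msg == "" {
		w, err := packUDE(-1, 0)
		return w, nil, err
	}
	payload := append([]byte(msg), 0)
	w, err := packUDE(-1, uint64(len(payload)))
	return w, payload, err
}

func main() {
	w, payload, err := errorMessage("oops!")
	if err != nil {
		fmt.Println(err)
		return
	}
	evt, n := unpackUDE(w)
	fmt.Printf("%#016x evtnum=%d ucount=%d payload=%q\n", w, evt, n, payload)
}
```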

  UCOUNT => a 43-bit unsigned integer giving the number of bytes
       that follow as part of this message. Zero is allowed as a
       value in UCOUNT, and is useful when the type information in EVTNUM
       suffices to convey the event. Mask off the high 21 bits
       of the UDE to erase the EVTNUM before using the count
       of bytes found in UCOUNT. The payload starts immediately
       after the UDE word and can be up to 2^43 - 1 bytes long
       (just under 8 TiB). Shorter payloads are recommended whenever possible.

       There is no requirement that UCOUNT be padded to
       any alignment boundary. It should be the exact length
       of the payload in bytes.

       The next message's primary word will commence after the
       UCOUNT bytes that follow the UDE.

       If UCOUNT is > 0, then the payload of bytes must
       include a 0 byte as its last value. This assists
       in language bindings (e.g. C) where strings need a
       terminating zero byte.

  EVTNUM => a 21-bit signed two's-complement integer capable
       of expressing values in the range [-(2^20), (2^20)-1].

       Zero and positive numbers are for pre-defined system event
       types. Negative numbers are reserved for user-defined
       event types starting with -2, -3, -4, ...

       Exactly one negative (user-range) EVTNUM value is
       pre-defined:

       -1 => an error message string in utf8 follows; it is
             of length UCOUNT, and the count includes a
             zero termination byte if and only if the string has
             one or more bytes in it.

       Any custom user-defined types added by the user will
       therefore start at EVTNUM = -2. The last usable EVTNUM is
       -(2^20) = -1048576, so over one million user-defined
       event types are available.

       System defined EVTNUM values as of this writing are:

       0 => this is also a zero value payload. The corresponding
            UCOUNT must also be 0. There are no other words
            in this message. This allows encoders to not
            have to go back and compress out a zero value by
            writing a PTI of zero; alth
