TCOBS v1 & v2

<details> <summary>Table of Contents</summary> <ol>

TCOBS v1 & v2

<div id="top"></div></ol></details>


1. <a name='AboutTheproject'></a>About The Project


TCOBS is a variant of COBS combined with real-time RLE data compression, designed especially for short messages containing integers.
In the worst case, when no compression is possible, the maximum overhead of TCOBS (v1 or v2) is 1 byte for each started group of 31 input bytes. This results from the 5 chaining bits in the NOP sigil bytes 🖇 needed for incompressible data: for an input buffer of size iz, the maximum required output buffer size is oz = iz * 32/31 + 1. (Example: a 1000-byte buffer can be encoded with at most 33 additional bytes.) This is more than the original COBS, which adds 1 byte for each started group of 254 bytes, but if the data contain integer numbers, as communication packets often do, the encoded data will statistically be shorter with TCOBS than with legacy COBS.
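The buffer size formula above can be sketched in C as follows. The function name `tcobs_max_encoded_size` is hypothetical and not part of the TCOBS API. The expression `iz + iz / 31 + 1` is assumed to be equivalent to `iz * 32/31 + 1` under integer division, and it avoids overflow of the intermediate product.

```c
#include <stddef.h>

/* Worst-case output buffer size for an input of iz bytes,
 * following oz = iz * 32/31 + 1 with integer division.
 * Hypothetical helper, not taken from the TCOBS sources. */
static size_t tcobs_max_encoded_size(size_t iz) {
    return iz + iz / 31 + 1;
}
```

For a 1000-byte input this yields 1033, matching the example of at most 33 additional bytes.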

1.1. <a name='Assumptions'></a>Assumptions

- Most messages, like Trices, consist of 16 bytes or fewer.
- Some messages or user data are longer.
- Several zeros in a row are a common pattern (example: 00 00 00 05).
- Several 0xFF bytes in a row are a common pattern too (example: -1 as a 32-bit value).
- Other byte values may also occur several times in a row.
- TCOBS does not know the inner data structure and is therefore usable on any user data.

2. <a name='Preface'></a> Preface

TCOBS was originally developed as an optional part of Trice, and that is what the T stands for. It aims to reduce the binary Trice data and frame it in one step.
- The T also symbolizes the joining of the 2 orthogonal tasks, compression and framing.
- Additionally, the use of ternary and quaternary numbers in TCOBSv2 is reflected in the letter T.
TCOBSv2 is an improved approach over TCOBSv1 and is perfectly suited to data streams in which long sequences of equal bytes occur.
- The TCOBSv1 compression is expected to be less effective than that of TCOBSv2.
The data are assumed to contain 00-bytes and FF-bytes somewhat more often than other bytes.
The compression aims at a reasonable data reduction with minimal computing effort rather than at the absolute minimum size. The method shown here simply counts repeated bytes and transforms them into shorter sequences. It works well on very short messages of 2 or 4 bytes as well as on very long buffers. The compressed buffer no longer contains any 00-bytes, which is the aim of COBS.
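The counting step mentioned above can be illustrated with a minimal sketch. It only measures how many identical bytes start a buffer. How TCOBS maps such run lengths onto sigil bytes is defined by the specification and is not reproduced here. The name `run_length` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of bytes at the start of buf that are equal to buf[0].
 * Returns 0 for an empty buffer. Illustration only, not the TCOBS encoder. */
static size_t run_length(const uint8_t *buf, size_t len) {
    if (len == 0) {
        return 0;
    }
    size_t n = 1;
    while (n < len && buf[n] == buf[0]) {
        n++;
    }
    return n;
}
```

Applied to the bytes 00 00 00 05, the function reports a run of 3 zeros, which is the kind of pattern a run-length scheme can shorten.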
TCOBS can be used stand-alone in any project for package framing with data minimization.
The intended use cases are speed, limited bandwidth and long-term data recording in the field.
TCOBS is inspired by rlercobs. The idea of the ending sigil byte comes from rCOBS. It allows straightforward encoding without lookahead and thereby makes the embedded device code simpler.
TCOBS uses various chained sigil bytes to achieve additional lossless compression where possible.
Each encoded package ends with a sigil byte.
Because the encoded packages no longer contain any 0, 0 is usable as a delimiter byte between them. It is up to the user to insert the optional delimiters for framing after each package or after several packages.
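On the receiving side, 0-delimited framing amounts to searching for the next 0 byte. The following is a minimal sketch under that assumption. The helper name `next_package_len` is hypothetical, and the bytes it returns would still have to be passed to a TCOBS decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the package at the start of buf, not counting
 * its 0 delimiter, or -1 if no delimiter is found within len bytes
 * (the package is still incomplete). Hypothetical helper. */
static ptrdiff_t next_package_len(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

A caller would process the returned number of bytes, skip the delimiter, and repeat on the remainder of the buffer until -1 is returned.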

2.1. <a name='Whynotin2steps'></a> Why not in 2 steps?

Usually it is better to divide this task and perform compression and COBS encoding separately. This is fine if size and time do not really matter.
Each separate transformation adds its own control byte, so a combined transformation adds just 1 byte instead of 2.
The messages expected for TCOBS are typically in the range of 2 to 300 bytes, though not limited to it, and for such sizes run-length encoding makes sense for real-time compression.
Separating compression and COBS costs more time (2 processing loops) and does not allow squeezing out the last byte.
With the TCOBS algorithm, a single processing loop is expected to yield a smaller transfer packet size at higher speed.

3. <a name='DataDisruptionHandling'></a>Data Disruption Handling

In case of data disruption, the receiver will wait for the next 0-delimiter byte. As a result, it will get a packet consisting of the start and the end of 2 different packages, A and Z.
<a href="https://github.com/rokath/tcobs"> <img src="./docs/ref/COBSDataDisruption.svg" alt="Logo" width="1200" height="120"> </a>
For the decoder it makes no difference whether the package starts or ends with a sigil byte. In either case it will, with high probability, run into an inconsistency and report a data disruption. A false match cannot be ruled out completely, however.
- If the decoded data are structured, one can estimate the false match probability and, if needed, increase safety with an additional package CRC computed before encoding.
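As an illustration of the CRC option, the sketch below appends a checksum to a package before it would be handed to the encoder. The choice of CRC-16/CCITT-FALSE, the big-endian placement, and the names `crc16_ccitt` and `append_crc16` are assumptions made for this example. TCOBS itself does not prescribe any particular CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. Bitwise variant for clarity. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Appends the CRC of buf[0..len) in big-endian order.
 * Returns the new length, or 0 if cap is too small. */
static size_t append_crc16(uint8_t *buf, size_t len, size_t cap) {
    if (cap < len + 2) {
        return 0;
    }
    uint16_t crc = crc16_ccitt(buf, len);
    buf[len] = (uint8_t)(crc >> 8);
    buf[len + 1] = (uint8_t)(crc & 0xFF);
    return len + 2;
}
```

With this CRC variant, the receiver can run `crc16_ccitt` over the decoded package including the two appended bytes and accept it only if the result is 0.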
The receiver continuously calls a Read() function. The received buffer can contain several 0-delimited packages, and the receiver assumes all of them to be valid because it knows of no significant time delay between package start and end.
If a package start was received and the next package end arrives more than ~100ms later, a data disruption is likely and the receiver should ignore these data.
- Specify a maximum inter-byte delay inside a single package, ~50ms for example.
- To minimize the loss in case of data disruption, each message should be TCOBS encoded and 0-byte delimited separately.
- On the other hand, more frequent 0-byte delimiters increase the transmit overhead slightly.
Of course, when the receiver starts, the first buffer can contain broken TCOBS data, but on a PC we have to live with that. In any case, there is a reasonable likelihood that a data inconsistency is detected, as explained.
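The inter-byte timeout idea could look roughly like the following receiver sketch. Everything in it is an assumption for illustration: the type `rx_state_t`, the function `rx_feed`, the 64-byte capacity, and a caller-supplied millisecond timestamp. It is not taken from the TCOBS or Trice sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INTER_BYTE_MS 50u /* assumed limit, following the ~50ms example */
#define RX_CAPACITY 64u       /* hypothetical buffer size */

typedef struct {
    uint8_t data[RX_CAPACITY]; /* bytes of the package being collected */
    size_t len;                /* number of collected bytes */
    uint32_t last_ms;          /* arrival time of the previous byte */
    bool discard;              /* true while skipping to the next delimiter */
} rx_state_t;

/* Feeds one received byte with its arrival time in milliseconds.
 * Returns the length of a complete package now in rx->data, or 0 if none.
 * Empty packages also yield 0 and are thereby ignored. */
static size_t rx_feed(rx_state_t *rx, uint8_t byte, uint32_t now_ms) {
    if (rx->len > 0 && (uint32_t)(now_ms - rx->last_ms) > MAX_INTER_BYTE_MS) {
        /* Gap inside a package: likely disruption, drop what was collected. */
        rx->len = 0;
        rx->discard = true;
    }
    rx->last_ms = now_ms;
    if (byte == 0) { /* delimiter closes the current package */
        size_t n = rx->discard ? 0 : rx->len;
        rx->len = 0;
        rx->discard = false;
        return n;
    }
    if (!rx->discard) {
        if (rx->len < RX_CAPACITY) {
            rx->data[rx->len++] = byte;
        } else { /* overflow: treat like a disruption */
            rx->len = 0;
            rx->discard = true;
        }
    }
    return 0;
}
```

A package returned by `rx_feed` is only a candidate. It still has to pass TCOBS decoding and any further consistency checks.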

4. <a name='CurrentState'></a>Current State

[x] The TCOBSv1 & TCOBSv2 code is stable and ready to use without limitations.

| Property                                               | TCOBSv1   | TCOBSv2   |
|--------------------------------------------------------|-----------|-----------|
| Code amount                                            | 🟢 less   | 🟡 more   |
| Speed assumption (not measured yet)                    | 🟢 faster | 🟢 fast   |
| Compression on short messages from 2 bytes length      | 🟢 yes    | 🟢 yes    |
| Compression on messages with many equal bytes in a row | 🟡 good   | 🟢 better |
| Encoding C language support                            | 🟢 yes    | 🟢 yes    |
| Decoding C language support                            | 🟢 yes    |
