Bytesutil

Generic binary protocol codec library for Java

Bytes-Util

What's this

A Java library that eases the parsing of binary encoding schemes.

Motivation

Many systems still communicate through non-standard, home-grown binary protocols. Implementing such protocols is a pain, as I have found in my daily work, because they rely on data structures or types that Java does not support natively or that are hard to implement, such as unsigned integral types, little-endian numbers, or binary-coded decimals, to name a few. I wrote this library to ease the parsing process and let programmers focus on their real work.

Quick Start

Consider the following definition of a data packet:

| Field Name | Length | Type |
|------------|--------|------|
| Header Mark | 1 | 1-byte unsigned integer |
| Packet Type | 1 | 1-byte unsigned integer |
| Sequence Number | 2 | 2-byte unsigned integer |
| End Mark | 1 | 1-byte unsigned integer |

And the following Java class declaration:

```java
import io.github.zhtmf.DataPacket;
import io.github.zhtmf.annotations.modifiers.Order;
import io.github.zhtmf.annotations.modifiers.Unsigned;
import io.github.zhtmf.annotations.types.BYTE;
import io.github.zhtmf.annotations.types.SHORT;

@Unsigned
public class MyPacket extends DataPacket{
    @BYTE
    @Order(0)
    public int headerMark;
    @BYTE
    @Order(1)
    public int type;
    @SHORT
    @Order(2)
    public int sequenceNumber;
    @BYTE
    @Order(3)
    public byte endMark;
}
```

Serialization and deserialization then become as simple as:

```java
// Deserialization
InputStream in = ...; //socket input stream, or other input streams.
MyPacket packet = new MyPacket();
packet.deserialize(in); //fields will be filled with values in the stream.
```

```java
// Serialization
OutputStream out = ...; //write to socket or other forms of output streams.
MyPacket packet = new MyPacket();
packet.headerMark = 0xEB;
packet.type = 0x01;
packet.sequenceNumber = 255;
packet.endMark = 0x07;
packet.serialize(out);
```

As the basic example above shows, to use this library users need to:

  1. Define a class which subclasses DataPacket.
  2. Use Order and other annotations to define the properties of instance fields, which correspond directly to fields in the original binary scheme.
  3. Use inherited methods like DataPacket#serialize and DataPacket#deserialize to handle serialization/deserialization.
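
For comparison, here is a rough sketch of what parsing the same 5-byte packet by hand might look like in plain Java. It is not part of this library: the class name `ManualPacketParser` is hypothetical, and big-endian byte order is assumed because the Modifiers section names it as the default.

```java
import java.nio.ByteBuffer;

// Hypothetical hand-written parser for the 5-byte packet above; not Bytesutil code.
final class ManualPacketParser {
    int headerMark;
    int type;
    int sequenceNumber;
    int endMark;

    static ManualPacketParser parse(byte[] bytes) {
        // ByteBuffer reads big-endian by default (assumed byte order of the packet).
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        ManualPacketParser p = new ManualPacketParser();
        // Java has no unsigned byte/short, so each value is widened and masked.
        p.headerMark = buf.get() & 0xFF;            // 1-byte unsigned integer
        p.type = buf.get() & 0xFF;                  // 1-byte unsigned integer
        p.sequenceNumber = buf.getShort() & 0xFFFF; // 2-byte unsigned integer
        p.endMark = buf.get() & 0xFF;               // 1-byte unsigned integer
        return p;
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xEB, 0x01, 0x00, (byte) 0xFF, 0x07};
        ManualPacketParser p = parse(wire);
        System.out.printf("header=0x%02X type=%d seq=%d end=%d%n",
                p.headerMark, p.type, p.sequenceNumber, p.endMark);
    }
}
```

Every field needs its own read call and mask, and the read order is implicit in the statement order. The annotations in the example above express the same information declaratively.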

Basic Concepts

Abstract Data Types

This library defines the following pseudo data types, which serve as an abstraction over the various definitions found in binary encoding schemes. Their annotations can be found under the package io.github.zhtmf.annotations.types.

| Name | Meaning |
|------|---------|
| BYTE | 1-byte integer |
| SHORT | 2-byte integer |
| INT | 4-byte integer |
| INT3 | 3-byte integer |
| INT5 | 5-byte integer |
| INT6 | 6-byte integer |
| INT7 | 7-byte integer |
| BCD | Binary-Coded Decimal |
| RAW | Sequences of bytes that do not fall into the categories above and are used as-is |
| CHAR | Sequences of bytes which are interpreted as human-readable text |
| LONG | 8-byte integer |
| BIT | Numbers that are stored as groupings of fewer than 8 bits |
| FIXED | Real data type that has a fixed number of digits after the radix point, also known as the Q format |
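
To make a few of the less familiar types concrete, the sketch below decodes BCD, INT3 and FIXED values by hand. It illustrates the encodings only and is not this library's implementation. The helper names are hypothetical, and the details are assumptions: packed BCD with the high nibble first, unsigned big-endian INT3, and a signed Q8.8 layout for FIXED.

```java
// Hypothetical helpers illustrating the encodings; not Bytesutil code.
final class TypeSketches {
    /** Decodes packed BCD: each byte holds two decimal digits, high nibble first (assumed). */
    static long decodeBcd(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            int hi = (b >> 4) & 0x0F;
            int lo = b & 0x0F;
            if (hi > 9 || lo > 9) {
                throw new IllegalArgumentException("not a BCD digit");
            }
            value = value * 100 + hi * 10 + lo;
        }
        return value;
    }

    /** Decodes a 3-byte unsigned big-endian integer (byte order assumed). */
    static int decodeInt3(byte[] b) {
        return ((b[0] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
    }

    /** Interprets a raw 16-bit value as signed Q8.8 fixed point: 8 fraction bits (assumed layout). */
    static double decodeQ8_8(short raw) {
        return raw / 256.0;
    }

    public static void main(String[] args) {
        System.out.println(decodeBcd(new byte[]{0x12, 0x34}));        // 1234
        System.out.println(decodeInt3(new byte[]{0x01, 0x00, 0x00})); // 65536
        System.out.println(decodeQ8_8((short) 0x0180));               // 1.5
    }
}
```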

The following conversions between the data types above and Java types are defined:

| | byte/Byte | short/Short | int/Integer | long/Long | Enums | String | char/Character | boolean/Boolean | byte[] | int[] | java.util.Date | BigInteger | Double/double | BigDecimal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| BYTE | ⚪ | ⚪ | ⚪ | ⚪ | ⚪ | | | ⚪ | | | | | | |
| SHORT | | ⚪ | ⚪ | ⚪ | ⚪ | | | | | | | | | |
| INT | | | ⚪ | ⚪ | ⚪ | | | | | | ⚪ | | | |
| INT3 | | | ⚪ | ⚪ | | | | | | | | | | |
| INT5 | | | | ⚪ | | | | | | | | | | |
| INT6 | | | | ⚪ | | | | | | | | | | |
| INT7 | | | | ⚪ | | | | | | | | | | |
| BCD | ⚪ | ⚪ | ⚪ | ⚪ | | ⚪ | | | | | ⚪ | | | |
| RAW | | | | | | | | | ⚪ | ⚪ | | | | |
| CHAR | ⚪ | ⚪ | ⚪ | ⚪ | ⚪ | ⚪ | ⚪ | | | | ⚪ | ⚪ | | |
| LONG | | | | ⚪ | ⚪ | | | | | | ⚪ | ⚪ | | |
| BIT | ⚪ | | | | ⚪ | | | ⚪ | | | | | | |
| FIXED | | | | | | | | | | | | | ⚪ | ⚪ |

In addition, an instance field can also be another DataPacket. It is handled automatically by calling its serialize or deserialize method and does not need to be annotated with the data type annotations above.

Most of the conversions are intuitive, but conversion from numeric types to enums is covered in a separate chapter later in this document.

Modifiers

Modifiers are another set of annotations, which describe properties of the data types above. They can be found under the package io.github.zhtmf.annotations.modifiers.

| Name | Meaning |
|------|---------|
| Order | Order of this field in the original encoding scheme. This is a must because the reflection API in Java does not guarantee the order of Field objects returned by Class.getDeclaredFields. |
| Signed/Unsigned | Specifies that a single field or all fields in a class should be interpreted as signed or unsigned (default). |
| BigEndian/LittleEndian | Specifies that a single field or all fields in a class should be interpreted as big-endian (default) or little-endian. |
| CHARSET | Charset for all CHAR type fields in a class or for a single field. |
| DatePattern | Date pattern string for a java.util.Date field. |
| ListLength | Length of a java.util.List when handling a list of basic data types. |
| Length | Length of CHAR arrays or RAW arrays. |
| EndsWith | Indicates that a CHAR field is of indeterminate length and its end is marked by a specific sequence of bytes. |
| ListEndsWith | Indicates that the length of a list is neither static nor calculated but depends on external conditions at runtime. It enables using a ModifierHandler to encapsulate additional logic. Users can refer to external resources or even modify the list itself within this handler. For an example of this modifier, see how this library handles the length of the ConstantPool field in a Java class file in the related example. |

Modifiers can be specified at both the type level and the field level. Unsurprisingly, a field-level annotation overrides the same annotation at the type level.
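
The sketch below illustrates two of these ideas in plain Java: why an explicit ordering annotation is needed when fields are discovered through reflection, and how the same two bytes yield different values depending on byte order and signedness. The `Position` annotation and all other names here are hypothetical stand-ins written for this illustration. They are not the library's own annotations, and the code is not how the library is implemented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Comparator;

final class ModifierSketch {
    // Hypothetical stand-in for an ordering annotation; not the library's Order.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Position { int value(); }

    // Fields are deliberately declared out of wire order.
    static class Sample {
        @Position(1) int type;
        @Position(0) int headerMark;
        @Position(2) int sequenceNumber;
    }

    /** Returns annotated field names sorted by position, independent of reflection order. */
    static String[] orderedFieldNames(Class<?> cls) {
        return Arrays.stream(cls.getDeclaredFields())
                .filter(f -> f.isAnnotationPresent(Position.class))
                .sorted(Comparator.comparingInt((Field f) -> f.getAnnotation(Position.class).value()))
                .map(Field::getName)
                .toArray(String[]::new);
    }

    /** Reads two bytes as a 16-bit integer with the given byte order and signedness. */
    static int read16(byte[] bytes, ByteOrder order, boolean unsigned) {
        short raw = ByteBuffer.wrap(bytes).order(order).getShort();
        return unsigned ? raw & 0xFFFF : raw;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", orderedFieldNames(Sample.class)));
        byte[] b = {(byte) 0xFF, 0x01};
        System.out.println(read16(b, ByteOrder.BIG_ENDIAN, true));    // 65281
        System.out.println(read16(b, ByteOrder.BIG_ENDIAN, false));   // -255
        System.out.println(read16(b, ByteOrder.LITTLE_ENDIAN, true)); // 511
    }
}
```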

Handler

A handler is a mechanism for expressing dynamic logic in a binary encoding, namely fields whose content depends on the value of another field or on some other condition and can only be determined at runtime.

There are three types of handlers, as listed below:

| Type | Associated Annotations | Usage |
|------|------------------------|-------|
| Length handler | Length, ListLength | dynamic length |
| Conditional handler | Conditional | conditionally omitting some fields |
| Entity handler | Variant | custom instantiation logic of "sub" DataPackets |

Their usage is best explained through a small example. Consider the following encoding scheme:

| Field Name | Length | Type | Value |
|------------|--------|------|-------|
| Header | 1 | 1-byte unsigned integer | 0xF0 |
| Packet Length | 4 | 4-byte unsigned integer | |
| Identifier | 4 | 4-byte array | 0x01000000 |
| Sequence Number | 2 | 2-byte unsigned integer | |
| CRC Flag | 1 | 1-byte integer | 0x00: with crc <br/>0x01: without crc |
| Body | | | multiple types, depends on body header |
| CRC Value | 4 | 4-byte array | If CrcFlag is 0, this part is omitted, otherwise crc32 encoding of the Body part. |
| Ending | 1 | 1-byte integer | 0xFF |

body of type 1:

| Field Name | Length | Type | Value |
|------------|--------|------|-------|
| Body Header | 2 | 2-byte unsigned integer | 0x5109 |
| Device ID | 4 | 4-byte unsigned integer | |
| IP | 4 | 4-byte array | |

body of type 2:

| Field Name | Length | Type | Value |
|------------|--------|------|-------|
| Body Header | 2 | 2-byte unsigned integer | 0x55DE |
| Files Count | 1 | 1-byte unsigned integer | |
| File Name List | 100*N | 100-character string per file name | |
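
To get a feel for the two body layouts, here is a rough hand-written decoder in plain Java. It does not use this library, and the class and method names are hypothetical. Several details are assumptions the tables do not state: big-endian byte order, one ASCII byte per file-name character with trailing padding trimmed, and a "crc32" that matches what `java.util.zip.CRC32` computes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical hand-written decoder for the two body layouts; not Bytesutil code.
final class BodySketch {
    static final int TYPE_1 = 0x5109;
    static final int TYPE_2 = 0x55DE;
    static final int FILE_NAME_LENGTH = 100;

    /** Describes a decoded body in one line; big-endian byte order is assumed. */
    static String describe(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        int header = buf.getShort() & 0xFFFF; // the body header selects the layout
        switch (header) {
            case TYPE_1: {
                long deviceId = buf.getInt() & 0xFFFFFFFFL; // 4-byte unsigned integer
                byte[] ip = new byte[4];
                buf.get(ip);
                return "type1 device=" + deviceId + " ip=" + (ip[0] & 0xFF) + "."
                        + (ip[1] & 0xFF) + "." + (ip[2] & 0xFF) + "." + (ip[3] & 0xFF);
            }
            case TYPE_2: {
                int count = buf.get() & 0xFF; // the list length is only known at runtime
                List<String> names = new ArrayList<>();
                for (int i = 0; i < count; i++) {
                    byte[] raw = new byte[FILE_NAME_LENGTH];
                    buf.get(raw);
                    // Assumption: ASCII text padded with NULs or spaces.
                    names.add(new String(raw, StandardCharsets.US_ASCII).trim());
                }
                return "type2 files=" + names;
            }
            default:
                throw new IllegalArgumentException(
                        "unknown body header 0x" + Integer.toHexString(header));
        }
    }

    /** CRC-32 of the body bytes, as computed by java.util.zip.CRC32. */
    static long crc32(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] type1 = {0x51, 0x09, 0, 0, 0, 42, (byte) 192, (byte) 168, 0, 1};
        System.out.println(describe(type1)); // type1 device=42 ip=192.168.0.1
        System.out.println(Long.toHexString(crc32(type1)));
    }
}
```

The branch on the body header and the loop bounded by Files Count are the kind of runtime decisions that cannot be written as static field declarations.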

There are three parts of this binary encoding scheme that cannot be statically defined:

  • The CRC part is omitted if the crcFlag is 0.
  • The body part can be either of the two types, and the body itself has a "header" which states what kind of data follows.
  • The length of body type 2 is not fixed; it depends on how many file names are in the list.

These parts make the scheme dynamic, so we use handlers to express the logic.

import java.io.IOException;
import java.io.InputStream;

import io.github.zhtmf.DataPacket;
import io.github.zhtmf.annotations.modifiers.Length;
import io.github.zhtmf.annotations.modifiers.LittleEndian;
import io.github.zhtmf.annotations.modifiers.Order;
import io.github.zhtmf.annotations.modifiers.Unsigned;
import io.github.zhtmf.annotations.modifiers.Variant;
import io.github.zhtmf.annotations.types.BYTE;
import io.github.zhtmf.annotations.types.INT;
import io.github.zhtmf.annotations.types.RAW;
import io.github.zhtmf.converters.auxiliary.EntityHandler;
import io.github.zhtmf.