Rawutil

A pure-python and lightweight module to read and write binary data

Introduction

Rawutil is a module for reading and writing binary data in Python in the same way as the built-in struct module, but with more features. Its interface is compatible with struct, with a few small exceptions and many additions. It has no dependencies outside the standard library.

What’s already in struct

  • Unpack and pack fixed structures from/to bytes (pack, pack_into, unpack, unpack_from, iter_unpack, calcsize)
  • Struct objects, which parse a structure once and for all so that it can be reused several times
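
Since this part of the interface is shared with the standard library, it can be illustrated with struct itself. The sketch below uses only the built-in struct module, not rawutil, and the format string and values are arbitrary:

```python
import struct

fmt = "<2H4s"  # little-endian: two unsigned 16-bit integers, then a 4-byte string

packed = struct.pack(fmt, 1, 2, b"abcd")
values = struct.unpack(fmt, packed)
size = struct.calcsize(fmt)

# A Struct object parses the format once and can then be reused.
header = struct.Struct(fmt)
same_values = header.unpack(packed)

print(packed, values, size)
```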

What’s different compared to struct

  • Some rarely-used format characters are not in rawutil (N, P and p are not available, n is used for a different purpose)
  • Native size and alignment are not taken into account, so the @ character simply applies the system byte order with standard sizes and no alignment, just like =
  • There are several differences in error handling that are described below
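
To see what dropping native alignment means, here is how the built-in struct module distinguishes the two modes. This is a sketch with struct only; according to the list above, rawutil would treat both formats like the second one:

```python
import struct

native = struct.calcsize("@BI")    # native mode may insert padding before the int
standard = struct.calcsize("=BI")  # standard sizes, no alignment: 1 + 4 bytes

print(native, standard)
```

On most platforms the native size is larger than the standard one because of the padding inserted before the 4-byte integer.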

What has been added to struct

  • Reading and writing files and file-like objects
  • New format characters, to handle padding, alignment, strings, ...
  • Internal references in structures
  • Loops in structures
  • New features to handle variable byte order

Usage

Rawutil exports more or less the same interface as struct. In all those functions, structure may be a simple format string or a Struct object.

unpack

unpack(structure, data, names=None, refdata=(), byteorder=None)

Unpacks the given data according to the structure, and returns the unpacked values as a list.

  • structure is the structure of the data to unpack, as a format string or a Struct object
  • data may be a bytes-like or a file-like object. If it is a file-like object, the data will be unpacked starting from the current position in the file, and will leave the cursor at the end of the data that has been read (effectively reading the data to unpack from the file).
  • names may be a list of field names for a namedtuple, or a callable that takes all unpacked elements in order as arguments, like a namedtuple or a dataclass.
  • refdata may be used to easily input external data into the structure, as #n references. This will be described in the References part below
  • byteorder ("little" / "big") may be used to force the byteorder over the one defined in the format string

Unlike struct, this function does not raise an error if the data is larger than the structure's expected size.

Examples :

>>> unpack("4B 3s 3s", b"\x01\x02\x03\x04foobar")
[1, 2, 3, 4, b'foo', b'bar']
>>> unpack("<4s #0I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00", names=("string", "num1", "num2"), refdata=(2, ))
RawutilNameSpace(string=b'ABCD', num1=16, num2=32)
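
For comparison, the strictness that rawutil relaxes can be observed with the built-in struct module. This sketch uses struct only, with an arbitrary buffer; struct.unpack_from is the standard-library call that tolerates trailing data:

```python
import struct

data = b"\x01\x02\x03\x04extra"

try:
    struct.unpack("4B", data)  # the buffer is longer than the 4 bytes required
    strict_error = None
except struct.error as exc:
    strict_error = exc         # struct.unpack insists on an exact size

# struct.unpack_from only requires the buffer to be large enough.
values = struct.unpack_from("4B", data)
print(strict_error, values)
```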

unpack_from

unpack_from(structure, data, offset=None, names=None, refdata=(), getptr=False)

Unpacks the given data according to the structure, starting from the given position, and returns the unpacked values as a list.

This function works exactly like unpack, with two more optional arguments :

  • offset can be used to specify a starting position to read. In a file-like object, the cursor is moved to the given absolute offset, then the data to unpack is read and the cursor is left at the end of the data that has been read. If this parameter is not set, it works like unpack and reads from the current position
  • getptr can be set to True to also return the final position in the data, just after the unpacked data. The function will then return (values, end_position). If left as False, it works like unpack and only returns the values.

Examples :

>>> unpack_from("<4s #0I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00", names=("string", "num1", "num2"), refdata=(2, ))
RawutilNameSpace(string=b'ABCD', num1=16, num2=32)
>>> values, endpos = unpack_from("<2I", b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00EFGH", offset=4, getptr=True)
>>> values
[16, 32]
>>> endpos
12
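
The end position returned with getptr corresponds to the starting offset plus the size of the structure. A minimal sketch of that relationship for a fixed-length format, using the built-in struct module and a hypothetical helper named unpack_with_end (not part of rawutil):

```python
import struct

def unpack_with_end(fmt, data, offset=0):
    """Hypothetical helper: unpack at offset and also return the end position."""
    values = struct.unpack_from(fmt, data, offset)
    return list(values), offset + struct.calcsize(fmt)

data = b"ABCD\x10\x00\x00\x00\x20\x00\x00\x00EFGH"
values, endpos = unpack_with_end("<2I", data, offset=4)
print(values, endpos)
```

This simple arithmetic only holds for fixed-length structures, which is presumably why returning the position from the unpacking function itself is convenient for variable-length ones.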

iter_unpack

iter_unpack(structure, data, names=None, refdata=())

Returns an iterator that unpacks the data according to the structure and returns the values as a list at each iteration. The length of the data must be a multiple of the structure's length. If names is defined, each iteration will return a namedtuple, just like unpack and unpack_from. refdata also works the same way.

This function is present mostly to ensure compatibility with struct. It is generally preferable to use iterators inside structures, which are faster and offer much more control.

Examples :

>>> for a, b, c in iter_unpack("3c", b"abcdefghijkl"):
...     print(a.decode("ascii"), b.decode("ascii"), c.decode("ascii"))
...
a b c
d e f
g h i
j k l
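
The idea of getting one named record per iteration can be sketched with the standard library alone. The example below uses struct.iter_unpack and a hypothetical Point namedtuple; it only mirrors what the names parameter is described to do and is not rawutil code:

```python
import struct
from collections import namedtuple

Point = namedtuple("Point", ("x", "y"))  # hypothetical record type

data = struct.pack("<4h", 1, 2, 3, 4)    # two consecutive (x, y) records
points = [Point._make(fields) for fields in struct.iter_unpack("<2h", data)]
print(points)
```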

pack

pack(structure, *data, refdata=(), byteorder=None, padding_byte=0x00)

Packs the given data in the binary format defined by structure, and returns the packed data as a bytes object.

  • refdata is used to insert external data into the structure through #n references, and is a keyword-only argument.
  • byteorder ("little" / "big") may be used to force the byteorder over the one defined in the format string
  • padding_byte is the value of the padding bytes inserted by "x" and "a" format characters

Examples :

>>> pack("<2In", 10, 100, b"String")
b'\n\x00\x00\x00d\x00\x00\x00String\x00'
>>> pack(">#0B #1I", 10, 100, 1000, 10000, 100000, refdata=(2, 3))
b"\nd\x00\x00\x03\xe8\x00\x00'\x10\x00\x01\x86\xa0"
>>> unpack(">2B3I", _)
[10, 100, 1000, 10000, 100000]
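
In the second example, each #n reference stands for the n-th element of refdata, used here as a repeat count. The substitution can be illustrated with a hypothetical helper, expand_external_refs, that rewrites the format string and hands it to the built-in struct module. This is only a sketch of the semantics shown above, not how rawutil implements references:

```python
import re
import struct

def expand_external_refs(fmt, refdata):
    """Illustrative helper: replace each #n with the value of refdata[n]."""
    return re.sub(r"#(\d+)", lambda m: str(refdata[int(m.group(1))]), fmt)

fmt = expand_external_refs(">#0B #1I", (2, 3))
packed = struct.pack(fmt, 10, 100, 1000, 10000, 100000)
print(fmt, packed)
```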

pack_into

pack_into(structure, buffer, offset, *data, refdata=(), byteorder=None, padding_byte=0x00)

Packs the given data into the given buffer at the given offset according to the given structure. refdata has the same usage as everywhere else.

  • buffer must be a mutable bytes-like object (typically a bytearray). The data will be written directly into it at the given position
  • offset specifies the position to write the data to. It is a required argument.
  • byteorder ("little" / "big") may be used to force the byteorder over the one defined in the format string
  • padding_byte is the value of the padding bytes inserted by "x" and "a" format characters

Examples :

>>> b = bytearray(b"AB----GH")
>>> pack_into("4s", b, 2, b"CDEF")
>>> b
bytearray(b'ABCDEFGH')

pack_file

pack_file(structure, file, *data, position=None, refdata=(), byteorder=None, padding_byte=0x00)

Packs the given data into the given file according to the given structure. refdata is still there for the external references data.

  • file can be any binary writable file-like object.
  • position can be set to pack the data at a specific position in the file. If it is left as None, the data will be packed at the current position in the file. In either case, the cursor will end up at the end of the packed data.
  • byteorder ("little" / "big") may be used to force the byteorder over the one defined in the format string
  • padding_byte is the value of the padding bytes inserted by "x" and "a" format characters

Examples :

>>> import io, rawutil
>>> file = io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00")
>>> rawutil.pack_file("2B", file, 60, 61)  # Writes at the current position (0)
>>> rawutil.pack_file("c", file, b"A")     # Writes at the current position (now 2)
>>> rawutil.pack_file("2c", file, b"y", b"z", position=6)  # Writes at the given position (6)
>>> file.seek(0)
0
>>> file.read()
b'<=A\x00\x00\x00yz'
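
The cursor behaviour described above amounts to an optional absolute seek followed by a write. A rough stand-in using only the standard library, with a hypothetical pack_to_file helper (this is not rawutil's implementation, and it handles only formats that struct understands):

```python
import io
import struct

def pack_to_file(fmt, file, *values, position=None):
    """Hypothetical stand-in: write packed values at position, or at the cursor."""
    if position is not None:
        file.seek(position)  # absolute offset from the start of the file
    file.write(struct.pack(fmt, *values))

file = io.BytesIO(b"\x00" * 8)
pack_to_file("2B", file, 60, 61)                  # cursor goes from 0 to 2
pack_to_file("c", file, b"A")                     # cursor goes from 2 to 3
pack_to_file("2c", file, b"y", b"z", position=6)  # cursor ends at 8
result = file.getvalue()
print(result)
```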

calcsize

calcsize(structure, refdata=())

Returns the size of the data represented by the given structure.

Rawutil structures are not always of a fixed length, as they can use internal references and variable-length formats. Hence calcsize only works on fixed-length structures, which only use:

  • Fixed-length format characters (basic types with set repeat count)
  • External references (#0 type references, if you provide their value in refdata)
  • Iterators with fixed number of repeats (2(…) or 5[…] will work)
  • Alignments (structures with a and |). As long as everything else is fixed, alignments are too.

Trying to compute the size of a structure that includes any of the following will raise a FormatError (basically, anything that depends on the data to read / write) :

  • Variable-length format characters (namely n and $)
  • {…} iterators, as they depend on the amount of data remaining.
  • Internal references (any /1 or /p1 types references)

Struct

Struct(format, names=None, safe_references=True)

Struct objects parse a format string once and for all. A bare format string has to be parsed again every time it is used, so wrapping a structure that is used more than once in a Struct object saves time. You can also set the element names once; they will then be used by default every time you unpack data with that structure. Any function that accepts a format string also accepts a Struct object.

A Struct object is initialized with a format string, and can take a names parameter, either a namedtuple or a list of names, so that data unpacked with this structure is returned as a more convenient namedtuple. It works exactly like the names parameter of unpack and its variants, but without having to specify it each time. The namedtuple type can also be retrieved from the structure's names attribute, and can be used to make packed values clearer:

>>> my_structure = rawutil.Struct("4s I", names=("magic", "size"))
>>> data = my_structure.names(magic=b"1234", size=64)
>>> my_structure.pack(*data)
b'1234@\x00\x00\x00'
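
For readers more familiar with the standard library, roughly the same pattern can be written with struct.Struct and collections.namedtuple. In this sketch the Header type is a hypothetical name that plays the role of the names parameter, and the byte order is made explicit:

```python
import struct
from collections import namedtuple

Header = namedtuple("Header", ("magic", "size"))  # plays the role of names
header_struct = struct.Struct("<4sI")             # parsed once, reused below

packed = header_struct.pack(*Header(magic=b"1234", size=64))
decoded = Header._make(header_struct.unpack(packed))
print(packed, decoded)
```

With rawutil, the Struct object carries the names itself, so the separate namedtuple and the manual _make call are not needed.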

The safe_references parameter, when set to
