Kolombos

CLI control characters and escape sequences viewer/visualizer

<div align="center"> <img src="https://user-images.githubusercontent.com/50381946/235438786-588f0045-dd2a-47e0-8da2-3215ec5b6202.png" width="96" height="96"><br> <img src="https://user-images.githubusercontent.com/50381946/219900506-491a3fc0-2e84-4782-ae3d-da4ae9825664.png" width="400" height="64"> </div> <div align="center"> <img src="https://img.shields.io/badge/python-3.7-3776AB?logo=python&logoColor=white&labelColor=333333"> <a href="https://pypi.org/project/kolombos/"><img alt="PyPI" src="https://img.shields.io/pypi/v/kolombos"></a> <a href="https://pepy.tech/project/kolombos/"> <img alt="Downloads" src="https://pepy.tech/badge/kolombos"> </a> </div> <h1> </h1>

CLI application for visualising usually invisible characters and bytes:

  • whitespace characters;
  • ASCII control characters;
  • ANSI escape sequences;
  • UTF-8 encoded characters;
  • binary data.

Installation

Via pipx

pipx install kolombos

Without pipx

System-wide install (sudo)

python -m pip install kolombos

User install (no sudo)

python -m pip install --user kolombos
export PATH="${PATH}:${HOME}/.local/bin/"

Usage

The application can be useful for a variety of tasks, e.g. browsing unknown data formats, searching for patterns, or debugging combinations of SGR sequences.

USAGE                                                                                                                                                   
  kolombos [[--text] | --binary] [<options>] [--demo | <file>]     
  
INPUT
  <file>                  file to read from; if empty or "-", read stdin
                          instead; ignored if --demo is present
  -M, --demo              show output examples and exit; see --legend for the
                          description
OPERATING MODE
  -t, --text              open file in text mode [this is a default]
  -b, --binary            open file in binary mode
  -l, --legend            show annotation symbol list and exit
  -v, --version           show app version and exit
  -h, --help              show this help message and exit 

[...]

Text mode and binary mode

kolombos can work in two primary modes: text and binary. Text mode reads the input line by line, while binary mode reads it in fixed-size byte chunks. Binary mode also produces extended output, which consists of a text representation (similar to text mode) and hexadecimal byte values.

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690178-d71a1e97-e9e5-43e9-a77d-500fc2740855.png"></p>
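
The difference between the two reading strategies can be sketched in a few lines of Python. This is only an illustration of the idea described above, not kolombos' actual implementation; the function names and the chunk size are made up for the example.

```python
import io

def read_text_mode(stream):
    """Yield the input one line at a time (text-mode style reading)."""
    for line in stream:
        yield line

def read_binary_mode(stream, chunk_size=16):
    """Yield (offset, chunk, hex_values) for fixed-size chunks (binary-mode style)."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Two-digit hexadecimal value for every byte in the chunk.
        hex_values = " ".join(f"{byte:02x}" for byte in chunk)
        yield offset, chunk, hex_values
        offset += len(chunk)

data = b"ab\ncd\x01\n"
text_lines = list(read_text_mode(io.BytesIO(data)))
binary_rows = list(read_binary_mode(io.BytesIO(data), chunk_size=4))
```

Note how line boundaries determine the units in the first case, while in the second case a chunk may start or end in the middle of a line.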

As you can see, some of the settings are shared between both modes, while others are specific to one mode.

GENERIC OPTIONS
  -f, --buffer <size>     read buffer size, in bytes [default: 4096]
  -L, --max-lines <num>   stop after reading <num> lines [default: no limit]
  -B, --max-bytes <num>   stop after reading <num> bytes [default: no limit]
  -D, --debug             enable debug mode; can be used from 1 to 4 times,
                          each level increases verbosity (-D|DD|DDD|DDDD)
  --color-markers         apply SGR marker format to themselves

TEXT MODE OPTIONS
  -m, --marker <details>  marker details: 0 is none, 1 is brief, 2 is full
                          [default: 0]
  --no-separators         do not print ⢸separators⡇ around escape sequences
  --no-line-numbers       do not print line numbers

BINARY MODE OPTIONS
  -w, --columns <num>     format output as <num>-columns wide table [default: auto]
  -d, --decode            decode valid UTF-8 sequences, print as unicode chars
  --decimal-offsets       output offsets in decimal format [default: hex format]
  --no-offsets            do not print offsets

[...]

Character classes

There are 6 different character classes, and each of those can be displayed normally, highlighted (or focused) or ignored.

| output | character class | byte ranges | focus flag | ignore flag | examples |
| :---: | :------------- | :---: | :---: | :---: | :--- |
| cc1 | whitespace | 09-0d<br>20 | <code><b>-s</b></code> | <code><b>-S</b></code> | space, line feed, carriage return |
| cc2 | control char | 01-08<br>0e-1f | <code><b>-c</b></code> | <code><b>-C</b></code> | null byte, backspace, delete |
| cc3 | printable char | 21-7e | <code><b>-p</b></code> | <code><b>-P</b></code> | ASCII alphanumeric and punctuation characters: A-Z, a-z, 0-9, (), [] |
| cc4 | escape sequence | 1b[..] | <code><b>-e</b></code> | <code><b>-E</b></code> | ANSI escape sequences controlling cursor position, color, font styling, and other terminal options |
| cc5 | UTF-8 sequence | various | <code><b>-u</b></code> | <code><b>-U</b></code> | valid UTF-8 byte sequences that can be decoded into Unicode characters |
| cc6 | binary data | 80-ff | <code><b>-i</b></code> | <code><b>-I</b></code> | standalone non-(7 bit)-ASCII bytes |
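
To make the single-byte ranges from the table concrete, here is a rough Python sketch that maps one byte to a class name. It is a simplification and not the application's code: escape sequences and UTF-8 sequences span several bytes and need context, so they are not handled here (a lone `1b` byte simply falls into the control char range). The name `classify_byte` and the `"unlisted"` fallback are assumptions made for the example.

```python
def classify_byte(value):
    """Map a single byte (0-255) to a class name using the table's byte ranges."""
    if 0x09 <= value <= 0x0D or value == 0x20:
        return "whitespace"
    if 0x01 <= value <= 0x08 or 0x0E <= value <= 0x1F:
        return "control char"
    if 0x21 <= value <= 0x7E:
        return "printable char"
    if 0x80 <= value <= 0xFF:
        return "binary data"
    # Bytes outside the ranges listed in the table end up here.
    return "unlisted"

# Count how many bytes of each class a sample contains.
counts = {}
for byte in b"a \t\x01\xff":
    name = classify_byte(byte)
    counts[name] = counts.get(name, 0) + 1
```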

Examples

Control and whitespace characters

Let's take a look at one of the files from somebody's home directory: .psql_history. At first sight it's a regular text file:

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690258-2cd1c2ce-f254-4988-84e9-3f2584d607b4.png"></p>

But what if we look a bit deeper into it?

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690261-2897e4cd-1b24-4407-a11f-b5398b69088f.png"></p>

kolombos reveals characters that were hidden until now: not only spaces and line breaks, but also some control characters, namely 01 START OF HEADING ASCII bytes, which PostgreSQL uses to store multiline queries.

The red symbol is an example of a marker, a special sigil that indicates an invisible character in the input. Sigils were selected with a focus on dissimilarity and noticeability, which helps to spot them as quickly as possible. Control char and escape sequence markers also provide some details about the original input byte(s); there are three different levels of these details in text mode.

  • Level 0 is no details, just the marker itself.
  • Level 1 is medium details (this is a default): one extra character for control chars and a varying number of characters for escape sequences. For most of the control characters the second char corresponds to their caret notation, e.g. ⱯA should be read as ^A <sup><a href="https://en.wikipedia.org/wiki/C0_and_C1_control_codes#SOH">[wiki]</a></sup>.
  • Level 2 is the maximum amount of verbosity. For control chars it's their 2-digit hexadecimal value. Also note the -c option in the last example below, which tells the application to highlight control characters and make them even more noticeable.
<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690263-d10ecd0e-6390-4ecf-a2e2-e1f99d1893d6.png"></p>
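
For readers unfamiliar with caret notation, the mapping behind the level 1 and level 2 details can be sketched as follows. This shows the general convention (flip bit `0x40` of the control byte), not the application's own code; the helper names `caret_notation` and `hex_detail` are hypothetical.

```python
def caret_notation(value):
    """Return caret notation for a C0 control byte (0x00-0x1f) or DEL (0x7f)."""
    if not (0x00 <= value <= 0x1F or value == 0x7F):
        raise ValueError("not a control character")
    # Flipping bit 0x40 turns 0x01 into 'A', 0x1b into '[', 0x7f into '?'.
    return "^" + chr(value ^ 0x40)

def hex_detail(value):
    """Return the 2-digit hexadecimal value of a byte, as in level 2 details."""
    return f"{value:02x}"

soh_caret = caret_notation(0x01)
soh_hex = hex_detail(0x01)
```

Here `soh_caret` is `^A` and `soh_hex` is `01`, both describing the START OF HEADING byte from the example above.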

Some of the control characters have unique sigils, for example the null byte (see Legend).

Here are a few more examples of option combinations. The first one is the --focus-space flag, or -s, which can be useful in situations where whitespace characters are the points of interest, but the input is a mess of different character classes.

The second example is the result of running the app with the --ignore-space and --ignore-printable options; as you can see, now almost nothing is in the way of observing our precious control characters (if that's what you were after, that is):

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690266-45a61611-4b65-45a2-bb5a-fcb95cede039.png"></p>

ANSI escape sequences

Escape sequences and their overlapping combinations were the main reason for me to develop this application. For those who don't know much about them, here are some comprehensive materials: [one] [two].

kolombos can distinguish a few types of escape sequences, but the most interesting and frequent type is the SGR sequence, which consists of the escape control character 1b, a square bracket [, one or more numeric params separated by ;, and the m character (terminator). Let me illustrate.

SGR sequences are used for terminal text coloring and formatting. Consider this command with the following output:

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690488-39dbb65f-98cb-4473-854f-6422b7005479.png"></p>

kolombos can show us what exactly is happening out there:

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690491-e9508abc-d2d3-48e1-8a30-d4a519f42d93.png"></p>

There are 3 different types of markers in the example above:

  • ǝ is a sigil for a regular SGR sequence (which, for example, sets the color of the following text to red);
  • θ is a reset SGR sequence (ESC[0m), which completely disables all previously set colors and effects;
  • Ͻ is a CSI sequence (a more general sequence class which includes SGRs); these also begin with ESC[, but have different terminator characters, and in general they control cursor position.
  • Other types are listed in Legend section.
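
The distinction between these three kinds of sequences can be sketched with a regular expression. This is an illustration based on the general CSI structure (parameter bytes, optional intermediate bytes, one final byte), not kolombos' own parser. The names `CSI_RE` and `classify_csi` are made up for the example, and treating a parameterless `ESC[m` as a reset is an assumption taken from common terminal behaviour rather than from this README.

```python
import re

# ESC [ , parameter bytes (0x30-0x3f), intermediate bytes (0x20-0x2f), final byte (0x40-0x7e)
CSI_RE = re.compile(rb"\x1b\[([0-?]*)([ -/]*)([@-~])")

def classify_csi(sequence):
    """Label a byte sequence as 'SGR', 'reset SGR', 'CSI' or 'not CSI'."""
    match = CSI_RE.fullmatch(sequence)
    if match is None:
        return "not CSI"
    params, intermediates, final = match.groups()
    if final != b"m" or intermediates:
        return "CSI"        # some other terminator, e.g. cursor control
    if params in (b"", b"0"):
        return "reset SGR"  # ESC[0m; ESC[m is assumed to be equivalent
    return "SGR"

sample = b"\x1b[31mred\x1b[0m\x1b[2K"
found = [(m.group(0), classify_csi(m.group(0))) for m in CSI_RE.finditer(sample)]
```

For the sample above, `found` contains one regular SGR (`ESC[31m`), one reset SGR (`ESC[0m`), and one non-SGR CSI sequence (`ESC[2K`).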

For this example binary mode would be more convenient.

<p align="center"><img src="https://user-images.githubusercontent.com/50381946/211690493-8fa5f092-3a02-4f83-85de-93c
