SkillAgentSearch skills...

Scat

Dynamic analysis of binary programs to retrieve function-related information (arity, type of parameters, coupling).

Install / Use

/learn @Frky/Scat
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

scat

(01/07/17) WARNING: this documentation is out of date (but will be updated soon). In particular, the results presented here are not the latest one obtained with scat and may be different from the recent papers.

« Tell me! Everyone is picking up on that feline beat / 'Cause everything else is obsolete.

- Strictly high-buttoned shoes. »

What is scat?

How does it work?

Some results obtained with scat

How to use it?

List of commands

Current limitations

Publications

In a few words, scat is a python/C++ tool for reverse-engineering on binaries in a single execution. It embeds several analysis, such as arity detection, undertype recovery, etc. The philosophy is to perform each analysis in one lightweight intrumented execution of a given binary (plus possible offline analysis). Because scat works on a single execution, it can only recover parts of the binary that are actually executed: scat is not a tool for exhaustive analysis, it is a tool to analyze what we see.

For now, scat targets x86_64 binaries on Linux using the System V ABI. However, scat is an implementation of a more general approach that can be extended to other architectures and/or ABI. This approach is described in a PhD thesis that will be available online soon.

What is scat?

Originally, scat was a tool to recover high-level information about functions embedded in an executable using dynamic analysis. In particular, scat aimed to recover:

Now, we've made scat more generic, and it handles several reverse-engineering functionalities:

It is also easy to add your own pintools to scat to perform your own analysis.

The following sections provide information about each feature of scat.

Arity

scat embeds an analyzer to retrieve arity of functions.
By arity, we mean two things: first the number of parameters that a function takes, and second if a function returns a value or not.

Type

scat infers a simplified notion of type. We indeed consider only three possible ones: INT, ADDR and FLOAT. We consider that they represent three different classes of variables that make sense semantically. For instance, the size of an integer (char, short, int, long int) and weither it is signed or not does not make a significant difference semantically.

Coupling

We also introduce a notion of coupling between functions. Intuitively, two functions are coupled if they interact during the execution. scat is also able tor recover couples of functions from one execution.

What is coupling?

Let's take an example, with four functions (among others) embedded in a binary: alloc, realloc, free and strlen. From an execution, we follow the values returned by those four functions, and see if they are given as a parameter to another function.

alt data flow Figure 1 - Data flow between functions

In Figure 1, an arrow from A to B means that one value returned by A was given as a parameter to B. From this observation, what we want is to invert the information. It means that instead of knowing where the output goes, we want to know from where parameters come from. So for each function, we compute the proportion of times a parameter is a value returned by a particular function.

alt data flow Figure 2 - Coupling

In Figure 2, we see that 70% of the parameters of free come directly from an output of alloc. Therefore, in this example (meaning regarding this execution), alloc and free are coupled with a coupling coefficient of 0.7. In the same way, realloc and alloc are coupled with a coefficient of 1.

What for?

Finding functions coupled with a high coefficient can have different interests. For example, two functions that are always coupled with a high coefficient are malloc and free, or more generally the allocating and the liberating functions of an allocator. Therefore, the notion of coupling can be the first step to retrieve custom allocators embedded in binaries (for security purposes, such as use-after-free detection).

Example with scat

scat > launch couple ./pgm/bin/mupdf-x11 ./testfile.pdf
[*] Launching couple inference on ./pgm/bin/mupdf-x11
[*] /usr/bin/pin/pin -t ./bin/obj-intel64/couple.so -o ./log/mupdf-x11_couple_1452528857.log -i ./log/mupdf-x11_type_1452528848.log -- ./pgm/bin/mupdf-x11 ./testfile.pdf
[*] Inference results logged in ./log/mupdf-x11_couple_1452528857.log

scat > display mupdf-x11 couple
...
(jsV_toobject, newproperty[2]) -- 0.986
(jsV_toobject, insert[2]) -- 0.976
(jsV_toobject, jsV_newobject[3]) -- 0.924
(jsV_toobject, jsV_setproperty[2]) -- 0.986
(jsV_toobject, js_pushobject[2]) -- 0.926
(jsV_toobject, jsR_defproperty[2]) -- 0.986
(jsU_bsearch, fz_new_pixmap_with_data[2]) -- 0.998
(jsU_bsearch, fz_new_pixmap[2]) -- 0.998
(jsU_bsearch, jsP_newnode[4]) -- 0.82

Information about inference
| Last inference:           1970-01-01 01:00:11
| Total number of couples:  899
| Unique left/right-side:   83/244

Allocator retrieving

Memory management in a binary can be handled by a standard allocator (e.g. the libc allocator) or by a custom one. For many security and safety analysis focused on memory, the knowledge of the allocator is a requirement. scat implements an allocator detection that aim retrieve the two main functions (ALLOC, FREE) of the interface of the main allocator used by a program.

(WIP) We are also currently working on retrieving a third function that is often defined by an allocator: REALLOC.

How does scat work?

General Idea

scat uses pin to instrument dynamically an execution of the program. This instrumentation aims to be scalable. Some analysis are performed on-the-fly, whereas others are a two-step process: an online step during which data is collected, and an offline step to conclude.

Heuristic-based

scat performs heuristic-based analysis, meaning that from a theoretical point of view, we cannot ensure the soundness nor the completeness of the results. However, experiments (see relevant section) show that these heuristics are well-suited for our purposes.

For details about the heuristics, please refer to our papers.

One execution (per recovery)

The goal of scat is not to recover information about every function embedded on the binary, but to demonstrate the relevance of our heuristics in a lightweight way. For this reason, scat only requires on execution for each of the three steps (arity, type and couple).

Pros. The inference is very lightweight.

Cons. Only functions that are executed at least one can be infered.

Some results obtained with scat

Here are presented some results obtained with scat on several open-source libraries. First, note that each result is a consequence of one single execution with standard inputs. Second, the accuracy was obtained by comparison between results given by scat and the source code of the binary under inference. This comparison is also performed automatically by scat if the source code can be provided (see commands parsedata and accuracy in the relevant section.

Arity inference

  • #function is the number of functions detected during the one execution we ran.
  • accuracy is the percentage of functions (that we detect) for which the arity we infered is consistant with the source code.

| | midori | grep | mupdf | emacs | |--------------|----------|--------|---------|---------| | #function | 4094 | 51 | 526 | 591 | | accuracy (%) | 95.8 | 95.6 | 98.7 | 92.4 |

Type inference

  • #function is the number of functions detected during the one execution we ran.
  • accuracy is the percentage of functions (that we detect) for which the type of each parameter we infered is consistant with the source code.

| | midori | grep | mupdf | emacs | |--------------|----------|--------|---------|---------| | #function | 4094 | 51 | 526 | 591 | | accuracy (%) | 96.2 | 100 | 92.5 | 90.4 |

Overhead

  • #function is the number of functions detected during the one execution we ran.
  • T0 is the time of execution with no instrumentation.
  • T1 is the time of execution with arity inference.
  • T2 is the time of execution with type inference.

| | grep | tar | a2ps | | ------------ | ---- | --- | ---- | | size (KB) | 188 | 346 | 360 | | #function | 46 | 101 | 127 | | T0 (s) | 0.80 | 0.99 | 0.80 | | T1 (s) | 1.70 | 2.64 | 31.6 | | T2 (s) | 1.06 | 1.79 | 13.2 |

How to use it?

Requirements

  • scat requires python 2.7 and is not compatible with python 3.
  • You need to have Pin installed on your computer (official website).
  • If you want to test the results of inference (see relative section), you also need to have libclang1-3.4 installed.

Installation

$LOCAL_DIR represents the path to where you want to download scat.

  • Clone this repository: git clone https://github.com/Frky/scat.git $LOCAL_DIR
  • (optional) Create a virtualenv for scat: virtualenv ~/.venv/scat && source ~/.venv/scat/bin/activate
  • Install required python libraries: `pip install -r requirements.tx
View on GitHub
GitHub Stars69
CategoryDevelopment
Updated1mo ago
Forks9

Languages

Python

Security Score

95/100

Audited on Feb 14, 2026

No findings