Scat
Dynamic analysis of binary programs to retrieve function-related information (arity, type of parameters, coupling).
Install / Use
/learn @Frky/ScatREADME
scat
(01/07/17) WARNING: this documentation is out of date (but will be updated soon). In particular, the results presented here are not the latest one obtained with scat and may be different from the recent papers.
« Tell me! Everyone is picking up on that feline beat / 'Cause everything else is obsolete.
- Strictly high-buttoned shoes. »
Some results obtained with scat
In a few words, scat is a python/C++ tool for reverse-engineering on
binaries in a single execution. It embeds several analysis, such as arity
detection, undertype recovery, etc. The philosophy is to perform each analysis
in one lightweight intrumented execution of a given binary (plus possible
offline analysis). Because scat works on a single execution, it can only
recover parts of the binary that are actually executed: scat is not a tool for
exhaustive analysis, it is a tool to analyze what we see.
For now, scat targets x86_64 binaries on Linux using the System V ABI.
However, scat is an implementation of a more general approach that can be
extended to other architectures and/or ABI. This approach is described in a PhD
thesis that will be available online soon.
What is scat?
Originally, scat was a tool to recover high-level information about functions embedded in an executable
using dynamic analysis. In particular, scat aimed to recover:
Now, we've made scat more generic, and it handles several reverse-engineering
functionalities:
- allocator retrieving
- (WIP) simplified use-after-free detection
It is also easy to add your own pintools to scat to perform your own analysis.
The following sections provide information about each feature of scat.
Arity
scat embeds an analyzer to retrieve arity of functions.
By arity, we mean two things: first the number of parameters that a function
takes, and second if a function returns a value or not.
Type
scat infers a simplified notion of type. We indeed consider only three
possible ones: INT, ADDR and FLOAT.
We consider that they represent three different classes of variables that
make sense semantically.
For instance, the size of an integer (char, short, int, long int)
and weither it is signed or not
does not make a significant difference semantically.
Coupling
We also introduce a notion of coupling between functions. Intuitively, two functions are coupled if they
interact during the execution. scat is also able tor recover couples of functions from one execution.
What is coupling?
Let's take an example, with four functions (among others) embedded in a binary: alloc, realloc, free and strlen.
From an execution, we follow the values returned by those four functions, and see if they are given as a parameter to another
function.
Figure 1 - Data flow between functions
In Figure 1, an arrow from A to B means that one value returned by A was given as a parameter to B.
From this observation, what we want is to invert the information. It means that instead of knowing where the
output goes, we want to know from where parameters come from. So for each function, we compute the proportion
of times a parameter is a value returned by a particular function.
Figure 2 - Coupling
In Figure 2, we see that 70% of the parameters of free come directly from an output of alloc. Therefore, in this example
(meaning regarding this execution), alloc and free are coupled with a coupling coefficient of 0.7. In the same way, realloc and alloc
are coupled with a coefficient of 1.
What for?
Finding functions coupled with a high coefficient can have different interests.
For example, two functions that are always coupled with a high coefficient are malloc and free, or
more generally the allocating and the liberating functions of an allocator. Therefore, the notion
of coupling can be the first step to retrieve custom allocators embedded in binaries (for security purposes,
such as use-after-free detection).
Example with scat
scat > launch couple ./pgm/bin/mupdf-x11 ./testfile.pdf
[*] Launching couple inference on ./pgm/bin/mupdf-x11
[*] /usr/bin/pin/pin -t ./bin/obj-intel64/couple.so -o ./log/mupdf-x11_couple_1452528857.log -i ./log/mupdf-x11_type_1452528848.log -- ./pgm/bin/mupdf-x11 ./testfile.pdf
[*] Inference results logged in ./log/mupdf-x11_couple_1452528857.log
scat > display mupdf-x11 couple
...
(jsV_toobject, newproperty[2]) -- 0.986
(jsV_toobject, insert[2]) -- 0.976
(jsV_toobject, jsV_newobject[3]) -- 0.924
(jsV_toobject, jsV_setproperty[2]) -- 0.986
(jsV_toobject, js_pushobject[2]) -- 0.926
(jsV_toobject, jsR_defproperty[2]) -- 0.986
(jsU_bsearch, fz_new_pixmap_with_data[2]) -- 0.998
(jsU_bsearch, fz_new_pixmap[2]) -- 0.998
(jsU_bsearch, jsP_newnode[4]) -- 0.82
Information about inference
| Last inference: 1970-01-01 01:00:11
| Total number of couples: 899
| Unique left/right-side: 83/244
Allocator retrieving
Memory management in a binary can be handled by a standard allocator (e.g. the libc allocator)
or by a custom one. For many security and safety analysis focused on memory, the knowledge of
the allocator is a requirement. scat implements an allocator detection that aim retrieve the two main functions
(ALLOC, FREE) of the interface of the main allocator used by a program.
(WIP) We are also currently working on retrieving a third function that is often defined by an allocator: REALLOC.
How does scat work?
General Idea
scat uses pin to instrument dynamically an execution of the program.
This instrumentation aims to be scalable. Some analysis are performed on-the-fly, whereas
others are a two-step process: an online step during which data is collected, and an offline step
to conclude.
Heuristic-based
scat performs heuristic-based analysis, meaning that from a theoretical point of view, we cannot ensure
the soundness nor the completeness of the results. However, experiments (see relevant section)
show that these heuristics are well-suited for our purposes.
For details about the heuristics, please refer to our papers.
One execution (per recovery)
The goal of scat is not to recover information about every function embedded on the binary, but
to demonstrate the relevance of our heuristics in a lightweight way. For this reason, scat only
requires on execution for each of the three steps (arity, type and couple).
Pros. The inference is very lightweight.
Cons. Only functions that are executed at least one can be infered.
Some results obtained with scat
Here are presented some results obtained with scat on several open-source libraries.
First, note that each result is a consequence of one single execution with standard inputs.
Second, the accuracy was obtained by comparison between results given by scat and the
source code of the binary under inference. This comparison is also performed automatically by scat if the source
code can be provided (see commands parsedata and accuracy in the relevant section.
Arity inference
- #function is the number of functions detected during the one execution we ran.
- accuracy is the percentage of functions (that we detect) for which the arity we infered is consistant with the source code.
| | midori | grep | mupdf | emacs | |--------------|----------|--------|---------|---------| | #function | 4094 | 51 | 526 | 591 | | accuracy (%) | 95.8 | 95.6 | 98.7 | 92.4 |
Type inference
- #function is the number of functions detected during the one execution we ran.
- accuracy is the percentage of functions (that we detect) for which the type of each parameter we infered is consistant with the source code.
| | midori | grep | mupdf | emacs | |--------------|----------|--------|---------|---------| | #function | 4094 | 51 | 526 | 591 | | accuracy (%) | 96.2 | 100 | 92.5 | 90.4 |
Overhead
- #function is the number of functions detected during the one execution we ran.
- T0 is the time of execution with no instrumentation.
- T1 is the time of execution with arity inference.
- T2 is the time of execution with type inference.
| | grep | tar | a2ps | | ------------ | ---- | --- | ---- | | size (KB) | 188 | 346 | 360 | | #function | 46 | 101 | 127 | | T0 (s) | 0.80 | 0.99 | 0.80 | | T1 (s) | 1.70 | 2.64 | 31.6 | | T2 (s) | 1.06 | 1.79 | 13.2 |
How to use it?
Requirements
scatrequires python 2.7 and is not compatible with python 3.- You need to have
Pininstalled on your computer (official website). - If you want to test the results of inference (see relative section), you also need to have
libclang1-3.4installed.
Installation
$LOCAL_DIR represents the path to where you want to download scat.
- Clone this repository:
git clone https://github.com/Frky/scat.git $LOCAL_DIR - (optional) Create a virtualenv for
scat:virtualenv ~/.venv/scat && source ~/.venv/scat/bin/activate - Install required python libraries: `pip install -r requirements.tx
