EffectiveSan
Runtime type and bounds-error checking for C/C++
The Effective Type Sanitizer --- Dynamically Typed C/C++
EffectiveSan is a compiler tool that automatically inserts dynamic (i.e., runtime) type and bounds checking into C/C++ programs. The aim of EffectiveSan is to detect memory errors and type bugs in arbitrary C/C++ code.
Background
C and C++ are examples of statically typed programming languages, meaning that types are checked at compile time and not at runtime. Furthermore, C and C++ are weakly typed programming languages that allow the type system to be bypassed, including:
- Arbitrary Casts, e.g., casting from a (T *) to an (S *) is possible (both explicitly and implicitly via operations like memcpy); and
- No Bounds Checking, e.g., if reading the i<sup>th</sup> element of a (int[50]) array object, then it is never checked (statically or dynamically) that (i < 50); and
- Use-after-free (allowing possible type mutation) is also possible.
Weak static typing is primarily motivated by flexibility and efficiency (dynamic type and bounds checking is expensive). However, this also means that the programmer is responsible for ensuring that types are not violated at runtime. In practice, the programmer does not always get it right, and bugs relating to type violations are common and potentially serious. For example, consider the following "benign" code snippet:
struct S {int a[3]; char *p;};
struct T {float f; struct S s;};
int get(struct T *t, int idx)
{
    return t->s.a[idx];
}
This snippet is well-typed according to standard C/C++ static type checking. However, at runtime, a lot can go wrong:
- Type Confusion Errors: Pointer t may be of the wrong type: S *s = (S *)malloc(sizeof(struct S)); get((T *)s, 2);
- Use-after-free Errors: Pointer t may have been free'ed: free(t); get(t, 2);
- (Sub-)Object Bounds Errors: Index idx may be outside the bounds of the (sub-)object (a): get(t, 3);
In practice, type and memory errors can be a lot more subtle, and are a common
source of security vulnerabilities, program bugs, and other undefined
behavior. For example, such errors are commonly exploited for control flow
hijacking attacks, e.g., by overwriting the virtual function table pointer
(vptr) of C++ objects. This can be achieved in several ways using the
runtime errors described above, including:
- Using an object bounds overflow from object A to B to directly overwrite B.vptr;
- Using a sub-object bounds overflow within the same object B to directly overwrite B.vptr;
- Using type confusion to cast a pointer p to B to a different type, then overwrite B.vptr indirectly using a "valid" operation on p; and
- Using a use-after-free similar to type confusion, where a previously free'ed pointer p points to a different type.
Assuming an attacker can overwrite the vptr with a suitable value, control
flow can then be hijacked using a call to a virtual function.
Beyond security, it is often useful to detect and eliminate deliberate type-based undefined behavior---so-called type abuse---since it can harm code quality/portability. For example, one idiom we have observed in the wild is to implement C++-style inheritance using structures with overlapping members, e.g.:
struct Base { int x; float y; };
struct Derived { int x; float y; char z; };
We have observed such idioms in SPEC2006's perlbench and povray benchmarks
(despite povray being a C++ program).
Such idioms may violate the compiler's Type-Based Alias Analysis (TBAA)
assumptions, causing code to be miscompiled unless special compiler
options such as -fno-strict-aliasing are used. Type abuse may also mask
dangerous (security critical) type errors.
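For comparison, one well-defined way to get the same effect is to embed the base structure as the first member of the derived one, since C guarantees that a pointer to a structure, suitably converted, points to its first member. The sketch below is illustrative only: the layout of Derived is changed from the idiom above, and the helpers base_of and sum_xy are hypothetical names, not part of EffectiveSan.

```c
#include <assert.h>
#include <stddef.h>

struct Base { int x; float y; };

/* Embed Base as the first member instead of repeating its fields. */
struct Derived { struct Base base; char z; };

/* Hypothetical helper: "upcast" by taking the address of the embedded
 * Base sub-object, so no cast between unrelated struct types is needed. */
struct Base *base_of(struct Derived *d)
{
    assert(d != NULL);
    return &d->base;
}

/* Code written against Base operates on the embedded sub-object. */
float sum_xy(const struct Base *b)
{
    return (float)b->x + b->y;
}
```

With this layout, a struct Base pointer obtained from a Derived object refers to a genuine sub-object of type struct Base, rather than to memory that merely happens to have matching leading members.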
Dynamic Typing for C and C++
The Effective Type Sanitizer (EffectiveSan) is a tool for instrumenting C/C++ programs with dynamic type checks---effectively transforming C/C++ into a dynamically typed programming language. The instrumented dynamic type check compares the runtime type of an object (a.k.a. the effective type using C standard terminology) against the static type declared in the code. An error will be logged if there is a mismatch.
For example, EffectiveSan will instrument the get() function by adding
type and bounds checks:
int get(struct T *t, int idx)
{
    BOUNDS b = type_check(t, struct T);   // Inserted type check
    b = bounds_narrow(b, t->s.a);         // Inserted bounds narrow
    int *tmp = &t->s.a[idx];              // Address of the accessed element
    bounds_check(tmp, b);                 // Inserted bounds check
    return *tmp;
}
Here, three additional operations are inserted:
- type_check checks that the dynamic type of pointer t matches the static type (struct T). This means that t must point to either an object of type struct T, a sub-object of type struct T of some larger object, or a (sub-)object of some other type coercible to type struct T (e.g., a character array char[]). If the type is compatible, the dynamic (sub-)object bounds are returned.
- bounds_narrow narrows the bounds b to the sub-object of interest. In this case, the sub-object is s.a.
- bounds_check verifies that the memory access is within the narrowed bounds.
If either type_check or bounds_check fails then an error will be logged.
By default, all logged errors are printed to stderr when the program exits
(EffectiveSan does not stop execution, although this is configurable).
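To make the three operations concrete, here is a toy model in plain C. It is a sketch under loud assumptions, not EffectiveSan's implementation: the explicit tag field, the tagged_T wrapper, the check_result codes, and the function signatures are all hypothetical, and the toy returns an error code instead of logging and continuing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct S { int a[3]; char *p; };
struct T { float f; struct S s; };

/* Hypothetical runtime type tags; a stand-in for real type meta data. */
enum type_tag { TAG_FREE, TAG_S, TAG_T };

/* Toy allocation: an explicit tag stored next to the object. */
struct tagged_T { enum type_tag tag; struct T obj; };

/* Half-open address range [lo, hi). */
typedef struct { uintptr_t lo, hi; } BOUNDS;

enum check_result { CHECK_OK, TYPE_ERROR, BOUNDS_ERROR };

/* Type check: on a match, report the bounds of the whole object. */
int type_check(const struct tagged_T *o, enum type_tag expected, BOUNDS *b)
{
    if (o->tag != expected)
        return 0;
    b->lo = (uintptr_t)&o->obj;
    b->hi = b->lo + sizeof(o->obj);
    return 1;
}

/* Bounds narrow: intersect b with the sub-object [p, p + size). */
BOUNDS bounds_narrow(BOUNDS b, const void *p, size_t size)
{
    uintptr_t lo = (uintptr_t)p, hi = lo + size;
    if (lo > b.lo) b.lo = lo;
    if (hi < b.hi) b.hi = hi;
    return b;
}

/* Bounds check: does the access [addr, addr + size) fit inside b? */
int bounds_check(BOUNDS b, uintptr_t addr, size_t size)
{
    return addr >= b.lo && addr <= b.hi && size <= b.hi - addr;
}

/* get() with the three checks written out by hand. */
enum check_result checked_get(const struct tagged_T *o, int idx, int *out)
{
    BOUNDS b;
    assert(o != NULL && out != NULL);
    if (!type_check(o, TAG_T, &b))
        return TYPE_ERROR;
    b = bounds_narrow(b, o->obj.s.a, sizeof(o->obj.s.a));
    /* Compute the address as an integer so that no out-of-range
     * pointer is ever formed. */
    uintptr_t addr = (uintptr_t)o->obj.s.a + (uintptr_t)idx * sizeof(int);
    if (!bounds_check(b, addr, sizeof(int)))
        return BOUNDS_ERROR;
    *out = o->obj.s.a[idx];
    return CHECK_OK;
}
```

In this model, an index of 2 passes all checks, an index of 3 or 4 fails the bounds check because the narrowed bounds cover only s.a, and an object whose tag is TAG_S or TAG_FREE fails the type check before any memory is read.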
The inserted instrumentation can detect the type and memory errors described above. For example, consider the type error:
S *s = (S *)malloc(sizeof(struct S));
get((T *)s, 2);
Then running this program results in a type error:
TYPE ERROR:
pointer = 0x30a12d3740 (heap)
expected = struct T
actual = struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0]
>int32_t [+0]
Here:
- pointer is the pointer value, which happens to be allocated from the heap;
- expected is the expected type, which in this case is (struct T); and
- actual is the actual dynamic type of the pointer. The "actual" type is represented as a set of (type [+offset]) pairs, starting from the allocation type of the object (struct S), all the way to the type of the inner-most sub-object at the same offset (int32_t, a.k.a. int). Offset values are in bytes. Any pair with zero offset (i.e., [+0]) represents a valid type for the pointer. In other words, this is a type error because there is no "actual" type pair corresponding to (struct T [+0]).
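The rule "a pointer has type X if some (X [+0]) pair exists" can be sketched with a hand-written layout table. This is only an illustration of the idea: the type_desc and member structures and the has_type_at function are hypothetical, the table is built manually with offsetof, and the sketch ignores coercions such as character arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S { int a[3]; char *p; };
struct T { float f; struct S s; };

struct type_desc;
/* A member is an array of `count` elements of `type` at byte `offset`. */
struct member { size_t offset, count; const struct type_desc *type; };
struct type_desc {
    const char *name;
    size_t size;
    size_t nmembers;
    const struct member *members;
};

static const struct type_desc INT_T   = { "int",    sizeof(int),    0, NULL };
static const struct type_desc FLOAT_T = { "float",  sizeof(float),  0, NULL };
static const struct type_desc PTR_T   = { "char *", sizeof(char *), 0, NULL };

static const struct member S_MEMBERS[] = {
    { offsetof(struct S, a), 3, &INT_T },
    { offsetof(struct S, p), 1, &PTR_T },
};
static const struct type_desc S_T = { "struct S", sizeof(struct S), 2, S_MEMBERS };

static const struct member T_MEMBERS[] = {
    { offsetof(struct T, f), 1, &FLOAT_T },
    { offsetof(struct T, s), 1, &S_T },
};
static const struct type_desc T_T = { "struct T", sizeof(struct T), 2, T_MEMBERS };

/* Walk from the allocation type down to the inner-most sub-object at
 * byte offset `off`; succeed if some (expected [+0]) pair is found. */
int has_type_at(const struct type_desc *t, size_t off, const char *expected)
{
    assert(off < t->size);
    for (;;) {
        if (off == 0 && strcmp(t->name, expected) == 0)
            return 1;
        const struct member *m = NULL;
        for (size_t i = 0; i < t->nmembers; i++) {
            const struct member *c = &t->members[i];
            if (off >= c->offset && off - c->offset < c->count * c->type->size)
                m = c;
        }
        if (m == NULL)
            return 0;  /* scalar or padding: nothing deeper to try */
        off = (off - m->offset) % m->type->size;  /* offset within the element */
        t = m->type;
    }
}
```

For an allocation of type struct S and offset 0, the walk visits (struct S [+0]) and then (int [+0]), mirroring the two pairs in the report above, and never reaches struct T, which is why the cast is flagged.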
Next, consider the use-after-free error:
free(t);
get(t, 2);
EffectiveSan considers "free" objects to have a special "<free memory>"
type. This allows use-after-free errors to be detected as a special kind
of type error:
USE-AFTER-FREE ERROR:
pointer = 0x4034b5bfd0 (heap)
expected = struct T
actual = <free memory> [+0]
Finally, consider the (sub-)object bounds error:
get(t, 4);
EffectiveSan uses dynamic typing and bounds narrowing to detect sub-object bounds errors:
SUBOBJECT BOUNDS ERROR:
pointer = 0x405efddc68 (heap)
type = struct T { float32_t; /*0..4*/ struct S; /*8..32*/ } [+8..+20]
>struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0..+12]
>>int32_t [+0..+12]
bounds = 0..12 (8..20)
access = 16..20 (24..28)
Here:
- pointer is the pointer value, similar to before;
- type is a set of (type [+lb..+ub]) triples representing the dynamic type of the accessed (sub-)object, and the accessed sub-object's bounds. Bounds are pairs of byte offsets;
- bounds is the bounds of the accessed sub-object in (1) bounds relative to the start of the sub-object, and (2) bounds relative to the start of the allocation; and
- access is the bounds of the memory access, relative to (1) and (2) explained above.
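The numbers in the report can be reproduced with a little offset arithmetic. The following sketch uses hypothetical helper names (range, subobject_offset, access_range, and so on) and assumes the layout shown in the report, i.e., a 4-byte int and struct S placed at offset 8 inside struct T, as on a typical 64-bit ABI; other platforms may give different values.

```c
#include <assert.h>
#include <stddef.h>

struct S { int a[3]; char *p; };
struct T { float f; struct S s; };

/* Half-open byte range [lo, hi), printed as lo..hi in the reports. */
struct range { size_t lo, hi; };

/* Offset of sub-object s.a from the start of a struct T allocation. */
size_t subobject_offset(void)
{
    return offsetof(struct T, s) + offsetof(struct S, a);
}

/* Bounds of s.a relative to the start of the sub-object itself. */
struct range subobject_bounds(void)
{
    struct range r = { 0, sizeof(((struct T *)0)->s.a) };
    return r;
}

/* Bytes touched by reading a[idx], relative to the start of s.a. */
struct range access_range(size_t idx)
{
    struct range r = { idx * sizeof(int), (idx + 1) * sizeof(int) };
    return r;
}

/* Re-express a sub-object-relative range relative to the allocation. */
struct range to_allocation(struct range r)
{
    size_t off = subobject_offset();
    struct range out = { r.lo + off, r.hi + off };
    return out;
}

/* An access is in bounds when it lies entirely inside the sub-object. */
int in_bounds(struct range access, struct range bounds)
{
    assert(access.lo <= access.hi && bounds.lo <= bounds.hi);
    return access.lo >= bounds.lo && access.hi <= bounds.hi;
}
```

Under the stated layout assumptions, subobject_bounds() is 0..12 and maps to 8..20 in the allocation, while access_range(4) is 16..20 and maps to 24..28, so in_bounds reports the access as outside the sub-object.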
Using the above instrumentation, EffectiveSan can detect multiple classes of errors, including type confusion, object bounds errors, sub-object bounds errors, and use-after-free errors---all using the same underlying methodology.
How EffectiveSan Works
We give a very brief overview of how EffectiveSan works internally. For more detailed information, please see our paper (see further reading below) or study the source code.
EffectiveSan consists of three main components:
- A "modified" clang front-end that preserves high-level C/C++ type information as LLVM IR meta-data.
- An LLVM instrumentation pass that inserts type/bounds checks and replaces memory allocation with the "typed" version.
- A run-time support library that implements the meta data tracking scheme.
The runtime system for tracking dynamic types is the main innovation. The basic idea is to build on top of low fat pointers, a system for tracking the bounds (size and base) of allocated objects that was originally developed for bounds checking. EffectiveSan instead uses low fat pointers to store type meta data at the base of allocated objects. For example, consider the memory allocation:
q = (struct T *)malloc(sizeof(struct T));
Then, under EffectiveSan, the memory layout will be as follows:
<p align="center"> <img src="images/layout.png" width="50%" alt="EffectiveSan object layout."> </p>

Here (META) is the EffectiveSan object meta data comprising (1)
a reference to the type meta data for (struct T), and (2) the
size (array l
