NDCrash

NDCrash is a powerful crash reporting library for Android NDK applications. The author was inspired by PLCrashReporter and Google Breakpad. Note that this library is new and has experimental status.

Key features

Written in C99 so it may be used in plain C projects.
On-device stack unwinding.
ndk-stack compatible human-readable report format. The ndk-stack tool can then easily be used to obtain line numbers.
Supports 2 crash handling modes: in-process and out-of-process.
Supports 5 different stack unwinders.
Out-of-process mode supports collecting stack traces of all application threads.
Support for 32-bit and 64-bit architectures (depends on the unwinder).
Easy-to-use Java wrapper: https://github.com/ivanarh/jndcrash
The minimum tested Android version is 4.0.3, but in theory it may work even on Android 2.3.

Roadmap

These features are not implemented yet but are planned. They are likely to be implemented only for out-of-process mode because of in-process mode restrictions.

Stack dump.
Dump of memory around addresses stored in registers.
Memory map dump.

Crash handling modes

NDCrash supports 2 crash handling modes. A mode defines how and where crash report data is collected. The same concept is used by Google Breakpad, see its documentation. Both modes can be compiled into the library at once, but only one of them needs to be activated at run time (useful for A/B testing). Either mode can also be disabled with compilation flags for optimization purposes.

In-process mode

In-process means that the crash report is created within the crashing process, in a signal handler. In most cases this approach works, but this mode has 2 major problems: signal safety and stack restrictions. For this reason you should prefer out-of-process mode when possible.

Signal safety

In general, a signal handler requires its code to be signal safe. Only a very limited set of functions is safe to call, see this man page. More details can be found in the glibc documentation. Many functions that every developer uses routinely are not signal safe, for example heap memory allocation with malloc/free. The worst case for a handler that isn't signal safe is a deadlock during signal handler execution; in that event a crash report won't be created and the user will have to kill your application explicitly. This doesn't mean that such functions can never be used in a signal handler for crash reporting, because the application process will be terminated anyway after the handler runs and all we need is to create a report. Still, it's a good idea to minimize the use of unsafe functions so that the crash reporter works properly for most crashes.
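
To illustrate the constraint, here is a minimal sketch of a handler that reports a signal number using only write(2) and _exit(2), both of which are on the POSIX list of async-signal-safe functions. The names format_decimal, safe_report_signal and on_crash are hypothetical and are not part of NDCrash.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes value as decimal digits into buf (at least
 * 20 bytes) without any printf-family call. Returns the digit count;
 * no terminating zero is added. */
size_t format_decimal(unsigned long value, char *buf) {
    char reversed[20];
    size_t count = 0;
    do {
        reversed[count++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < count; ++i) {
        buf[i] = reversed[count - 1 - i];
    }
    return count;
}

/* Hypothetical helper: writes "signal <number>\n" to fd. The line is built
 * in a stack buffer, so no heap allocation or stdio locking is involved. */
ssize_t safe_report_signal(int fd, int signo) {
    static const char prefix[] = "signal ";
    char line[32];
    size_t length = 0;
    for (size_t i = 0; i < sizeof(prefix) - 1; ++i) {
        line[length++] = prefix[i];
    }
    length += format_decimal((unsigned long)signo, line + length);
    line[length++] = '\n';
    return write(fd, line, length);
}

/* Example handler: reports the signal and terminates the process. */
void on_crash(int signo) {
    safe_report_signal(STDERR_FILENO, signo);
    _exit(128 + signo);
}
```

Such a handler could be registered with sigaction for SIGSEGV and similar signals. A real crash reporter needs far more than this, which is exactly where the pressure to call unsafe functions comes from.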

Stack restrictions

Starting from Android 4.4, bionic uses an alternative stack for signal handlers. See bionic source and sigaltstack documentation. This stack has a fixed size; by default the value of the SIGSTKSZ constant is used (8 kilobytes on the 32-bit ARM platform). This is very useful when a crash is caused by a stack overflow. However, this stack size may be insufficient, because heap allocations are not safe and you are forced to allocate memory on the stack. For example, libunwind's unw_cursor_t is huge (4 kilobytes), so deciding where to allocate it is a serious trade-off. A static buffer could be used, but that is not thread safe because signal handlers may execute concurrently on different threads. Libunwind provides a special "memory pool" mechanism for this case. A workaround is possible: you can allocate a stack of any size and set it with the sigaltstack function. But this has to be done for every thread of your application, so some wrapper around pthread is required.
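
The workaround can be sketched roughly as follows, using only the standard sigaltstack and sigaction calls. The helper names install_alt_stack and install_handler_on_alt_stack are hypothetical, and the 64-kilobyte size used in the usage note is an arbitrary assumption, not a value taken from NDCrash.

```c
#define _XOPEN_SOURCE 700 /* for sigaltstack and SA_ONSTACK under strict C99 */
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a signal stack of the given size and
 * installs it for the calling thread only. Returns the stack memory,
 * or NULL on failure. */
void *install_alt_stack(size_t size) {
    stack_t stack;
    memset(&stack, 0, sizeof(stack));
    stack.ss_sp = malloc(size);
    if (stack.ss_sp == NULL) {
        return NULL;
    }
    stack.ss_size = size;
    stack.ss_flags = 0;
    if (sigaltstack(&stack, NULL) != 0) {
        free(stack.ss_sp);
        return NULL;
    }
    return stack.ss_sp;
}

/* Hypothetical helper: registers a handler that runs on the alternative
 * stack. Without SA_ONSTACK the handler would use the normal thread stack. */
int install_handler_on_alt_stack(int signo,
                                 void (*handler)(int, siginfo_t *, void *)) {
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = handler;
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
    return sigaction(signo, &action, NULL);
}
```

A thread would call install_alt_stack(64 * 1024) early in its start routine. Because the alternative stack is a per-thread setting, this call has to be repeated in every thread, which is why a wrapper around thread creation is needed.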

Out-of-process mode

Out-of-process means that the crash report is generated in a process other than the crashing one. This is possible thanks to the ptrace system call, which allows one process to inspect the state of another. Out-of-process mode was originally used by the Android system debugger (debuggerd).

When NDCrash works in out-of-process mode it has 2 different parts:

Crash reporting daemon. It's a special service with the android:process attribute in its manifest definition, see service tag documentation. The attribute makes it run in a separate process with its own PID and address space. This daemon plays the role of a debugger, so the word "debugger" is used below with the same meaning.
Signal handler. It runs in the main application process. It's very simple and lightweight: all it has to do is communicate with the debugger and wait until the crash report is generated.

Out-of-process mode flow

The details of how out-of-process mode works are described below:

When the daemon starts, it opens a listening UNIX domain socket that allows a crashing process to communicate with it. The daemon sleeps until a crash happens or an explicit stop is requested.
When a crash happens, a signal handler within the crashing process is executed. It connects to the listening UNIX domain socket previously opened by the daemon. The handler then sends data about the crash (pid, tid, register values) to the debugger. This data is necessary to generate a crash report. After that, the handler sleeps in a blocking "recv" operation (it waits for a response from the daemon).
The daemon receives the data from the crashing app and attaches to it with the ptrace mechanism. At this point the daemon has access to the state of the crashing process.
The daemon generates a crash report; by default it's saved to a file and written to logcat. Crash report generation includes a stack unwinding operation, see the information below.
After the crash report is generated, the daemon sends a one-byte response to the socket, closes it (disconnects) and starts listening for another connection.
The crashing process receives this byte (the recv operation wakes up), restores the previous signal handler (the one set by the bionic library) and re-raises the signal.
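
The message exchange in the steps above can be sketched as follows. This is a simplified illustration, not the NDCrash wire protocol: the crash_message layout and all function names are hypothetical, register values are omitted, the process id stands in for the thread id, and a socketpair plus fork replaces the listening socket and the separate service process.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message layout, for illustration only. */
struct crash_message {
    pid_t pid;
    pid_t tid;
    int signo;
};

/* Crashing-process side: sends the crash data, then blocks in recv until
 * the one-byte response arrives. Returns 0 on success. */
int handler_notify_and_wait(int sock, const struct crash_message *message) {
    char response;
    if (send(sock, message, sizeof(*message), 0) != (ssize_t)sizeof(*message)) {
        return -1;
    }
    return recv(sock, &response, 1, 0) == 1 ? 0 : -1;
}

/* Daemon side: receives the crash data and answers with one byte. */
int daemon_serve_one(int sock, struct crash_message *received) {
    const char done = 1;
    if (recv(sock, received, sizeof(*received), MSG_WAITALL)
            != (ssize_t)sizeof(*received)) {
        return -1;
    }
    /* A real daemon would attach with ptrace and write the report here. */
    return send(sock, &done, 1, 0) == 1 ? 0 : -1;
}

/* Demo: a forked child plays the daemon, the parent plays the crashing
 * process. Returns 0 if the whole handshake completed. */
int run_handshake_demo(void) {
    int sockets[2];
    int status = 0;
    struct crash_message message = { getpid(), getpid(), SIGSEGV };
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) != 0) {
        return -1;
    }
    pid_t daemon_pid = fork();
    if (daemon_pid < 0) {
        return -1;
    }
    if (daemon_pid == 0) {
        struct crash_message received;
        close(sockets[0]);
        int ok = daemon_serve_one(sockets[1], &received) == 0
                 && received.signo == message.signo;
        _exit(ok ? 0 : 1);
    }
    close(sockets[1]);
    int result = handler_notify_and_wait(sockets[0], &message);
    close(sockets[0]);
    if (waitpid(daemon_pid, &status, 0) != daemon_pid) {
        return -1;
    }
    return (result == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The handler side uses only send and recv, which are async-signal-safe, so the crashing process does very little work inside its signal handler.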

Restoring the previous signal handler is necessary to keep the standard Android debugger (debuggerd) working: the bionic library registers this handler in order to initiate crash report generation by debuggerd. This is because we can't obtain the register state for stack unwinding with ptrace (we send it over a socket). To do this we would install a default signal handler (SIG_DFL) and re-raise a signal. This is exactly how Google Breakpad behaves, and a broken debuggerd is one of the big disadvantages of that crash reporting library.

Stack unwinding

The most interesting information in a report is, of course, the backtrace, also known as the stack trace. At the same time, obtaining this data is the most difficult task during crash report generation. To obtain a backtrace we need to analyze stack data by walking through all stack frames. This process is called stack unwinding. There are several ways to perform stack unwinding, and different third-party code may be used for this purpose. This is why the NDCrash library supports several stack unwinders. Each unwinder is a module within the library that provides code that unwinds a stack and writes a backtrace to a crash report. All unwinders can be supported by the library at once, but only one of them should be selected at library initialization. This may be useful for A/B testing.

Ways to unwind a stack

A stack may be unwound using different algorithms. The main challenge for them in crash reporting is to analyze stack data, determine the bounds of every stack frame and extract all return addresses from it. Stack data can easily be accessed by reading memory at the address held in the "stack pointer" register.

Full stack scanning. This is the least accurate unwinding algorithm: it takes every stack element (machine word) and searches for a function containing this address. If one is found, the address is added to the backtrace. Used by the Bugsnag SDK.
DWARF call frame information data. Located in ELF section .eh_frame or .debug_frame, used on most processor architectures for C++ exception handling and by debuggers such as gdb and lldb. See documentation, chapter 6.4.
ARM Exception Tables. This is a replacement for DWARF call frame information data on the 32-bit ARM architecture, located in ELF section .ARM.extab. The same binary may contain both .ARM.extab and .debug_frame sections, but the latter is used only by debuggers, is not used during C++ exception handling, and is stripped from release builds. See the documentation about these tables.
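
As an illustration of the first approach, full stack scanning can be sketched as a loop over machine words. This is a simplified model and not the Bugsnag or NDCrash implementation: the code_range table and the function names are hypothetical, and a real scanner would look up loaded libraries instead of a fixed table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a region that contains executable code. */
struct code_range {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
};

/* Returns 1 if address falls inside any known code range. */
static int is_code_address(uintptr_t address,
                           const struct code_range *ranges,
                           size_t range_count) {
    for (size_t i = 0; i < range_count; ++i) {
        if (address >= ranges[i].start && address < ranges[i].end) {
            return 1;
        }
    }
    return 0;
}

/* Scans a stack snapshot word by word and collects every value that looks
 * like a code address. Returns the number of collected frames. */
size_t scan_stack(const uintptr_t *stack_words, size_t word_count,
                  const struct code_range *ranges, size_t range_count,
                  uintptr_t *frames, size_t frame_capacity) {
    size_t found = 0;
    for (size_t i = 0; i < word_count && found < frame_capacity; ++i) {
        if (is_code_address(stack_words[i], ranges, range_count)) {
            frames[found++] = stack_words[i];
        }
    }
    return found;
}
```

The sketch also shows why this method is inaccurate: any stale or unrelated value on the stack that happens to fall inside a code range is reported as a frame.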

On some architectures that are not currently supported by the NDK (such as MIPS), proper stack unwinding is possible without information in additional sections. These architectures are not supported by NDCrash, so they are not covered here.

Details about each unwinder supported in NDCrash are described below.

"cxxabi" unwinder

It uses standard C++ library facilities to unwind a stack (the same functionality is used during C++ exception handling). Specifically, it uses _Unwind_Backtrace and _Unwind_GetIP functions to unwind a stack plus POSIX dladdr function to obtain an info
