Heapprof
A 250 LOC heap profiler - easy to hack/port, works out of the box with dynamically linked Linux binaries
Install / Use
/learn @yosefk/HeapprofREADME
heapprof
A 250 LOC heap profiler - easy to hack/port, works out of the box with Linux dynamically linked binaries.
For a description of how it works, see How to make a heap profiler.
Build
./build.sh
Then:
- Copy heapprof.so to a directory in your $LD_LIBRARY_PATH.
- Copy heapprof.py to a directory in your $PATH.
Usage
The following works on Linux; see Porting for how to adapt this to, say, a "bare metal" embedded platform.
To run your program with malloc instrumentation.
env LD_PRELOAD=heapprof.so GLIBCXX_FORCE_NEW=1 your-program args...
The program should produce a core dump in any of the possible ways, such as:
int* p=0; *p=0; // a C program can just crash itself
os.kill(os.getpid(), 11) # a Python program can send itself a SIGSEGV
gdb program -ex "b func_name" -ex r -ex "gcore my.core" -ex q
# the above places a gdb breakpoint and dumps core when it's hit
The core dump contains metadata near each not-yet-freed malloc'd chunk of memory. To decode this metadata, use:
heaaprof.py your-program core > output.heapprof
You will get a list of call stacks sorted by the sum of sizes of allocated blocks, along the lines of:
19% 10000 [1000, 2000, 3000, 1000, 3000]
0x843f234 myfunc /my/file.c
...
This shows the relative and absolute sum of block sizes, a list of the sizes of all live blocks (can be long...), and the call stack.
Runtime options
- When running with heapprof.so, use $HEAPPROF_FRAMES to configure the number of collected stack frames (default: 16)
- When running heapprof.py, set $HEAPPROF_ADDR2LINE if you prefer to use it instead of gdb for function names and source line information (drawback: doesn't see inside shared libraries; advantage: doesn't look at the core dump so works with custom (non-ELF) core dumps that an embedded system could produce)
- $GLIBCXX_FORCE_NEW in the usage example above isn't a heapprof.so thing - it forces GNU libc++ to use malloc so its memory usage isn't obscured by its own memory allocator. If your program has other custom allocators, forcing it to use malloc instead under heapprof.so could be a good idea.
Porting
- heapprof.py will work with a non-standard core dump format (such as a raw memory dump obtained from a JTAG probe or sent over any other channel) if you setenv $HEAPPROF_ADDR2LINE. If addr2line does not work with your executable format you'll need to hack heapprof.py to use whatever works.
- if your platform doesn't have a backtrace() function you'll need to roll your own. See some tips here.
- heapprof.c uses dlsym to get &malloc and &free - its own malloc/free are implemented on top of libc's functions. In a statically linked program, you'd need to pull in the code of malloc and free, rename them and call them by that new name instead.
- heapprof.c uses pthread_mutex_lock to handle recursive calls to malloc in a multithreaded environment. The reason it needs to do so is that backtrace() was observed to call malloc at times. If your backtrace() doesn't use malloc or you're in a single-threaded environment, you can just delete that code. If your backtrace() mallocs, and it's a multi-threaded system but it's not using pthreads, you'll need to reimplement the recursive mutex using your system's facilities.
- heapprof.c uses sbrk as a malloc replacement before its initialization is complete. If your system doesn't have sbrk, you'll need to roll your own memory pool - unless you also don't have dlsym and pthreads, in which case you don't need any of this stuff - it's those two who malloc during init time...
- On systems requiring malloc blocks to be aligned to more than 8 bytes, change ALIGNMENT in heapprof.c accordingly.
- On big endian platforms, find_blocks() from heapprof.py will need tweaking to search for the right magic string - or heapprof.c's magic numbers can be modified.
Related Skills
node-connect
351.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
110.6kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
351.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
351.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
