Retsnoop
Investigate kernel error call stacks
Install / Use
/learn @anakryiko/RetsnoopREADME
What is retsnoop?
Retsnoop is a BPF-based tool for non-intrusive mass-tracing of Linux kernel[^uprobe-support] internals.
Retsnoop's main goal is to provide a flexible and ergonomic way to extract the exact information from the kernel that is useful to the user. At any given moment, a running kernel is doing many different things, across many different subsystems, and on many different cores. Extracting and reviewing all of the various logs, callstacks, tracepoints, etc across the whole kernel can be a very time consuming and burdensome task. Similarly, iteratively adding printk() statements requires long iteration cycles of recompiling, rebooting, and rerunning testcases. Retsnoop, on the other hand, allows users to achieve a much higher signal-to-noise ratio by allowing them to specify both the specific subset of kernel functions that they would like to monitor, as well as the types of information to be collected from those functions, all without requiring any kernel changes. Retsnoop now also can capture function arguments, furthering the promise of giving user relevant and useful information simply and flexibly.
Retsnoop achieves its goal by low-overhead non-intrusive tracing of a collection of kernel functions, intercepting their entries and exits. Retsnoop's central concept is a user-specified set of kernel functions of interest. This allows retsnoop to capture high-relevance data by letting the user flexibly control a relevant subset of kernel functions. All other kernel functions are ignored and don't pollute captured data with irrelevant information.
Retsnoop also supports a set of additional filters for further restricting the context and conditions under which tracing data is captured, allowing to filter based on PID or process name, choose a subset of admissible errors returned from functions, or gate on function latencies.
Retsnoop supports three different and complementary modes.
The default stack trace mode succinctly points to the deepest function call stack that satisfies user conditions (e.g., an error returned from the syscall). It shows a sequence of function calls, the corresponding source code locations at each level of the stack, and emits latencies and returned results:
$ sudo ./retsnoop -e '*sys_bpf' -a ':kernel/bpf/*.c'
Receiving data...
20:19:36.372607 -> 20:19:36.372682 TID/PID 8346/8346 (simfail/simfail):
entry_SYSCALL_64_after_hwframe+0x63 (arch/x86/entry/entry_64.S:120:0)
do_syscall_64+0x35 (arch/x86/entry/common.c:80:7)
. do_syscall_x64 (arch/x86/entry/common.c:50:12)
73us [-ENOMEM] __x64_sys_bpf+0x1a (kernel/bpf/syscall.c:5067:1)
70us [-ENOMEM] __sys_bpf+0x38b (kernel/bpf/syscall.c:4947:9)
. map_create (kernel/bpf/syscall.c:1106:8)
. find_and_alloc_map (kernel/bpf/syscall.c:132:5)
! 50us [-ENOMEM] array_map_alloc
!* 2us [NULL] bpf_map_alloc_percpu
^C
Detaching... DONE in 251 ms.
The function call trace mode (-T) additionally provides a detailed trace
of control flow across the given set of functions, allowing to understand the
kernel behavior more comprehensively:
FUNCTION CALL TRACE RESULT DURATION
----------------------------------------------- -------------------- ---------
→ bpf_prog_load
→ bpf_prog_alloc
↔ bpf_prog_alloc_no_stats [0xffffc9000031e000] 5.539us
← bpf_prog_alloc [0xffffc9000031e000] 10.265us
[...]
→ bpf_prog_kallsyms_add
↔ bpf_ksym_add [void] 2.046us
← bpf_prog_kallsyms_add [void] 6.104us
← bpf_prog_load [5] 374.697us
Last, but not least, LBR mode (Last Branch Records) allows the user to "look back" and peek deeper into individual function's internals, trace "invisible" inlined functions, and pinpoint the problem all the way down to the individual C statements. This mode is especially great when tracing unfamiliar parts of the kernel without having a good idea what to even look for. It enables iterative discovery process without having much of an idea where to look and what functions are relevant:
$ sudo ./retsnoop -e '*sys_bpf' -a 'array_map_alloc_check' --lbr=any
Receiving data...
20:29:17.844718 -> 20:29:17.844749 TID/PID 2385333/2385333 (simfail/simfail):
...
[#22] ftrace_trampoline+0x14c -> array_map_alloc_check+0x5 (kernel/bpf/arraymap.c:53:20)
[#21] array_map_alloc_check+0x13 (kernel/bpf/arraymap.c:54:18) -> array_map_alloc_check+0x75 (kernel/bpf/arraymap.c:54:18)
. bpf_map_attr_numa_node (include/linux/bpf.h:1735:19) . bpf_map_attr_numa_node (include/linux/bpf.h:1735:19)
[#20] array_map_alloc_check+0x7a (kernel/bpf/arraymap.c:54:18) -> array_map_alloc_check+0x18 (kernel/bpf/arraymap.c:57:5)
. bpf_map_attr_numa_node (include/linux/bpf.h:1735:19)
[#19] array_map_alloc_check+0x1d (kernel/bpf/arraymap.c:57:5) -> array_map_alloc_check+0x6f (kernel/bpf/arraymap.c:62:10)
[#18] array_map_alloc_check+0x74 (kernel/bpf/arraymap.c:79:1) -> __kretprobe_trampoline+0x0
...
Please also check out the companion blog post, "Tracing Linux kernel with retsnoop", which goes into more details about each mode and demonstrates retsnoop usage on a few examples inspired by the actual real-world problems.
NOTE: Retsnoop is utilizing the power of BPF technology and so requires root permissions, or a sufficient level of capabilities (
CAP_BPFandCAP_PERFMON). Also note that despite all the safety guarantees of BPF technology and careful implementation, kernel tracing is inherently tricky and potentially disruptive to production workloads, so it's always recommended to test whatever you are trying to do on a non-production system as much as possible. Typical user-context kernel code (which is a majority of Linux kernel code) won't cause any problems, but tracing very low-level internals running in special kernel context (e.g., hard IRQ, NMI, etc) might need some care. Linux kernel itself normally protects itself from tracing such dangerous and sensitive parts, but kernel bugs do slip in sometimes, so please use your best judgement (and test, if possible).
NOTE: Retsnoop relies on BPF CO-RE technology, so please make sure your Linux kernel is built with
CONFIG_DEBUG_INFO_BTF=ykernel config. Without this retsnoop will refuse to start.
[^uprobe-support]: User space tracing might be supported in the future, but for now retsnoop is specializing in tracing kernel internals only.
Using retsnoop
This section provides a reference-style description of various retsnoop
concepts and features for those who want to use the full power of retsnoop to
their advantage. If you are new to retsnoop, consider checking
"Tracing Linux kernel with retsnoop"
blog post to familiarize yourself with the output format and see how the tool
can be utilized in practice.
Specifying traced functions
Retsnoop's operation is centered around tracing multiple functions. Functions
are split into two categories: entry and non-entry (auxiliary)
functions. Entry functions define a set of functions that activate
retsnoop's recording logic. When any entry function is called for the first
time, retsnoop starts tracking and recording all subsequent functions
calls, until the entry function that triggered recording returns. Once
recording is activated, both entry and non-entry functions are recorded in
exactly the same way and are not distinguished between each other. Function
calls within each thread are traced completely independently, so there could
be multiple recordings going on at the same time simultaneously on multi-CPU
systems.
So, in short, entry functions are recording triggers, while non-entry
functions are augmenting recorded data with additional internal function calls,
but only if those are called from an activated entry functions. If some
non-entry function is called before the entry function is called, such a call
is ignored by retsnoop. Such a split avoids low-signal spam such as
recordings of common helper functions that happen outside of an interesting
context of entry functions.
A set of functions is specified through:
-e(--entry) argument(s), to specify a set of entry functions.-a(--allow) argument(s), to specify a set of non-entry functions. If any of the function in the non-entry set overlaps with the entry set, the entry set takes precedence.-d(--deny) argument(s), to exclude specified functions from both entry and non-entry sets. Functions that are denied are filtered out from both entry and non-entry subsets. Denylist always take precedence.
Each of -e, -a, and -d expect a value which could be of two forms:
- function name glob, similar to shell's file globs. E.g.,
*bpf*will match any function that has the "bpf" substring in its name. Only*and?wildcards are supported, and they can be specified multiple times.*wildcard matches any sequence of characters (or none), and?matches exa
Related Skills
node-connect
350.8kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
110.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
350.8kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
350.8kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
