gProfiler

gProfiler combines multiple sampling profilers to produce a unified visualization of what your CPU is spending time on, displaying stack traces of all processes running on your system across native programs<sup id="a1">1</sup> (includes Golang), Java and Python runtimes, and kernel routines.

gProfiler can upload its results to a self-hosted studio using [gProfiler Performance Studio](https://github.com/intel/gprofiler-performance-studio), which aggregates the results from different instances over different periods of time and can give you a holistic view of what is happening on your entire cluster. To upload results, you will have to register and generate a token on the website.

gProfiler runs on Linux (on x86_64 and Aarch64; Aarch64 support is not complete yet and not all runtime profilers are supported, see architecture support).

For installation methods, jump to run as...

Granulate Performance Studio example view

Configuration & parameters

This section describes the possible options to control gProfiler's behavior.

Output options

gProfiler can produce output in two ways:

- Create an aggregated, collapsed stack samples file (profile_<timestamp>.col) and a flamegraph file (profile_<timestamp>.html). Two symbolic links (last_profile.col and last_flamegraph.html) always point to the last output files.

  Use the --output-dir/-o option to specify the output directory.

  If --rotating-output is given, only the last results are kept (available via last_profile.col and last_flamegraph.html). This can be used to avoid increasing gProfiler's disk usage over time. It is useful in conjunction with --upload-results (explained ahead): historical results are available in the Granulate Performance Studio, and the very latest results are available locally.

  --no-flamegraph can be given to avoid generation of the profile_<timestamp>.html file, in which case only the collapsed stack samples file will be created.

  The output is a collapsed file (.col) and its format is described ahead.

- Send the results to the Granulate Performance Studio for viewing online with filtering, insights, and more.

  Use the --upload-results/-u flag. Pass the --token option to specify the token provided by Granulate Performance Studio, and the --service-name option to specify an identifier for the collected profiles, as it will appear in the self-hosted studio. Profiles sent from numerous gProfilers using the same service name will be aggregated together.

Note: both flags can be used simultaneously, in which case gProfiler will create the local files and upload the results.
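As an illustration of how the output options combine, here is a minimal Python sketch that assembles an argument list from them. The executable name `gprofiler`, the helper name `build_gprofiler_args`, and the choice to treat both the token and the service name as required for uploading are assumptions made for this example; only the flags themselves come from the text above.

```python
def build_gprofiler_args(output_dir=None, rotating=False, flamegraph=True,
                         upload=False, token=None, service_name=None):
    """Assemble a command line from the output options described above.

    Hypothetical helper: "gprofiler" as the executable name is an assumption.
    """
    args = ["gprofiler"]
    if output_dir is not None:
        args += ["--output-dir", output_dir]   # local files
    if rotating:
        args.append("--rotating-output")       # keep only the last results
    if not flamegraph:
        args.append("--no-flamegraph")         # skip the .html file
    if upload:
        # Assumption for this sketch: uploading needs both values.
        if not token or not service_name:
            raise ValueError("uploading needs a token and a service name")
        args += ["--upload-results", "--token", token,
                 "--service-name", service_name]
    return args


if __name__ == "__main__":
    # Local rotating output plus upload, as the note above allows.
    print(" ".join(build_gprofiler_args(
        output_dir="/tmp/profiles", rotating=True,
        upload=True, token="<TOKEN>", service_name="my-service")))
```

The resulting list can be passed to `subprocess.run` if you want to launch the profiler from a script.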

Network requirements

When --upload-results is used, gProfiler communicates with the self-hosted studio. Make sure the studio's domain is accessible over HTTPS. Additionally, if you download gProfiler from the GitHub releases, you'll need access to https://github.com; if you use the Docker image, you'll need access to the Docker registry (https://index.docker.io by default).

If you require an HTTPS proxy, make sure the proxy has those domains whitelisted.

Profiling options

--profiling-frequency: The sampling frequency of the profiling, in hertz.
--profiling-duration: The duration of each profiling session, in seconds.

The default profiling frequency is 11 hertz. Using a higher frequency will lead to more accurate results, but will create greater overhead on the profiled system & programs.

For each profiling session (each profiling duration), gProfiler produces outputs (writing local files and/or uploading the results to the Granulate Performance Studio).
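To get a feel for how frequency and duration interact, the following sketch estimates how many samples one session can collect for a single thread. It assumes the usual sampling-profiler model (a thread is sampled only while it is on the CPU); the function name and the 60-second duration in the example are hypothetical values chosen for illustration, not defaults taken from this document.

```python
def expected_samples(frequency_hz, duration_s, busy_fraction=1.0):
    """Rough upper bound on samples for one thread in one session.

    busy_fraction is the share of the session the thread spends on-CPU
    (an assumption of this model, between 0.0 and 1.0).
    """
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return int(round(frequency_hz * duration_s * busy_fraction))


if __name__ == "__main__":
    # 11 Hz (the documented default) over a hypothetical 60-second session.
    print(expected_samples(11, 60))         # fully busy thread
    print(expected_samples(11, 60, 0.25))   # thread busy a quarter of the time
```

Doubling the frequency doubles the sample count for the same session length, which is the accuracy/overhead trade-off described above.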

Java profiling options

--no-java or --java-mode disabled: Disable profilers for Java.
--no-java-async-profiler-buildids: Disable embedding of buildid+offset in async-profiler native frames (used when debug symbols are unavailable).

Python profiling options

--no-python: Alias of --python-mode disabled.
--python-mode: Controls which profiler is used for Python.
- auto - (default) try with PyPerf (eBPF), fall back to py-spy.
- pyperf - Use PyPerf with no py-spy fallback.
- pyspy/py-spy - Use py-spy.
- disabled - Disable profilers for Python.

Profiling using eBPF incurs lower overhead & provides kernel & native stacks.

PHP profiling options

--php-mode phpspy: Enable PHP profiling with phpspy.
--no-php or --php-mode disabled: Disable profilers for PHP.
--php-proc-filter: Process filter (pgrep) to select PHP processes for profiling (this is phpspy's -P option)

.NET profiling options

--dotnet-mode=dotnet-trace: Enable .NET profiling with dotnet-trace
--no-dotnet or --dotnet-mode=disabled: Disable profilers for .NET.

Ruby profiling options

--no-ruby or --ruby-mode disabled: Disable profilers for Ruby.

NodeJS profiling options

--nodejs-mode: Controls which profiler is used for NodeJS.
- none - (default) no profiler is used.
- perf - augment the system profiler (perf) results with jitdump files generated by NodeJS. This requires running your node processes with --perf-prof (and for Node >= 10, with --interpreted-frames-native-stack). See this NodeJS page for more information.
- attach-maps - Generates perf map in runtime, see description ahead.

attach-maps

In this mode, gProfiler automatically loads a library, based on the node-linux-perf module, into all target NodeJS processes. This library enables generation of perf-pid.map files at runtime, without requiring the app to be started with the --perf-prof flag. From that point on, perf is able to symbolicate the compiled JavaScript functions, so JavaScript symbols are displayed properly.

gProfiler uses the inspector protocol (documented here) to connect to target processes. gProfiler sends SIGUSR1, connects to the process, and requests it to load the library matching its NodeJS version (gProfiler comes with a built-in arsenal of libraries for common NodeJS versions). After the library is loaded, gProfiler invokes the perf-pid.map generation. This is done for all running NodeJS processes: those running before gProfiler started, and those starting during gProfiler's run. Upon stopping, gProfiler stops the functionality, so processes no longer continue to write those files.

This requires the application's entrypoint to be a CommonJS script; it doesn't work for ES modules.
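To make the role of the map files concrete, here is a small sketch of how an address can be resolved against one. It assumes the plain-text layout used by perf's JIT map interface, where each line holds a hexadecimal start address, a hexadecimal size, and a symbol name. The function names and the sample symbols are invented for illustration; this is not gProfiler's or perf's own code.

```python
import bisect


def parse_perf_map(text):
    """Parse lines of the form 'START SIZE symbol name' (START/SIZE in hex)."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        entries.append((start, size, name))
    entries.sort()
    return entries


def symbolicate(entries, address):
    """Return the symbol whose [start, start + size) range holds address."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0:
        start, size, name = entries[i]
        if address < start + size:
            return name
    return None


if __name__ == "__main__":
    # Made-up sample content, for illustration only.
    sample = "3ef414c0 398 LazyCompile:~foo app.js:1\n3ef41860 40 Stub:bar\n"
    print(symbolicate(parse_perf_map(sample), 0x3EF414D0))
```

Without such a file, the addresses of JIT-compiled JavaScript functions fall outside any known binary, which is why they would otherwise appear unsymbolicated.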

Golang profiling options

Golang profiling is based on perf, used via the system profiler (explained in System profiling options).

As with all native programs, the Golang program must have symbols - not stripped - otherwise, additional debug info files must be provided. Without symbols info (specifically the .symtab section) perf is unable to symbolicate the stacktraces of the program. In that case gProfiler will not tag the stacks as Golang and you will not see any symbols.

Make sure you are not passing -s to the -ldflags during your build - -s omits the symbols table; see more details here.

System profiling options

--perf-mode: Controls the global perf strategy. Must be one of the following options:
- fp - Use Frame Pointers for the call graph. This is the default.
- dwarf - Use DWARF for the call graph (adds the --call-graph dwarf argument to the perf command)
- smart - Run both fp and dwarf, then choose, per process, the result with the highest average stack frame count.
- disabled - Avoids running perf at all. See perf-less mode.
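The selection rule of the smart mode can be sketched as follows. This is only an illustration of "highest average stack frame count, per process", not gProfiler's actual implementation: the data layout (a dict of collapsed stacks to sample counts per PID), the weighting of the average by sample count, and the tie-break in favor of fp are all assumptions of this example.

```python
def average_depth(stacks):
    """Average frame count of collapsed stacks, weighted by sample count.

    stacks maps 'frame1;frame2;...' to the number of samples seen.
    """
    total = sum(stacks.values())
    if total == 0:
        return 0.0
    frames = sum(len(stack.split(";")) * count for stack, count in stacks.items())
    return frames / total


def choose_per_process(fp_results, dwarf_results):
    """For each PID, keep whichever result has the deeper average stack."""
    chosen = {}
    for pid in fp_results.keys() | dwarf_results.keys():
        fp = fp_results.get(pid, {})
        dwarf = dwarf_results.get(pid, {})
        # Assumption: ties go to fp, the documented default mode.
        chosen[pid] = dwarf if average_depth(dwarf) > average_depth(fp) else fp
    return chosen


if __name__ == "__main__":
    fp = {100: {"main;work": 5}, 200: {"main;a;b;c": 3}}
    dwarf = {100: {"main;run;step;work": 5}, 200: {"main": 3}}
    for pid, stacks in sorted(choose_per_process(fp, dwarf).items()):
        print(pid, stacks)
```

The intuition is that a truncated call graph (for example, from a binary built without frame pointers) yields shallow stacks, so the deeper result is more likely the complete one.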

Rootless mode

gProfiler can be run in rootless mode, profiling without root or sudo access with limited functionality by using the --rootless argument.

Profiling is limited to perf (not java, python, ruby, etc.), and requires passing --pids with a list of processes owned by the current user.

If the default directories for the log file and pid file (e.g., /var/log or /var/run) are not writable by the current user, these must be explicitly redirected to a writable path with --log-file {LOG_FILE} and --pid-file {PID_FILE} respectively.

If gProfiler was previously run as root or with sudo, it will have created the temporary directory gprofiler_tmp in the default location (usually /tmp) or wherever specified. If gProfiler is then run with --rootless, it will fail, because it tries to write to the gprofiler_tmp directory that is owned by the root user. Delete this root-owned directory, or redirect to a different (user-writable) directory, and re-run with --rootless.
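Before launching in rootless mode, it can help to check which paths need redirecting. The sketch below tests whether the directory that would hold each file is writable by the current user and, if not, emits the corresponding --log-file / --pid-file argument pointing into a fallback directory. The helper names, the fallback-directory idea, and the file paths passed in are hypothetical; this document does not state gProfiler's exact default file names.

```python
import os


def dir_writable(file_path):
    """Return True if the directory that would hold file_path is writable."""
    directory = os.path.dirname(os.path.abspath(file_path))
    return os.access(directory, os.W_OK)


def redirect_args(log_file, pid_file, fallback_dir):
    """Build --log-file/--pid-file arguments only for unwritable locations."""
    args = []
    for flag, path in (("--log-file", log_file), ("--pid-file", pid_file)):
        if not dir_writable(path):
            args += [flag, os.path.join(fallback_dir, os.path.basename(path))]
    return args


if __name__ == "__main__":
    # Hypothetical file names under the directories mentioned above.
    home = os.path.expanduser("~")
    print(redirect_args("/var/log/example.log", "/var/run/example.pid", home))
```

The same `dir_writable` check can be applied to the temporary directory location to detect a leftover root-owned gprofiler_tmp directory's parent being unusable.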

Some additional configuration may be required to operate without root.

perf_event_paranoid

By default, /proc/sys/kernel/perf_event_paranoid may be configured such that perf cannot operate without root. Consider setting it to -1 if --rootless indicates permission errors (this is the least secure mode, so refer to the [perf-security documentation](https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html) for security information). It may also be necessary to set perf_event_mlock_kb.

-1: Allow use of (almost) all events by all users. Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK.
0: Disallow raw and ftrace function tracepoint access.
1: Disallow CPU event access.
2: Disallow kernel profiling.

To make the adjusted perf_event_paranoid setting permanent, preserve it in /etc/sysctl.conf (e.g., kernel.perf_event_paranoid = {SETTING}).
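A quick way to see where a machine stands is to read the current value and map it to the descriptions above. The following is a minimal sketch; the helper names are made up for this example, and values not listed in this section (some distributions use higher ones) are simply reported as unlisted.

```python
PARANOID_LEVELS = {
    -1: "Allow use of (almost) all events by all users",
    0: "Disallow raw and ftrace function tracepoint access",
    1: "Disallow CPU event access",
    2: "Disallow kernel profiling",
}


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current setting; the path parameter exists to ease testing."""
    with open(path) as f:
        return int(f.read().strip())


def describe(level):
    """Return the description for a level, per the list above."""
    return PARANOID_LEVELS.get(level, "Level not listed in this section")


def sysctl_line(level):
    """Format the line to preserve in /etc/sysctl.conf."""
    return f"kernel.perf_event_paranoid = {level}"


if __name__ == "__main__":
    try:
        current = read_paranoid()
        print(current, "-", describe(current))
    except OSError as exc:  # e.g., not running on Linux
        print("could not read setting:", exc)
    print(sysctl_line(-1))
```

Note that lowering the value weakens system security, so consult the perf-security documentation before persisting a change.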

perf_event_mlo
