Omnitrace
Omnitrace: Application Profiling, Tracing, and Analysis
[!NOTE] Omnitrace is being rebranded to ROCm Systems Profiler and its new home is https://github.com/ROCm/rocprofiler-systems. All future development will occur in the new repository; this includes upgrading the tool to use rocprofiler-sdk. This repository will remain open for some time and can be used with versions of ROCm before the introduction of rocprofiler-sdk (that is, before ROCm version 6.2).
Overview
AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems. If you are familiar with rocprof and/or uProf, you will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities.
Omnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU. It is capable of gathering the performance information of functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks. Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics. In addition to runtimes, Omnitrace supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization; process-level metrics such as the memory usage, page-faults, and context-switches; and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.
[!NOTE] Full documentation is available at Omnitrace documentation in an organized, easy-to-read, searchable format. The documentation source files reside in the /docs folder of this repository. For information on contributing to the documentation, see Contribute to ROCm documentation.
Data Collection Modes
- Dynamic instrumentation
  - Runtime instrumentation
    - Instrument executable and shared libraries at runtime
  - Binary rewriting
    - Generate a new executable and/or library with instrumentation built-in
- Statistical sampling
  - Periodic software interrupts per-thread
- Process-level sampling
  - Background thread records process-, system- and device-level metrics while the application executes
- Causal profiling
  - Quantifies the potential impact of optimizations in parallel codes
Data Analysis
- High-level summary profiles with mean/min/max/stddev statistics
  - Low overhead, memory efficient
  - Ideal for running at scale
- Comprehensive traces
  - Every individual event/measurement
- Application speedup predictions resulting from potential optimizations in functions and lines of code (causal profiling)
Parallelism API Support
- HIP
- HSA
- Pthreads
- MPI
- Kokkos-Tools (KokkosP)
- OpenMP-Tools (OMPT)
GPU Metrics
- GPU hardware counters
- HIP API tracing
- HIP kernel tracing
- HSA API tracing
- HSA operation tracing
- System-level sampling (via rocm-smi)
  - Memory usage
  - Power usage
  - Temperature
  - Utilization
CPU Metrics
- CPU hardware counters sampling and profiles
- CPU frequency sampling
- Various timing metrics
  - Wall time
  - CPU time (process and/or thread)
  - CPU utilization (process and/or thread)
  - User CPU time
  - Kernel CPU time
- Various memory metrics
  - High-water mark (sampling and profiles)
  - Memory page allocation
  - Virtual memory usage
- Network statistics
- I/O metrics
- ... many more
Quick Start
Installation
- Visit the Releases page
- Select the appropriate installer (recommendation: the .sh scripts do not require super-user privileges, unlike the DEB/RPM installers)
  - If targeting a ROCm application, find the installer script with the matching ROCm version
  - If you are unsure about your Linux distro, check /etc/os-release or use the omnitrace-install.py script
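As a minimal sketch of the distro check mentioned above: /etc/os-release is a shell-compatible file of KEY=value pairs, so it can be sourced to print the distribution ID and version. The helper name print_distro is hypothetical and not part of Omnitrace. How the installer filenames encode the distro is not shown here, so compare the output against the names listed on the Releases page.

```shell
# Print "<ID>-<VERSION_ID>" from an os-release file (default: /etc/os-release).
# print_distro is a hypothetical helper name, not part of Omnitrace.
print_distro() {
    os_release_file="${1:-/etc/os-release}"
    if [ ! -r "$os_release_file" ]; then
        echo "cannot read $os_release_file" >&2
        return 1
    fi
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$os_release_file"
        printf '%s-%s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
    )
}

print_distro || echo "distro could not be determined" >&2
```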
If the above recommendation is not desired, download the omnitrace-install.py and specify --prefix <install-directory> when
executing it. This script will attempt to auto-detect a compatible OS distribution and version.
If ROCm support is desired, specify --rocm X.Y where X is the ROCm major version and Y
is the ROCm minor version, e.g. --rocm 5.4.
wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py
python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4
See the Omnitrace installation guide for detailed information.
Setup
NOTE: Replace /opt/omnitrace below with the installation prefix as necessary.
- Option 1: Source the setup-env.sh script
source /opt/omnitrace/share/omnitrace/setup-env.sh
- Option 2: Load modulefile
module use /opt/omnitrace/share/modulefiles
module load omnitrace
- Option 3: Manual
export PATH=/opt/omnitrace/bin:${PATH}
export LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH}
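Whichever option is used, a quick check that the tools resolve from the expected prefix can catch a stale PATH. The following is a minimal sketch, assuming the executables named elsewhere in this README (omnitrace-avail, omnitrace-sample, omnitrace-instrument, omnitrace-run) live in the bin directory of the installation prefix. The helper name check_tools is hypothetical.

```shell
# Report where each named tool resolves on PATH; return non-zero if any is missing.
# check_tools is a hypothetical helper name, not part of Omnitrace.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path="$(command -v "$tool")"; then
            printf '%-22s %s\n' "$tool" "$path"
        else
            printf '%-22s NOT FOUND\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools omnitrace-avail omnitrace-sample omnitrace-instrument omnitrace-run \
    || echo "some Omnitrace tools are not on PATH" >&2
```

Each printed path should start with the installation prefix chosen during installation.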
Omnitrace Settings
Generate an omnitrace configuration file using omnitrace-avail -G omnitrace.cfg. Optionally, use omnitrace-avail -G omnitrace.cfg --all for
a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable
perfetto, timemory, sampling, and process-level sampling by default
and tweak some sampling default values:
# ...
OMNITRACE_TRACE = true
OMNITRACE_PROFILE = true
OMNITRACE_USE_SAMPLING = true
OMNITRACE_USE_PROCESS_SAMPLING = true
# ...
OMNITRACE_SAMPLING_FREQ = 50
OMNITRACE_SAMPLING_CPUS = all
OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES
Once the configuration file is adjusted to your preferences, either export the path to this file via OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg
or place this file in ${HOME}/.omnitrace.cfg to ensure these values are always read as the default. If you wish to change any of these settings,
you can override them via environment variables or by specifying an alternative OMNITRACE_CONFIG_FILE.
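The precedence described above can be sketched as follows: write a small configuration file, point OMNITRACE_CONFIG_FILE at it, and override one setting for a single run through the environment. The setting names are taken from the example above; the temporary file location and the choice of `ls -la` as the target are assumptions made for illustration. The run itself is skipped if omnitrace-sample is not on PATH.

```shell
# Write a small configuration file using setting names from the example above.
cfg="${TMPDIR:-/tmp}/omnitrace-example.cfg"
cat > "$cfg" <<'EOF'
OMNITRACE_TRACE = true
OMNITRACE_USE_SAMPLING = true
OMNITRACE_SAMPLING_FREQ = 50
EOF

# Make this file the default source of settings for subsequent runs.
export OMNITRACE_CONFIG_FILE="$cfg"

# Override one setting for this command only; the config file is left unchanged.
if command -v omnitrace-sample >/dev/null 2>&1; then
    OMNITRACE_SAMPLING_FREQ=100 omnitrace-sample -- ls -la
fi
```

Prefixing the variable to the command keeps the override local to that run, so later runs fall back to the value in the configuration file.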
Call-Stack Sampling
The omnitrace-sample executable is used to execute call-stack sampling on a target application without binary instrumentation.
Use a double-hyphen (--) to separate the command-line arguments for omnitrace-sample from the target application and its arguments.
omnitrace-sample --help
omnitrace-sample <omnitrace-options> -- <exe> <exe-options>
omnitrace-sample -f 1000 -- ls -la
Binary Instrumentation
The omnitrace-instrument executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside
the execution of an instrumented binary, to help "fill in the gaps" between the instrumentation, by setting the OMNITRACE_USE_SAMPLING
configuration variable to ON.
Similar to omnitrace-sample, use a double-hyphen (--) to separate the command-line arguments for omnitrace-instrument from the target application and its arguments.
omnitrace-instrument --help
omnitrace-instrument <omnitrace-options> -- <exe-or-library> <exe-options>
Binary Rewrite
Rewrite the text section of an executable or library with instrumentation:
omnitrace-instrument -o app.inst -- /path/to/app
In binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries.
Example of rewriting the functions starting with "hip" with instrumentation in the amdhip64 library:
mkdir -p ./lib
omnitrace-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}
Verify via ldd that your executable will load the instrumented library. If you built your executable with an RPATH to the original library's directory, then prefixing LD_LIBRARY_PATH will have no effect.
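The check described above can be sketched with two small helpers. The names resolved_libs and embedded_paths are hypothetical; ldd, grep, and readelf (from binutils) are standard tools. The first shows which file the dynamic loader would pick for a library, and the second shows whether the executable embeds an RPATH or RUNPATH entry. On Linux, an RPATH is searched before LD_LIBRARY_PATH, whereas a RUNPATH is searched after it.

```shell
# Show which file the dynamic loader would pick for libraries matching a pattern.
# resolved_libs is a hypothetical helper name, not part of Omnitrace.
resolved_libs() {
    exe="$1"
    pattern="$2"
    ldd "$exe" | grep -- "$pattern"
}

# Show any RPATH/RUNPATH entries embedded in an executable (needs readelf).
# embedded_paths is a hypothetical helper name, not part of Omnitrace.
embedded_paths() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}

# Example with a hypothetical ./app linked against libamdhip64:
#   resolved_libs ./app amdhip64   # the path after "=>" should be under ${PWD}/lib
#   embedded_paths ./app           # no output means no RPATH/RUNPATH is embedded
```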
Once you have rewritten your executable and/or libraries with instrumentation, run the instrumented executable, or the executable which loads the instrumented libraries, as you normally would, e.g.:
omnitrace-run -- ./app.inst
If you want to redefine the default value of certain settings in a binary rewrite, use the --env option. This omnitrace-instrument option
sets the environment variable to the given value but will not override a value that is already set. E.g. the default value of OMNITRACE_PERFETTO_BUFFER_SIZE_KB
is 1024000 KB (approximately 1 GiB):
# buffer size defaults to 1024000
omnitr
