Likwid

Performance monitoring and benchmarking suite


Introduction

LIKWID is an easy-to-install and easy-to-use tool suite of command-line applications and a library for performance-oriented programmers. It works on Intel, AMD, ARMv8 and POWER9 processors under the Linux operating system, with additional support for Nvidia and AMD GPUs. ARMv7 and POWER8/9 are also supported, but we currently have no test machines to verify them properly.

LIKWID Playlist (YouTube)

It consists of:

  • likwid-topology: print thread, cache and NUMA topology
  • likwid-perfctr: configure and read out hardware performance counters on Intel, AMD, ARM and POWER processors and Nvidia GPUs
  • likwid-powermeter: read out RAPL Energy information and get info about Turbo mode steps
  • likwid-pin: pin your threaded application (pthread, Intel and gcc OpenMP) to dedicated processors
  • likwid-bench: Micro benchmarking platform for CPU architectures
  • likwid-features: Print and manipulate CPU features like hardware prefetchers (x86 only)
  • likwid-genTopoCfg: Dumps topology information to a file
  • likwid-mpirun: Wrapper to start MPI and Hybrid MPI/OpenMP applications (Supports Intel MPI, OpenMPI, MPICH and SLURM)
  • likwid-perfscope: Frontend to the timeline mode of likwid-perfctr, plots live graphs of performance metrics using gnuplot
  • likwid-memsweeper: Sweep memory of NUMA domains and evict cachelines from the last level cache
  • likwid-setFrequencies: Tool to control the CPU and Uncore frequencies (x86 only)
  • likwid-sysFeatures: Tool to manipulate system settings like frequencies, powercaps and prefetchers (experimental)

For further information please take a look at the Wiki or contact us via Matrix chat LIKWID General.


Supported architectures

Intel

  • Intel Atom
  • Intel Pentium M
  • Intel Core2
  • Intel Nehalem
  • Intel NehalemEX
  • Intel Westmere
  • Intel WestmereEX
  • Intel Xeon Phi (KNC)
  • Intel Silvermont & Airmont
  • Intel Goldmont
  • Intel SandyBridge
  • Intel SandyBridge EP/EN
  • Intel IvyBridge
  • Intel IvyBridge EP/EN/EX
  • Intel Xeon Phi (KNL, KNM)
  • Intel Haswell
  • Intel Haswell EP/EN/EX
  • Intel Broadwell
  • Intel Broadwell D
  • Intel Broadwell EP
  • Intel Skylake
  • Intel Kabylake
  • Intel Coffeelake
  • Intel Skylake SP
  • Intel Cascadelake SP
  • Intel Icelake
  • Intel Icelake SP
  • Intel Tigerlake (experimental)
  • Intel SapphireRapids
  • Intel EmeraldRapids

AMD

  • AMD K8
  • AMD K10
  • AMD Interlagos
  • AMD Kabini
  • AMD Zen
  • AMD Zen2
  • AMD Zen3
  • AMD Zen4

ARM

  • ARMv7
  • ARMv8
  • Special support for Marvell Thunder X2
  • Fujitsu A64FX
  • ARM Neoverse N1 (AWS Graviton 2)
  • ARM Neoverse V1
  • HiSilicon TSV110
  • Apple M1 (only with Linux)

POWER (experimental)

  • IBM POWER8
  • IBM POWER9

Nvidia GPUs

AMD GPUs


Download, Build and Install

You can get the releases of LIKWID at: http://ftp.fau.de/pub/likwid/

For build and installation hints, see the INSTALL file or the build instructions page in the wiki: https://github.com/RRZE-HPC/likwid/wiki/Build

For quick install:

VERSION=stable
wget http://ftp.fau.de/pub/likwid/likwid-$VERSION.tar.gz
tar -xaf likwid-$VERSION.tar.gz
cd likwid-*
vi config.mk # configure build, e.g. change installation prefix and architecture flags
make
sudo make install # sudo required to install the access daemon with proper permissions
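
After installation, a quick sanity check is to confirm that the command-line tools are reachable on PATH. The following is a minimal sketch and not part of LIKWID: the helper name `missing_tools` is hypothetical, and it assumes the `bin` directory of your installation prefix is already on PATH.

```shell
#!/bin/sh
# Print every named command that cannot be found on PATH.
# Returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || { echo "$tool"; rc=1; }
    done
    return $rc
}

# Check a few of the tools listed in the introduction.
if missing=$(missing_tools likwid-topology likwid-perfctr likwid-pin likwid-bench); then
    echo "all listed LIKWID tools found"
else
    echo "not found on PATH:" $missing
fi
```

If tools are reported as missing, the installation prefix chosen in config.mk is the first thing to check.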

  • For ARM builds, the COMPILER flag in config.mk needs to be changed to GCCARMv8 or ARMCLANG (experimental).
  • For POWER builds, the COMPILER flag in config.mk needs to be changed to GCCPOWER or XLC (experimental).
  • For Nvidia GPU support, set NVIDIA_INTERFACE in config.mk to true and adjust the build-time variables if needed.
  • For AMD GPU support, set ROCM_INTERFACE in config.mk to true and adjust the build-time variables if needed.
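
Instead of editing config.mk by hand, the settings above can be scripted. The sketch below is one possible approach, not an official LIKWID helper. It assumes each option sits on its own line in the form `NAME = value`, optionally followed by a `#` comment; verify this against your actual config.mk. The function name `set_option` and the stand-in file contents are hypothetical.

```shell
#!/bin/sh
# Set a "NAME = value" entry in a config.mk-style file.
# Assumption: one option per line, optionally followed by a "#..." comment.
set_option() {
    file=$1; name=$2; value=$3
    # Replace the text after "NAME =" up to an optional trailing comment.
    sed "s|^\(${name}[[:space:]]*=[[:space:]]*\)[^#]*|\1${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demonstration on a throwaway stand-in file (not the real config.mk).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
COMPILER = GCC#NO SPACE
NVIDIA_INTERFACE = false
ROCM_INTERFACE = false
EOF

set_option "$cfg" COMPILER GCCARMv8      # ARM build
set_option "$cfg" NVIDIA_INTERFACE true  # enable Nvidia GPU support
cat "$cfg"
```

In a real source tree you would pass `config.mk` as the file argument and then run `make` as usual. Values containing `|` or `&` would need escaping in this sketch.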


Usage examples

<details>
<summary><code>likwid-topology</code></summary>
<pre>
--------------------------------------------------------------------------------
CPU name:       Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
CPU type:       Intel Skylake processor
CPU stepping:   3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:                1
Cores per socket:       4
Threads per core:       2
--------------------------------------------------------------------------------
HWThread        Thread        Core        Die        Socket        Available
0               0             0           0          0             *
1               0             1           0          0             *
2               0             2           0          0             *
3               0             3           0          0             *
4               1             0           0          0             *
5               1             1           0          0             *
6               1             2           0          0             *
7               1             3           0          0             *
--------------------------------------------------------------------------------
Socket 0:               ( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:                  1
Size:                   32 kB
Cache groups:           ( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:                  2
Size:                   256 kB
Cache groups:           ( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:                  3
Size:                   8 MB
Cache groups:           ( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:           1
--------------------------------------------------------------------------------
Domain:                 0
Processors:             ( 0 4 1 5 2 6 3 7 )
Distances:              10
Free memory:            318.203 MB
Total memory:           7626.23 MB
--------------------------------------------------------------------------------
</pre>
</details>
<details>
<summary><code>likwid-perfctr</code></summary>
<pre>
$ likwid-perfctr -C 0 -g L2 hostname
--------------------------------------------------------------------------------
CPU name:       Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
CPU type:       Intel Skylake processor
CPU clock:      4.01 GHz
--------------------------------------------------------------------------------
mytesthost
--------------------------------------------------------------------------------
Group 1: L2
+-----------------------+---------+------------+
| Event                 | Counter | HWThread 0 |
+-----------------------+---------+------------+
| INSTR_RETIRED_ANY     | FIXC0   |     321342 |
| CPU_CLK_UNHALTED_CORE | FIXC1   |     450498 |
| CPU_CLK_UNHALTED_REF  | FIXC2   |    1118900 |
| L1D_REPLACEMENT       | PMC0    |       6670 |
| L1D_M_EVICT           | PMC1    |       1840 |
| ICACHE_64B_IFTAG_MISS | PMC2    |       9293 |
+-----------------------+---------+------------+

+--------------------------------+------------+
| Metric                         | HWThread 0 |
+--------------------------------+------------+
| Runtime (RDTSC) [s]            |     0.0022 |
| Runtime unhalted [s]           |     0.0001 |
| Clock [MHz]                    |  1613.6392 |
| CPI                            |     1.4019 |
| L2D load bandwidth [MBytes/s]  |   197.8326 |
| L2D load data volume [GBytes]  |     0.0004 |
| L2D evict bandwidth [MBytes/s] |    54.5745 |
| L2D evict data volume [GBytes] |     0.0001 |
| L2 bandwidth [MBytes/s]        |   528.0381 |
| L2 data volume [GBytes]        |     0.0011 |
+--------------------------------+------------+
</pre>
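
Some of the derived metrics in the likwid-perfctr output can be reproduced by hand from the raw event counts, which helps when reading such tables. The sketch below recomputes CPI and the data volumes. It assumes that each counted cache line is 64 bytes and that the formulas are simple ratios and sums. These assumptions are consistent with the numbers shown, but the sketch is not taken from LIKWID's group definition. The helper names `cpi` and `volume_gb` are hypothetical.

```shell
#!/bin/sh
# Raw counts copied from the likwid-perfctr sample output.
INSTR=321342      # INSTR_RETIRED_ANY
CLK=450498        # CPU_CLK_UNHALTED_CORE
L1D_REPL=6670     # L1D_REPLACEMENT
L1D_EVICT=1840    # L1D_M_EVICT
ICACHE_MISS=9293  # ICACHE_64B_IFTAG_MISS

# CPI = unhalted core cycles / retired instructions
cpi() { awk -v c="$1" -v i="$2" 'BEGIN { printf "%.4f\n", c / i }'; }

# Data volume in GBytes, assuming 64 bytes per counted cache line
volume_gb() { awk -v n="$1" 'BEGIN { printf "%.4f\n", n * 64 * 1.0e-9 }'; }

echo "CPI:                            $(cpi "$CLK" "$INSTR")"
echo "L2D load data volume [GBytes]:  $(volume_gb "$L1D_REPL")"
echo "L2D evict data volume [GBytes]: $(volume_gb "$L1D_EVICT")"
echo "L2 data volume [GBytes]:        $(volume_gb $((L1D_REPL + L1D_EVICT + ICACHE_MISS)))"
```

The four printed values (1.4019, 0.0004, 0.0001 and 0.0011) agree with the corresponding rows of the metric table.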

</details>
<details>
<summary><code>likwid-pin</code></summary>
<pre>
$ likwid-pin -c 0,1,2 ./a.out
[pthread wrapper]
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->1 1->2
[pthread wrapper] SKIP MASK: 0x0
        threadid 140566548539136 -> hwthread 1 - OK
        threadid 140566540146432 -> hwthread 2 - OK
Number of Threads requested = 3
Thread 0 running on processor 0 ....
Thread 1 running on processor 1 ....
Thread 2 running on processor 2 ....
[...]
</pre>
</details>
<details>
<summary><code>likwid-bench</code></summary>
<pre>
$ likwid-bench -t triad_avx -W N:2GB:3
Warning: Sanitizing vector length to a multiple of the loop stride 16 and thread count 3 from 62500000 elements (500000000 bytes) to 62499984 elements (499999872 bytes)
Allocate: Process running on hwthread 0 (Domain N) - Vector length 62499984/499999872 Offset 0 Alignment 512
Allocate: Process running on hwthread 0 (Domain N) - Vector length 62499984/499999872 Offset 0 Alignme
</pre>
</details>
