Hwloc

Hardware locality (hwloc)

This is a truncated and poorly-formatted version of the documentation main page. See https://www.open-mpi.org/projects/hwloc/doc/ for more.

hwloc Overview

The Hardware Locality (hwloc) software project aims at easing the process of discovering hardware resources in parallel architectures. It offers command-line tools and a C API for consulting these resources, their locality, attributes, and interconnection. hwloc primarily aims at helping high-performance computing (HPC) applications, but is also applicable to any project seeking to exploit code and/or data locality on modern computing platforms.

hwloc provides command line tools and a C API to obtain the hierarchical map of key computing elements within a node, such as: NUMA memory nodes, shared caches, processor packages, dies and cores, processing units (logical processors or "threads") and even I/O devices. hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of different operating systems and platforms.

hwloc supports the following operating systems:

  • Linux (with knowledge of cgroups and cpusets, memory targets/initiators, etc.) on all supported hardware, including Intel Xeon Phi, ScaleMP vSMP, and NumaScale NumaConnect.
  • Solaris (with support for processor sets and logical domains)
  • AIX
  • Darwin / OS X
  • FreeBSD and its variants (such as kFreeBSD/GNU)
  • NetBSD
  • Microsoft Windows

Since it uses standard operating system information, hwloc's support is mostly independent of the processor type (x86, powerpc, ...) and just relies on the operating system support. The main exception is BSD operating systems (NetBSD, FreeBSD, etc.): because they do not provide topology information, hwloc uses an x86-only CPUID-based backend there (which can be used for other OSes too, see the Components and plugins section).

To check whether hwloc works on a particular machine, just try to build it and run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus or missing cache information), see Questions and Bugs.

hwloc only reports the number of processors on unsupported operating systems; no topology information is available.

For development and debugging purposes, hwloc also offers the ability to work on "fake" topologies:

  • Symmetrical tree of resources generated from a list of level arities, see Synthetic topologies.
  • Remote machine simulation through the gathering of topology as XML files, see Importing and exporting topologies from/to XML files.
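To make the idea of "a list of level arities" concrete, here is a minimal sketch in plain C. It does not use hwloc; the function name level_counts is made up for this illustration. It only computes how many objects a symmetric tree has on each level when every object on level i has arities[i] children.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only, not part of hwloc.
 * Fills counts[0..depth] with the number of objects on each level of a
 * symmetric tree whose level i objects each have arities[i] children.
 * counts[0] is the single root. Returns the number of leaves. */
unsigned long level_counts(const unsigned *arities, size_t depth,
                           unsigned long *counts)
{
    assert(arities != NULL && counts != NULL);
    counts[0] = 1; /* one root object, e.g. the machine */
    for (size_t i = 0; i < depth; i++) {
        assert(arities[i] > 0);
        counts[i + 1] = counts[i] * arities[i];
    }
    return counts[depth];
}
```

With the arities 4, 2, 2 (packages, cores per package, PUs per core), the sketch yields 4 packages, 8 cores and 16 PUs, which is the shape of the 4-package 2-core hyper-threaded machine used in the Command-line Examples.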

hwloc can display the topology in a human-readable format, either in graphical mode (X11), or by exporting in one of several different formats, including: plain text, LaTeX tikzpicture, PDF, PNG, and FIG (see Command-line Examples below). Note that some of the export formats require additional support libraries.

hwloc offers a programming interface for manipulating topologies and objects. It also brings a powerful CPU bitmap API that is used to describe the location of topology objects on physical/logical processors. See the Programming Interface below. It may also be used to bind applications onto certain cores or memory nodes. Several utility programs are also provided to ease command-line manipulation of topology objects, binding of processes, and so on.

Bindings for several other languages are available from the project website.

Command-line Examples

On a 4-package 2-core machine with hyper-threading, the lstopo tool may show the following graphical output:

[dudley]

Here's the equivalent output in textual form:

Machine
  NUMANode L#0 (P#0)
  Package L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Package L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Package L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Package L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)

Note that there is also an equivalent output in XML that is meant for exporting/importing topologies, but it is hardly readable by humans (see Importing and exporting topologies from/to XML files for details).

On a 4-package 2-core Opteron NUMA machine (with two cores disallowed by the administrator), the lstopo tool may show the following graphical output (with --disallowed for displaying disallowed objects):

[hagrid]

Here's the equivalent output in textual form:

Machine (32GB total)
  Package L#0
    NUMANode L#0 (P#0 8190MB)
    L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  Package L#1
    NUMANode L#1 (P#1 8192MB)
    L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  Package L#2
    NUMANode L#2 (P#2 8192MB)
    L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  Package L#3
    NUMANode L#3 (P#3 8192MB)
    L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)

On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each package):

[emmett]

Here's the same output in textual form:

Machine (total 16GB)
  NUMANode L#0 (P#0 16GB)
  Package L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Package L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

Programming Interface

The basic interface is available in hwloc.h. Some higher-level functions are available in hwloc/helper.h to reduce the need to manually manipulate objects and follow links between them. Documentation for all these is provided later in this document. Developers may also want to look at hwloc/inlines.h which contains the actual inline code of some hwloc.h routines, and at this document, which provides good higher-level topology traversal examples.

To precisely define the vocabulary used by hwloc, a Terms and Definitions section is available and should probably be read first.

Each hwloc object contains a cpuset describing the list of processing units that it contains. These bitmaps may be used for CPU binding and Memory binding. hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
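As a rough mental model of what such a cpuset expresses, the following plain C sketch represents a set of PUs as a 64-bit mask indexed by physical PU number. The toy_cpuset type and its functions are invented for this illustration and are not hwloc's API; the real interface is the one in hwloc/bitmap.h.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cpuset, illustration only: bit i is set when the PU with physical
 * index i belongs to the set. Limited to 64 PUs for simplicity. */
typedef uint64_t toy_cpuset;

/* Build a set from a list of physical PU indexes. */
toy_cpuset toy_cpuset_from_pus(const unsigned *pus, unsigned n)
{
    toy_cpuset set = 0;
    for (unsigned i = 0; i < n; i++) {
        assert(pus[i] < 64);
        set |= (toy_cpuset)1 << pus[i];
    }
    return set;
}

/* Non-zero when the given PU is in the set. */
int toy_cpuset_isset(toy_cpuset set, unsigned pu)
{
    return pu < 64 && ((set >> pu) & 1u);
}

/* Number of PUs in the set. */
unsigned toy_cpuset_weight(toy_cpuset set)
{
    unsigned n = 0;
    while (set) {
        set &= set - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Non-zero when every PU of sub is also in super. */
int toy_cpuset_includes(toy_cpuset super, toy_cpuset sub)
{
    return (sub & ~super) == 0;
}
```

Taking the first lstopo example, Package L#0 holds PUs P#0, P#8, P#4 and P#12, which this sketch encodes as the mask 0x1111, while its Core L#0 (P#0 and P#8) becomes 0x101 and is included in the package's set.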

Moreover, hwloc also comes with additional helpers for interoperability with several commonly used environments. See the Interoperability With Other Software section for details.

The complete API documentation is available in a full set of HTML pages, man pages, and self-contained PDF files (formatted for both US letter and A4 formats) in the source tarball in doc/doxygen-doc/.

NOTE: If you are building the documentation from a Git clone, you will need to have Doxygen and pdflatex installed -- the documentation will be built during the normal "make" process. The documentation is installed during "make install" to $prefix/share/doc/hwloc/ and your system's default man page tree (under $prefix, of course).

Portability

Operating systems have varying support for CPU and memory binding. Some provide interfaces for all kinds of CPU and memory binding, others provide interfaces for only a limited number of kinds, and some provide no binding interface at all. Where an interface is missing, hwloc's binding functions simply return the ENOSYS error (Function not implemented), meaning that the underlying operating system does not provide any interface for them. The CPU binding and Memory binding sections provide more information on which hwloc binding functions should be preferred because interfaces for them are usually available on the supported operating systems.
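One way a caller might treat "not supported here" differently from a genuine failure is sketched below in plain C. It assumes the common C convention of returning 0 on success and a negative value with errno set on failure. The bind_fn type, try_bind and the stub functions are hypothetical stand-ins for this illustration, not hwloc functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for any binding call: returns 0 on success,
 * or -1 with errno set on failure. Not an hwloc type. */
typedef int (*bind_fn)(void *arg);

enum bind_outcome { BIND_OK, BIND_UNSUPPORTED, BIND_FAILED };

/* Run a binding call and classify the result. ENOSYS is reported
 * separately so the caller can carry on unbound instead of aborting. */
enum bind_outcome try_bind(bind_fn fn, void *arg)
{
    assert(fn != NULL);
    errno = 0;
    if (fn(arg) == 0)
        return BIND_OK;
    return (errno == ENOSYS) ? BIND_UNSUPPORTED : BIND_FAILED;
}

/* Stubs imitating the three situations, for demonstration only. */
int stub_bind_ok(void *arg)          { (void)arg; return 0; }
int stub_bind_unsupported(void *arg) { (void)arg; errno = ENOSYS; return -1; }
int stub_bind_denied(void *arg)      { (void)arg; errno = EPERM; return -1; }
```

In portable code, treating the unsupported case as a soft failure (log it and continue without binding) keeps the application usable on systems with no binding interface.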

Similarly, the ability to report topology information varies from one platform to another. As shown in Command-line Examples, hwloc can obtain information on a wide variety of hardware topologies. However, some platforms and/or operating system versions will only report a subset of this information. For example, on a PPC64-based system with 8 cores (each with 2 hardware threads) running a default 2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information about NUMA nodes and processing units (PUs). No information about caches, packages, or cores is available.

Here's the graphical output from lstopo on this platform when Simultaneous Multi-Threading (SMT) is enabled:

[ppc64-with]

And here's the graphical output from lstopo on this platform when SMT is disabled:

[ppc64-with]

Notice that hwloc only sees half the PUs when SMT is disabled. PU L#6, for example, seems to change location from NUMA node #0 to #1. In reality, no PUs "moved" -- they were simply re-numbered when hwloc only saw half as many (see also Logical index in Indexes and Sets). Hence, PU L#6 in the SMT-disabled picture probably corresponds to PU L#12 in the SMT-enabled picture.
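The renumbering arithmetic can be sketched in a few lines of plain C. This is only a model under stated assumptions: the visible PU of each core is its first hardware thread, logical ordering is otherwise unchanged, and PUs are split evenly and contiguously across 2 NUMA nodes (the node count is an assumption made for this example). The function names are made up and are not hwloc calls.

```c
#include <assert.h>

/* Logical index, in the SMT-enabled numbering, of the PU that shows up at
 * smt_off_index when only one hardware thread per core is visible.
 * Assumes the visible thread is the first one of each core. */
unsigned smt_on_index(unsigned smt_off_index, unsigned threads_per_core)
{
    assert(threads_per_core > 0);
    return smt_off_index * threads_per_core;
}

/* NUMA node holding a PU when PUs are split evenly and contiguously. */
unsigned numa_node_of(unsigned logical_index, unsigned pus_per_node)
{
    assert(pus_per_node > 0);
    return logical_index / pus_per_node;
}
```

Under these assumptions, 16 visible PUs give 8 per node, so L#6 sits in node 0; with SMT disabled only 8 PUs remain, 4 per node, so L#6 sits in node 1. That PU maps back to L#12 in the SMT-enabled numbering, which is also in node 1: the same hardware, just renumbered.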

This same "PUs have disappeared" effect can be seen on other platforms -- even platforms / OSs that provide much more information than the above PPC64 system. This is an unfortunate side-effect of how operating systems report informat
