LIBXSMM
LIBXSMM is a library for specialized dense and sparse matrix operations as well as for deep learning primitives such as small convolutions. The library targets Intel Architecture with Intel SSE, Intel AVX, Intel AVX2, Intel AVX-512 (with VNNI and Bfloat16), and Intel AMX (Advanced Matrix Extensions), which is supported by the Intel processor code-named Sapphire Rapids. Code generation is mainly based on Just-In-Time (JIT) code specialization for compiler-independent performance (matrix multiplications, matrix transpose/copy, sparse functionality, and deep learning). LIBXSMM is suitable for "build once and deploy everywhere", i.e., no special target flags are needed to exploit the available performance. Supported GEMM datatypes are FP64, FP32, bfloat16, int16, and int8.
For a list of questions and answers, please also have a look at https://github.com/libxsmm/libxsmm/wiki/Q&A.
Where to go for documentation?
- ReadtheDocs: main and sample documentation with full text search.
- PDF: main documentation file, and separate sample documentation.
- Articles: magazine articles including sample code (see the full list of articles).
<a name="getting-started"></a><a name="hello-libxsmm"></a>Getting Started: The following C++ code is focused on a specific functionality but may be considered as Hello LIBXSMM. Build the library with `cd /path/to/libxsmm; make STATIC=0` (shared library), save the code below as hello.cpp, compile it with `g++ -I/path/to/libxsmm/include hello.cpp -L/path/to/libxsmm/lib -lxsmm -lblas -o hello` (GNU GCC), and finally execute it with `LD_LIBRARY_PATH=/path/to/libxsmm/lib LIBXSMM_VERBOSE=2 ./hello`.
```cpp
#include <libxsmm.h>
#include <cassert>
#include <vector>

int main(int argc, char* argv[]) {
  typedef double T;
  int batchsize = 1000, m = 13, n = 5, k = 7;
  std::vector<T> a(batchsize * m * k), b(batchsize * k * n), c(m * n, 0);
  /* C/C++ and Fortran interfaces are available */
  typedef libxsmm_mmfunction<T> kernel_type;
  /* generates and dispatches a matrix multiplication kernel (C++ functor) */
  kernel_type kernel(LIBXSMM_GEMM_FLAG_NONE, m, n, k, 1.0 /*alpha*/, 1.0 /*beta*/);
  assert(kernel);
  for (int i = 0; i < batchsize; ++i) { /* initialize input (column-major) */
    for (int ki = 0; ki < k; ++ki) {
      for (int j = 0; j < m; ++j) a[i * m * k + ki * m + j] = static_cast<T>(1) / ((i + j + ki) % 25 + 1);
      for (int j = 0; j < n; ++j) b[i * k * n + j * k + ki] = static_cast<T>(7) / ((i + j + ki) % 75 + 1);
    }
  }
  /* kernel multiplies and accumulates matrices: C += Ai * Bi */
  for (int i = 0; i < batchsize; ++i) kernel(&a[i * m * k], &b[i * k * n], &c[0]);
  return 0;
}
```
The same example is also available as plain C code and as Fortran code.
<a name="what-is-a-small-matrix-multiplication"></a>What is a small matrix multiplication? When characterizing the problem size by the M, N, and K parameters, a problem size suitable for LIBXSMM falls approximately within <i>(M N K)<sup>1/3</sup> <= 64</i> (which illustrates that non-square matrices and even "tall and skinny" shapes are covered as well). The library does not employ multi-level blocking in M, N, or K. Using LIBXSMM for much larger sizes may generate excessive amounts of code (due to unrolling in the M or K dimension), and it also lacks a tiling scheme to effectively utilize the cache hierarchy. In terms of GEMM, the supported kernels are limited to Alpha := 1, Beta := { 1, 0 }, and TransA := 'N'.
Interfaces and Domains<a name="interfaces"></a>
Overview<a name="general-interface"></a>
Please have a look at https://github.com/libxsmm/libxsmm/tree/main/include for all published functions. Get started with the following list of available domains and documented functionality:
- MM: Matrix Multiplication
- TPP: Tensor Processing Primitives
- DNN: Deep Neural Networks
- AUX: Service Functions
- PERF: Performance
- BE: Backend
To initialize the library's internal resources, an explicit initialization routine helps to avoid lazy-initialization overhead when calling LIBXSMM for the first time. The library deallocates internal resources at program exit, but also provides a companion to the aforementioned initialization (finalize).
```c
/** Initialize the library; pay for setup cost at a specific point. */
void libxsmm_init(void);
/** De-initialize the library and free internal memory (optional). */
void libxsmm_finalize(void);
```
Matrix Multiplication<a name="interface-for-matrix-multiplication"></a>
This domain (MM) supports Small Matrix Multiplications (SMM), batches of multiple multiplications as well as the industry-standard interface for GEneral Matrix Matrix multiplication (GEMM).
The Matrix Multiplication domain (MM) contains routines for:
- Small, tiled, and parallelized matrix multiplications
- Manual code dispatch (customized matrix batches)
Deep Learning<a name="interface-for-dl"></a>
Here we demonstrate how common operators in deep learning applications (GEMM with activation-function fusion, convolutions with activation-function fusion, various normalization operators, pooling operators, etc.) can be implemented using the Tensor Processing Primitives provided by LIBXSMM. Example drivers for performance evaluation are provided as part of LIBXSMM_DNN.
Service Functions
For convenient operation of the library and to ease integration, some service routines are available. These routines may not belong to the core functionality of LIBXSMM (SMM or DNN domain), but users are nevertheless encouraged to use this domain (AUX). There are two categories: (1) routines that are available for both C and Fortran, and (2) routines that are only available via the C interface.
The service function domain (AUX) contains routines for:
- Getting and setting the target architecture
- Getting and setting the verbosity
- Measuring time durations (timer)
- Dispatching user-data and multiple kernels
- Allocating memory
Backend<a name="jit-backend"></a>
More information about the JIT-backend and the code generator can be found in a separate document. The encoder sample collection can help to get started writing a kernel using LIBXSMM. Please note, LIBXSMM's stand-alone <a name="generator-driver"></a>generator-driver is considered legacy (deprecated).
Build Instructions
Overview
The main interface file is generated, and it is therefore not stored in the code repository. To inspect the interface for C/C++ and FORTRAN, one can take a look at the template files used to generate the actual interface. There are two general ways to build and use LIBXSMM:
- Classic Library (ABI) and Link Instructions (C/C++ and FORTRAN)
- Header-Only (C and C++)
Note: LIBXSMM is available as a prebuilt package for Fedora/RedHat/CentOS, Debian/Ubuntu, FreeBSD, and others. Further, LIBXSMM can be installed with the Spack Package Manager.