Nsimd

Agenium Scale vectorization library for CPUs and GPUs

Documentation can be found here. We put a lot of effort into testing.

What is NSIMD?

At its core, NSIMD is a vectorization library that abstracts SIMD programming. It was designed to exploit the maximum power of processors at a low development cost. NSIMD comes with modules, two of which currently add GPU support. NSIMD aims to provide several programming paradigms to address different problems and to support a wider range of architectures. Together with these two modules, NSIMD provides three programming paradigms:

  • Imperative programming, provided by NSIMD core, which supports many CPU/SIMD extensions.
  • Expression templates, provided by the TET1D module, which supports all architectures from NSIMD core and adds support for NVIDIA and AMD GPUs.
  • Single Program Multiple Data, provided by the SPMD module, which supports all architectures from NSIMD core and adds support for NVIDIA and AMD GPUs.
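
To make the expression-template paradigm concrete, here is a minimal sketch in plain C++. It does not use NSIMD or the TET1D module: the namespace `et` and the names `In`, `Add`, `Mul`, `in`, `add`, `mul` and `assign` are all hypothetical and exist only for illustration. The idea it shows is that building an expression only records the operands, and the whole tree is evaluated later in a single loop.

```cpp
#include <cassert>
#include <cstddef>

namespace et {

// Leaf node: reads from an existing array (hypothetical name).
struct In {
  const float *p;
  float eval(std::size_t i) const { return p[i]; }
};

// Interior nodes store their operands and compute nothing until eval().
template <typename L, typename R> struct Add {
  L l;
  R r;
  float eval(std::size_t i) const { return l.eval(i) + r.eval(i); }
};

template <typename L, typename R> struct Mul {
  L l;
  R r;
  float eval(std::size_t i) const { return l.eval(i) * r.eval(i); }
};

// Helper functions that build the expression tree.
inline In in(const float *p) {
  In e = {p};
  return e;
}

template <typename L, typename R> Add<L, R> add(L l, R r) {
  Add<L, R> e = {l, r};
  return e;
}

template <typename L, typename R> Mul<L, R> mul(L l, R r) {
  Mul<L, R> e = {l, r};
  return e;
}

// The whole tree is evaluated in one loop, without temporary arrays.
template <typename E> void assign(float *out, std::size_t n, E e) {
  for (std::size_t i = 0; i < n; ++i) out[i] = e.eval(i);
}

} // namespace et

// Small self-check: out = (a + b) * c, element by element.
inline void et_demo() {
  float a[3] = {1, 2, 3}, b[3] = {4, 5, 6}, c[3] = {2, 2, 2}, out[3];
  et::assign(out, 3, et::mul(et::add(et::in(a), et::in(b)), et::in(c)));
  assert(out[0] == 10.0f && out[1] == 14.0f && out[2] == 18.0f);
}
```

Because the loop lives in a single place (`assign` in this sketch), a library built this way can in principle swap the loop for a SIMD or GPU implementation without changing the code that builds the expression.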

Supported architectures

| Architecture                          | NSIMD core | TET1D module | SPMD module |
|:--------------------------------------|:----------:|:------------:|:-----------:|
| CPU (scalar functions)                |     Y      |      Y       |      Y      |
| CPU (128-bits SIMD emulation)         |     Y      |      Y       |      Y      |
| Intel SSE 2                           |     Y      |      Y       |      Y      |
| Intel SSE 4.2                         |     Y      |      Y       |      Y      |
| Intel AVX                             |     Y      |      Y       |      Y      |
| Intel AVX2                            |     Y      |      Y       |      Y      |
| Intel AVX-512 for KNLs                |     Y      |      Y       |      Y      |
| Intel AVX-512 for Skylake processors  |     Y      |      Y       |      Y      |
| Arm NEON 128 bits (ARMv7 and earlier) |     Y      |      Y       |      Y      |
| Arm NEON 128 bits (ARMv8 and later)   |     Y      |      Y       |      Y      |
| Arm SVE (original sizeless SVE)       |     Y      |      Y       |      Y      |
| Arm fixed sized SVE                   |     Y      |      Y       |      Y      |
| IBM POWERPC VMX                       |     Y      |      Y       |      Y      |
| IBM POWERPC VSX                       |     Y      |      Y       |      Y      |
| NVIDIA CUDA                           |     N      |      Y       |      Y      |
| AMD ROCm                              |     N      |      Y       |      Y      |
| Intel oneAPI                          |     N      |      Y       |      Y      |

Contributions

| Contributor          | Contribution(s)                                   |
|:---------------------|:--------------------------------------------------|
| Guillaume Quintin    | Maintainer + main contributor                     |
| Alan Kelly           | Arm NEON + mathematical functions                 |
| Kenny Péou           | Fixed point module                                |
| Xavier Berault       | PowerPC VMX and VSX                               |
| Vianney Stricher     | NSIMD core + oneAPI in SPMD and TET1D modules     |
| Quentin Khan         | SoA/AoS loads and stores                          |
| Paul Gannay          | PowerPC VMX, VSX + testing system                 |
| Charly Chevalier     | Benchmarking system + Python internals            |
| Erik Schnetter       | Fixes + code generation                           |
| Lénaïc Bagnères      | Fixes + TET1D module                              |
| Jean-Didier Pailleux | Shuffle operators                                 |

How does it work?

To achieve maximum performance, NSIMD relies mainly on the compiler's inlining pass. Using NSIMD with any mainstream compiler, such as GCC, Clang, MSVC, XL C/C++ or ICC, therefore gives you a zero-cost SIMD abstraction library.

To allow inlining, a lot of code is placed in header files. Small functions such as addition, multiplication, square root, etc, are all present in header files whereas big functions such as I/O are put in source files that are compiled as a .so/.dll library.
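
The header-only approach for small operations can be illustrated with a minimal sketch. It does not use NSIMD: `vec4f`, `load`, `store`, `splat`, `add`, `mul` and `axpy` are hypothetical names, and a struct of four floats stands in for a 128-bit register. Since every small function is defined `inline` and visible at its call site, an optimizing compiler can typically fold the calls into the loop body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 128-bit register holding four floats.
// This is not an NSIMD type; it only emulates the idea in scalar code.
struct vec4f {
  float lane[4];
};

// Small operations are inline so the compiler sees their bodies
// wherever they are called.
inline vec4f load(const float *p) {
  vec4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = p[i];
  return r;
}

inline void store(float *p, vec4f v) {
  for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}

inline vec4f splat(float x) {
  vec4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = x;
  return r;
}

inline vec4f add(vec4f a, vec4f b) {
  vec4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

inline vec4f mul(vec4f a, vec4f b) {
  vec4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
  return r;
}

// y[i] = a * x[i] + y[i], four elements at a time, then a scalar tail.
inline void axpy(float a, const float *x, float *y, std::size_t n) {
  const vec4f va = splat(a);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    store(y + i, add(mul(va, load(x + i)), load(y + i)));
  }
  for (; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Small self-check covering both the vector part and the tail.
inline void axpy_demo() {
  float x[6] = {1, 2, 3, 4, 5, 6};
  float y[6] = {10, 20, 30, 40, 50, 60};
  axpy(2.0f, x, y, 6);
  assert(y[0] == 12.0f && y[5] == 72.0f);
}
```

The loop in `axpy` is also an instance of the imperative style: the programmer explicitly walks the data one register width at a time and handles the leftover elements separately.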

NSIMD provides C89, C11, C++98, C++11, C++14 and C++20 APIs. All APIs allow writing generic code. For the C API this is achieved through a thin layer of macros, and with the _Generic keyword for the advanced C API; for the C++ APIs it is achieved using templates and function overloading. The C++ APIs are split into two parts. The first is a C-like API with only function calls and direct type definitions for SIMD types, while the second provides operator overloading and higher-level type definitions that allow unrolling. The C++11 and C++14 APIs add, for instance, templated type definitions and templated constants, while the C++20 API uses concepts for better error reporting.
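
The two-part split of the C++ APIs can be pictured with the following sketch. It is an illustration only and does not reproduce NSIMD's actual types or functions: `v4f32`, `v4i32`, `set1`, `add`, `reg_of`, `wide` and `triple` are hypothetical names, and scalar arrays stand in for SIMD registers. The first layer consists of plain overloaded functions on register types; the second wraps a register in a class template and adds operator overloading, so generic code can be written once.

```cpp
#include <cassert>

// Hypothetical register types; scalar arrays stand in for SIMD registers.
struct v4f32 { float v[4]; };
struct v4i32 { int v[4]; };

// C-like layer: plain function calls, one overload per register type.
inline v4f32 set1(float x) {
  v4f32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = x;
  return r;
}

inline v4i32 set1(int x) {
  v4i32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = x;
  return r;
}

inline v4f32 add(v4f32 a, v4f32 b) {
  v4f32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

inline v4i32 add(v4i32 a, v4i32 b) {
  v4i32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

// Maps an element type to its register type.
template <typename T> struct reg_of;
template <> struct reg_of<float> { typedef v4f32 type; };
template <> struct reg_of<int> { typedef v4i32 type; };

// Higher-level layer: a wrapper type with operator overloading.
template <typename T> struct wide {
  typename reg_of<T>::type r;
  explicit wide(T x) : r(set1(x)) {}
  explicit wide(typename reg_of<T>::type reg) : r(reg) {}
  T operator[](int i) const { return r.v[i]; }
};

template <typename T> wide<T> operator+(wide<T> a, wide<T> b) {
  return wide<T>(add(a.r, b.r));
}

// Generic code written once and instantiated per element type.
template <typename T> wide<T> triple(wide<T> x) { return x + x + x; }

// Small self-check for both element types.
inline void generic_demo() {
  assert(triple(wide<float>(1.5f))[0] == 4.5f);
  assert(triple(wide<int>(2))[3] == 6);
}
```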

Binary compatibility is guaranteed by the fact that only a C ABI is exposed. The C++ API only wraps the C calls.
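
A minimal sketch of this pattern follows, using hypothetical names (`demo_add_f32`, `demo_add_i32`, the `demo` namespace and `sum3`) that are not part of NSIMD. Only the functions with C linkage would be exported by a shared library; the C++ layer consists of inline wrappers that live in headers and forward to them. In a real library the C functions would be declared in a header and defined in the compiled library; they are defined in place here so that the example is self-contained.

```cpp
#include <cassert>

// Exported surface: C linkage only, so symbol names are not C++-mangled.
// These function names are hypothetical.
extern "C" {
float demo_add_f32(float a, float b) { return a + b; }
int demo_add_i32(int a, int b) { return a + b; }
}

// C++ convenience layer: inline overloads that forward to the C functions.
// Being header-only, they add no C++ symbols to the library's binary
// interface.
namespace demo {

inline float add(float a, float b) { return demo_add_f32(a, b); }
inline int add(int a, int b) { return demo_add_i32(a, b); }

// Generic code on top of the overloads.
template <typename T> T sum3(T a, T b, T c) { return add(add(a, b), c); }

} // namespace demo

// Small self-check for both element types.
inline void abi_demo() {
  assert(demo::sum3(1, 2, 3) == 6);
  assert(demo::sum3(0.5f, 0.25f, 0.25f) == 1.0f);
}
```

With this layout, differences in name mangling or C++ standard library versions between compilers do not affect the exported symbols, since those symbols follow the platform's C calling convention.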

Supported compilers

NSIMD is tested with GCC, Clang, MSVC, NVCC, HIPCC and ARMClang. Because C89 and C++98 APIs are provided, other compilers should work fine. Old compiler versions should work as long as they support the targeted SIMD extension. For instance, NSIMD can compile SSE 4.2 code with MSVC 2010.

Build the library

CMake

As CMake is widely used as a build system, we provide CMake support, limited to building the library, along with the corresponding find module.

mkdir build
cd build
cmake .. -Dsimd=SIMD_EXT
make
make install

where SIMD_EXT is one of the following: CPU, SSE2, SSE42, AVX, AVX2, AVX512_KNL, AVX512_SKYLAKE, NEON128, AARCH64, SVE, SVE128, SVE256, SVE512, SVE1024, SVE2048, VMX, VSX, CUDA, ROCM.

Note that when compiling for NEON128 on Linux one has to choose the ABI, either armel or armhf. Default is armel. As CMake is unable to autodetect this parameter one has to tell CMake manually.

cmake .. -Dsimd=neon128                               # for armel
cmake .. -Dsimd=neon128 -DNSIMD_ARM32_IS_ARMEL=OFF    # for armhf

We provide a CMake find module in the scripts directory to find NSIMD on your system. The module can find NSIMD on its own. If several versions of NSIMD for different SIMD extensions are installed, the module will find and return one of them; there is no guarantee as to which version will be chosen.

find_package(NSIMD)

If one wants a specific version of the library for a given SIMD extension then use the COMPONENTS part of find_package. Only one component is supported at a time.

find_package(NSIMD COMPONENTS avx2)         # find only NSIMD for Intel AVX2
find_package(NSIMD COMPONENTS sve)          # find only NSIMD for Arm SVE
find_package(NSIMD COMPONENTS sse2 sse42)   # unsupported

Nsconfig

The support for CMake has been limited to building the library only. If you wish to run tests or contribute you need to use nsconfig as CMake has several flaws:

  • too slow especially on Windows,
  • inability to use several compilers at once,
  • inability to have a portable build system,
  • very poor support for portable compilation flags,
  • ...

Dependencies (nsconfig only)

Generating C/C++ files is done by the Python3 code contained in the egg directory. Python is installed by default on most Linux distributions. On Windows it comes with the latest versions of Visual Studio (https://visualstudio.microsoft.com/vs/community/); you can also download and install it directly from https://www.python.org/.

The Python code can call clang-format to properly format all generated C/C++ source. On Linux you can install it via your package manager. On Windows you can use the official binary at https://llvm.org/builds/.

Compiling the library requires a C++98 compiler. Any version of GCC, Clang or MSVC will do. Note that the library and header files produced for the end user are compatible with C89, C++98 and C++11. The C/C++ files are generated by Python scripts, which must be run before building the library.

Build for Linux

bash scripts/build.sh for simd_ext1/.../simd_extN with comp1/.../compN

For each combination a directory build-simd_ext-comp will be created and will contain the library. Supported SIMD extensions are:

  • sse2
  • sse42
  • avx
  • avx2
  • avx512_knl
  • avx512_skylake
  • neon128
  • aarch64
  • sve
  • sve128
  • sve256
  • sve512
  • sve1024
  • sve2048
  • vmx
  • vsx
  • cuda
  • rocm

Supported compilers are:

  • gcc
  • clang
  • icc
  • armclang
  • xlc
  • dpcpp
  • fcc
  • cl
  • nvcc
  • hipcc

Note that certain combinations of SIMD extensions and compilers are not supported, such as aarch64 with icc, or avx512_skylake with nvcc.
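
The directory naming rule for scripts/build.sh can be sketched as follows. This is not part of NSIMD: `split_slash` and `build_dirs` are hypothetical helpers, and the sketch assumes that every SIMD extension given after "for" is paired with every compiler given after "with". It does not filter out unsupported pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits "a/b/c" into {"a", "b", "c"}, skipping empty items.
inline std::vector<std::string> split_slash(const std::string &s) {
  std::vector<std::string> out;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, '/')) {
    if (!item.empty()) out.push_back(item);
  }
  return out;
}

// Lists the directories expected for "for <simd_exts> with <compilers>",
// assuming one build-<simd_ext>-<comp> directory per pair.
inline std::vector<std::string> build_dirs(const std::string &simd_exts,
                                           const std::string &compilers) {
  const std::vector<std::string> exts = split_slash(simd_exts);
  const std::vector<std::string> comps = split_slash(compilers);
  std::vector<std::string> dirs;
  for (std::size_t i = 0; i < exts.size(); ++i) {
    for (std::size_t j = 0; j < comps.size(); ++j) {
      dirs.push_back("build-" + exts[i] + "-" + comps[j]);
    }
  }
  return dirs;
}

// Small self-check: two extensions and two compilers give four directories.
inline void build_dirs_demo() {
  const std::vector<std::string> d = build_dirs("sse2/avx2", "gcc/clang");
  assert(d.size() == 4);
  assert(d[0] == "build-sse2-gcc");
}
```

Under that assumption, `bash scripts/build.sh for sse2/avx2 with gcc/clang` would be expected to produce build-sse2-gcc, build-sse2-clang, build-avx2-gcc and build-avx2-clang.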

Build on Windows

Make sure you are typing in a Visual Studio prompt. The command is almost the same as for Linux, with the same constraints on SIMD extension/compiler pairs.

scripts\build.bat for simd_ext1/.../simd_extN with comp1/.../compN

More details on building the library

The library uses a tool called nsconfig (https://github.com/agenium-scale/nstools), which is basically a Makefile translator. If you have just built NSIMD as described above, you should have an nstools directory that contains bin/nsconfig. If not, you can generate it on Linux with

bash scripts/setup.sh

and on Windows

scripts\setup.bat

You can then use nsconfig directly; its command-line syntax is similar to CMake's. Here is a quick tutorial using the Linux command line. We first go to the NSIMD directory and generate both NSIMD and nsconfig.

$ cd nsimd
$ python3 egg/hatch.py -ltf
$ bash scripts/setup.sh
$ mkdir build
$ cd build

Help can be displayed using --help.

$ ../nstools/bin/nsconfig --help
usage: nsconfig [OPTIONS]... DIRECTORY
Configure project for compilation.

  -v              verbose mode, useful for debugging
