Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Documentation can be found here. We put a lot of effort into testing.
What is NSIMD?
At its core, NSIMD is a vectorization library that abstracts SIMD programming. It was designed to exploit the maximum power of processors at a low development cost. NSIMD comes with modules. As of now, two of them add GPU support to NSIMD. The direction NSIMD is taking is to provide several programming paradigms to address different problems and to support a wider range of architectures. With two of its modules, NSIMD provides three programming paradigms:
- Imperative programming provided by NSIMD core that supports a lot of CPU/SIMD extensions.
- Expression templates provided by the TET1D module that supports all architectures from NSIMD core and adds support for NVIDIA and AMD GPUs.
- Single Program Multiple Data provided by the SPMD module that supports all architectures from NSIMD core and adds support for NVIDIA and AMD GPUs.
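To make the expression-template paradigm more concrete, here is a minimal, self-contained sketch in plain C++11. It is not the TET1D API: every name in it (`Expr`, `Input`, `Add`, `Mul`, `evaluate`, `fused_demo`) is hypothetical and chosen for illustration only. The idea it shows is that `(a + b) * c` builds a small tree of types at compile time, and the whole tree is evaluated in a single loop with no temporary arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of expression templates. None of these names come
// from NSIMD or its TET1D module.

// CRTP base: tags a type as an expression node and recovers the real type.
template <typename E> struct Expr {
  const E &self() const { return static_cast<const E &>(*this); }
};

// Leaf node: reads one element from an existing buffer.
struct Input : Expr<Input> {
  const float *data;
  explicit Input(const float *p) : data(p) {}
  float operator[](std::size_t i) const { return data[i]; }
};

// Interior node: element-wise sum of two sub-expressions.
template <typename L, typename R> struct Add : Expr<Add<L, R> > {
  L lhs;
  R rhs;
  Add(const L &l, const R &r) : lhs(l), rhs(r) {}
  float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Interior node: element-wise product of two sub-expressions.
template <typename L, typename R> struct Mul : Expr<Mul<L, R> > {
  L lhs;
  R rhs;
  Mul(const L &l, const R &r) : lhs(l), rhs(r) {}
  float operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
};

// Operators only build nodes; no arithmetic happens here.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L> &l, const Expr<R> &r) {
  return Add<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Mul<L, R> operator*(const Expr<L> &l, const Expr<R> &r) {
  return Mul<L, R>(l.self(), r.self());
}

// Single loop that walks the whole expression tree for each element.
template <typename E>
void evaluate(const Expr<E> &expr, float *out, std::size_t n) {
  assert(out != nullptr || n == 0);
  const E &e = expr.self();
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = e[i];
  }
}

// Computes (a + b) * c element-wise on three small fixed arrays.
std::vector<float> fused_demo() {
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
  const float c[4] = {2.0f, 2.0f, 2.0f, 2.0f};
  std::vector<float> out(4);
  evaluate((Input(a) + Input(b)) * Input(c), out.data(), out.size());
  return out;
}
```

Because the loop lives in one place (`evaluate`), a library built this way can swap that loop for a SIMD loop or a GPU kernel without changing user code, which is presumably why this style fits a module that targets both CPUs and GPUs.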
Supported architectures
| Architecture                          | NSIMD core | TET1D module | SPMD module |
|:--------------------------------------|:----------:|:------------:|:-----------:|
| CPU (scalar functions)                |     Y      |      Y       |      Y      |
| CPU (128-bits SIMD emulation)         |     Y      |      Y       |      Y      |
| Intel SSE 2                           |     Y      |      Y       |      Y      |
| Intel SSE 4.2                         |     Y      |      Y       |      Y      |
| Intel AVX                             |     Y      |      Y       |      Y      |
| Intel AVX2                            |     Y      |      Y       |      Y      |
| Intel AVX-512 for KNLs                |     Y      |      Y       |      Y      |
| Intel AVX-512 for Skylake processors  |     Y      |      Y       |      Y      |
| Arm NEON 128 bits (ARMv7 and earlier) |     Y      |      Y       |      Y      |
| Arm NEON 128 bits (ARMv8 and later)   |     Y      |      Y       |      Y      |
| Arm SVE (original sizeless SVE)       |     Y      |      Y       |      Y      |
| Arm fixed sized SVE                   |     Y      |      Y       |      Y      |
| IBM POWERPC VMX                       |     Y      |      Y       |      Y      |
| IBM POWERPC VSX                       |     Y      |      Y       |      Y      |
| NVIDIA CUDA                           |     N      |      Y       |      Y      |
| AMD ROCm                              |     N      |      Y       |      Y      |
| Intel oneAPI                          |     N      |      Y       |      Y      |
Contributions
| Contributor          | Contribution(s)                                   |
|:---------------------|:--------------------------------------------------|
| Guillaume Quintin    | Maintainer + main contributor                     |
| Alan Kelly           | Arm NEON + mathematical functions                 |
| Kenny Péou           | Fixed point module                                |
| Xavier Berault       | PowerPC VMX and VSX                               |
| Vianney Stricher     | NSIMD core + oneAPI in SPMD and TET1D modules     |
| Quentin Khan         | Soa/AoS loads and stores                          |
| Paul Gannay          | PowerPC VMX, VSX + testing system                 |
| Charly Chevalier     | Benchmarking system + Python internals            |
| Erik Schnetter       | Fixes + code generation                           |
| Lénaïc Bagnères      | Fixes + TET1D module                              |
| Jean-Didier Pailleux | Shuffles operators                                |
How does it work?
To achieve maximum performance, NSIMD mainly relies on the inline optimization pass of the compiler. Therefore using any mainstream compiler such as GCC, Clang, MSVC, XL C/C++, ICC and others with NSIMD will give you a zero-cost SIMD abstraction library.
To allow inlining, a lot of code is placed in header files. Small functions
such as addition, multiplication, square root, etc, are all present in header
files whereas big functions such as I/O are put in source files that are
compiled as a .so/.dll library.
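The header-placement strategy can be illustrated with a small sketch. The code below is an assumption-laden stand-in, not NSIMD source: `pack4f`, `load4`, `store4`, `add4`, `sqrt4` and `sqrt_sum` are hypothetical names, and the "register" is emulated with a plain array of four floats. What it shows is the shape of the design: each small operation is an `inline` function whose body is visible at the call site, so an optimizing compiler can collapse the whole chain into straight-line code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a 128-bit SIMD register holding four floats.
// NSIMD's real types and functions differ; this only mimics the idea.
struct pack4f {
  float lane[4];
};

// Small operations are defined inline, as they would be in a header, so the
// compiler sees their bodies wherever they are called.
inline pack4f load4(const float *p) {
  pack4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = p[i];
  return r;
}

inline void store4(float *p, const pack4f &v) {
  for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}

inline pack4f add4(const pack4f &a, const pack4f &b) {
  pack4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

inline pack4f sqrt4(const pack4f &a) {
  pack4f r;
  for (int i = 0; i < 4; ++i) r.lane[i] = std::sqrt(a.lane[i]);
  return r;
}

// User kernel: out[i] = sqrt(a[i] + b[i]). For brevity this sketch requires
// n to be a multiple of 4 and has no scalar tail loop.
inline void sqrt_sum(const float *a, const float *b, float *out,
                     std::size_t n) {
  assert(n % 4 == 0);
  for (std::size_t i = 0; i < n; i += 4) {
    store4(out + i, sqrt4(add4(load4(a + i), load4(b + i))));
  }
}
```

With a real SIMD backend the bodies of `add4` and `sqrt4` would be single intrinsics, and after inlining the kernel loop would contain only those instructions, which is what the zero-cost claim above refers to.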
NSIMD provides C89, C11, C++98, C++11, C++14 and C++20 APIs. All APIs allow
writing generic code. For the C API this is achieved through a thin layer of
macros and with the _Generic keyword for the C advanced API; for the C++ APIs
it is achieved using templates and function overloading. The C++ APIs are split
into two parts. The first part is a C-like API with only function calls and
direct type definitions for SIMD types, while the second one provides operator
overloading and higher level type definitions that allow unrolling. The C++11
and C++14 APIs add, for instance, templated type definitions and templated
constants, while the C++20 API uses concepts for better error reporting.
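The two C++ layers described above can be sketched as follows. This is an illustration under assumptions, not the NSIMD API: the `pack<T>` type here is a local 16-byte array emulation, and `set1`, `loadu`, `storeu`, `add`, `mul` and `axpy` are defined inside the sketch itself (the names follow common SIMD vocabulary, but their signatures are hypothetical). The first group is the function-call layer, the operators form the second layer, and `axpy` is generic code that works for any element type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical emulated 128-bit pack: 16 bytes worth of elements of type T.
template <typename T> struct pack {
  static constexpr std::size_t size() { return 16 / sizeof(T); }
  T lane[16 / sizeof(T)];
};

// Layer 1: C-like generic functions, resolved by templates and overloading.
template <typename T> pack<T> set1(T v) {
  pack<T> r;
  for (std::size_t i = 0; i < pack<T>::size(); ++i) r.lane[i] = v;
  return r;
}

template <typename T> pack<T> loadu(const T *p) {
  pack<T> r;
  for (std::size_t i = 0; i < pack<T>::size(); ++i) r.lane[i] = p[i];
  return r;
}

template <typename T> void storeu(T *p, const pack<T> &v) {
  for (std::size_t i = 0; i < pack<T>::size(); ++i) p[i] = v.lane[i];
}

template <typename T> pack<T> add(const pack<T> &a, const pack<T> &b) {
  pack<T> r;
  for (std::size_t i = 0; i < pack<T>::size(); ++i)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

template <typename T> pack<T> mul(const pack<T> &a, const pack<T> &b) {
  pack<T> r;
  for (std::size_t i = 0; i < pack<T>::size(); ++i)
    r.lane[i] = a.lane[i] * b.lane[i];
  return r;
}

// Layer 2: operator overloading that forwards to layer 1.
template <typename T>
pack<T> operator+(const pack<T> &a, const pack<T> &b) { return add(a, b); }

template <typename T>
pack<T> operator*(const pack<T> &a, const pack<T> &b) { return mul(a, b); }

// Generic kernel: y[i] = a * x[i] + y[i] for any arithmetic element type.
template <typename T> void axpy(T a, const T *x, T *y, std::size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  const std::size_t step = pack<T>::size();
  const pack<T> va = set1(a);
  std::size_t i = 0;
  for (; i + step <= n; i += step) {
    storeu(y + i, va * loadu(x + i) + loadu(y + i)); // full packs
  }
  for (; i < n; ++i) {
    y[i] = a * x[i] + y[i]; // scalar tail for leftover elements
  }
}
```

The same `axpy` source compiles for `float`, `int` or any other arithmetic type; the element type alone selects the pack width and the overloads.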
Binary compatibility is guaranteed by the fact that only a C ABI is exposed. The C++ API only wraps the C calls.
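A minimal sketch of that pattern, with hypothetical names (`demo_sum_f32`, `demo_sum_f64` and `demo::sum` are not NSIMD symbols): the functions inside `extern "C"` have unmangled names and a plain C calling convention, so they are the only thing a shared library needs to export, while the C++ overloads are inline forwarding wrappers that never appear in the binary interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C ABI: one unmangled symbol per element type. In a real
// library these would be declared in a header and defined in the .so/.dll.
extern "C" {

float demo_sum_f32(const float *p, std::size_t n) {
  assert(p != nullptr || n == 0);
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += p[i];
  return acc;
}

double demo_sum_f64(const double *p, std::size_t n) {
  assert(p != nullptr || n == 0);
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i) acc += p[i];
  return acc;
}

} // extern "C"

// C++ layer: inline overloads that only forward to the C symbols, so no
// C++ name mangling or C++ types cross the library boundary.
namespace demo {

inline float sum(const float *p, std::size_t n) {
  return demo_sum_f32(p, n);
}

inline double sum(const double *p, std::size_t n) {
  return demo_sum_f64(p, n);
}

} // namespace demo
```

Callers write `demo::sum(ptr, n)` for either type; the compiler picks the overload, and the linker only ever sees the two C symbols.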
Supported compilers
NSIMD is tested with GCC, Clang, MSVC, NVCC, HIPCC and ARMClang. As a C89 and a C++98 API are provided, other compilers should work fine. Old compiler versions should work as long as they support the targeted SIMD extension. For instance, NSIMD can compile SSE 4.2 code with MSVC 2010.
Build the library
CMake
As CMake is widely used as a build system, we have added support for building the library only and the corresponding find module.
mkdir build
cd build
cmake .. -Dsimd=SIMD_EXT
make
make install
where SIMD_EXT is one of the following: CPU, SSE2, SSE42, AVX, AVX2,
AVX512_KNL, AVX512_SKYLAKE, NEON128, AARCH64, SVE, SVE128, SVE256, SVE512,
SVE1024, SVE2048, VMX, VSX, CUDA, ROCM.
Note that when compiling for NEON128 on Linux one has to choose the ABI, either armel or armhf. Default is armel. As CMake is unable to autodetect this parameter one has to tell CMake manually.
cmake .. -Dsimd=neon128 # for armel
cmake .. -Dsimd=neon128 -DNSIMD_ARM32_IS_ARMEL=OFF # for armhf
We provide in the scripts directory a CMake find module to find NSIMD on
your system. One can let the module find NSIMD on its own. If several
versions of NSIMD for different SIMD extensions are installed, the
module will find and return one of them. There is no guarantee on which
version will be chosen by the module.
find_package(NSIMD)
If one wants a specific version of the library for a given SIMD extension then
use the COMPONENTS part of find_package. Only one component is supported
at a time.
find_package(NSIMD COMPONENTS avx2) # find only NSIMD for Intel AVX2
find_package(NSIMD COMPONENTS sve) # find only NSIMD for Arm SVE
find_package(NSIMD COMPONENTS sse2 sse42) # unsupported
Nsconfig
The support for CMake has been limited to building the library only. If you wish to run tests or contribute you need to use nsconfig as CMake has several flaws:
- too slow especially on Windows,
- inability to use several compilers at once,
- inability to have a portable build system,
- very poor support for portable compilation flags,
- ...
Dependencies (nsconfig only)
Generating C/C++ files is done by the Python3 code contained in the egg
directory. Python should be installed by default on any Linux distro. On
Windows it comes with the latest versions of Visual Studio
(https://visualstudio.microsoft.com/vs/community/); you can also download and
install it directly from https://www.python.org/.
The Python code can call clang-format to properly format all generated C/C++
source. On Linux you can install it via your package manager. On Windows you
can use the official binary at https://llvm.org/builds/.
Compiling the library requires a C++98 compiler. Any version of GCC, Clang or MSVC will do. Note that the produced library and header files for the end-user are C89, C++98 and C++11 compatible. Note also that the C/C++ files are generated by a set of Python scripts, which must be executed before building the library.
Build for Linux
bash scripts/build.sh for simd_ext1/.../simd_extN with comp1/.../compN
For each combination a directory build-simd_ext-comp will be created and
will contain the library. Supported SIMD extensions are:
- sse2
- sse42
- avx
- avx2
- avx512_knl
- avx512_skylake
- neon128
- aarch64
- sve
- sve128
- sve256
- sve512
- sve1024
- sve2048
- vmx
- vsx
- cuda
- rocm
Supported compilers are:
- gcc
- clang
- icc
- armclang
- xlc
- dpcpp
- fcc
- cl
- nvcc
- hipcc
Note that certain combinations of SIMD extension and compiler are not supported, such as aarch64 with icc, or avx512_skylake with nvcc.
Build on Windows
Make sure you are typing in a Visual Studio prompt. The command is almost the same as for Linux with the same constraints on the pairs SIMD extension/compilers.
scripts\build.bat for simd_ext1/.../simd_extN with comp1/.../compN
More details on building the library
The library uses a tool called nsconfig
(https://github.com/agenium-scale/nstools) which is basically a Makefile
translator. If you have just built NSIMD following the steps described above,
you should have an nstools directory which contains bin/nsconfig. If not,
you can generate it on Linux with
bash scripts/setup.sh
and on Windows
scripts\setup.bat
Then you can use nsconfig directly; its command-line syntax is similar to
CMake's. Here is a quick tutorial using the Linux command line. We first
go to the NSIMD directory and generate both NSIMD and nsconfig.
$ cd nsimd
$ python3 egg/hatch.py -ltf
$ bash scripts/setup.sh
$ mkdir build
$ cd build
Help can be displayed using --help.
$ ../nstools/bin/nsconfig --help
usage: nsconfig [OPTIONS]... DIRECTORY
Configure project for compilation.
-v verbose mode, useful for debugging
