MIPP
Portable wrapper for SIMD and vector instructions written in C++11. Compatible with NEON, SSE, AVX, AVX-512 and SVE (length specific).
MyIntrinsics++ (MIPP)

Purpose
MIPP is a portable, open-source wrapper (MIT license) for vector intrinsic functions (SIMD) written in C++11. It works with SSE, AVX, AVX-512, ARM NEON and SVE (work in progress) instructions. The MIPP wrapper supports single- and double-precision floating-point numbers as well as signed and unsigned integer arithmetic (64-bit, 32-bit, 16-bit and 8-bit).
With the MIPP wrapper you no longer need to write architecture-specific intrinsic code. Just use the provided functions and the wrapper will automatically generate the right intrinsic calls for your architecture.
If you are interested in the ARM SVE development status, please follow this link.
Short Documentation
Supported Compilers
At this time, MIPP has been tested on the following compilers:
- Intel: icpc >= 16,
- GNU: g++ >= 4.8,
- Clang: clang++ >= 3.6,
- Microsoft: msvc >= 14.
On msvc 14.10 (Microsoft Visual Studio 2017), performance is lower than with
the other compilers because the compiler is not able to fully inline all the
MIPP methods. This has been fixed in msvc 14.21 (Microsoft Visual Studio
2019), where you can expect high performance.
Install and Configure your Code
You don't have to install MIPP because it is a simple C++ header file. The
headers are located in the include folder (note that this location has changed
since commit 6795891; before that, they were located in the src folder).
Just include the header in your source files wherever the wrapper is needed.
#include "mipp.h"
mipp.h uses a C++ namespace, mipp. If you do not want to prefix all the MIPP
calls with mipp::, you can do this:
#include "mipp.h"
using namespace mipp;
Before compiling, remember to tell the compiler which vector instructions you
want to use. For instance, with the GNU compiler (g++) you simply have to add
the -march=native option on SSE- and AVX-compatible CPUs. For ARMv7 CPUs with
NEON instructions you have to add the -mfpu=neon option (since most of the
current NEONv1 instructions are not IEEE-754 compliant). This is no longer the
case on ARMv8 processors, so the -march=native option works there too. MIPP
also uses some features provided by C++11, so you have to add the -std=c++11
flag to compile the code.
You are now ready to run your code with the MIPP wrapper.
If MIPP is installed on the system, it can be integrated into a CMake project in the standard way. Example:
# install MIPP
cd MIPP/
export MIPP_ROOT=$PWD/build/install
cmake -B build -DCMAKE_INSTALL_PREFIX=$MIPP_ROOT
cmake --build build -j5
cmake --install build
In your CMakeLists.txt:
# find the installation of MIPP on the system
find_package(MIPP REQUIRED)
# define your executable
add_executable(gemm gemm.cpp)
# link your executable to MIPP
target_link_libraries(gemm PRIVATE MIPP::mipp)
Then configure your project:
cd your_project/
# if MIPP is installed in a system standard path: MIPP will be found automatically with cmake
cmake -B build
# if MIPP is installed in a non-standard path: use CMAKE_PREFIX_PATH
cmake -B build -DCMAKE_PREFIX_PATH=$MIPP_ROOT
Generate Sources & Compile the Static Library
MIPP is mainly a header-only library. However, some macro operations require
compiling a small library. This is particularly true for the compress
operation, which relies on generated LUTs stored in the static library.
To generate the source files containing these LUTs you need to install Python 3 with the Jinja2 package:
sudo apt install python3 python3-pip
pip3 install --user -r codegen/requirements.txt
Then you can call the generator as follows:
python3 codegen/gen_compress.py
Finally, you can compile the MIPP static library:
cmake -B build -DMIPP_STATIC_LIB=ON
cmake --build build -j4
Note that compiling the static library is optional. If you choose not to compile it, only some macro operations will be missing.
Sequential Mode
By default, MIPP tries to recognize the instruction set from the preprocessor definitions. If MIPP can't match the instruction set (for instance when MIPP does not support the targeted instruction set), it falls back on standard sequential instructions. In this mode, vectorization is no longer guaranteed, but the compiler can still perform auto-vectorization.
It is possible to force MIPP to use the sequential mode with the following
compiler definition: -DMIPP_NO_INTRINSICS. This can be useful for
debugging or for benchmarking code.
If you want to check the MIPP mode configuration, you can print the following
global variable: mipp::InstructionFullType (std::string).
Vector Register Declaration
Just use the mipp::Reg<T> type.
mipp::Reg<T> r1, r2, r3; // we have declared 3 vector registers
But we do not know the number of elements per register here. This number of
elements can be obtained by calling the mipp::N<T>() function (T is a
template parameter, it can be double, float, int64_t, uint64_t,
int32_t, uint32_t, int16_t, uint16_t, int8_t or uint8_t type).
for (int i = 0; i < n; i += mipp::N<float>()) {
// ...
}
The number of elements per register directly depends on the size of the data type we are working on: for a given register width, a narrower type yields more elements.
Register Load and Store Instructions
Loading memory from a vector into a register:
int n = mipp::N<float>() * 10;
std::vector<float> myVector(n);
int i = 0;
mipp::Reg<float> r1;
r1.load(&myVector[i*mipp::N<float>()]);
The last two lines can be shortened as follows, where the load call becomes
implicit:
mipp::Reg<float> r1 = &myVector[i*mipp::N<float>()];
Store can be done with the store(...) method:
int n = mipp::N<float>() * 10;
std::vector<float> myVector(n);
int i = 0;
mipp::Reg<float> r1 = &myVector[i*mipp::N<float>()];
// do something with r1
r1.store(&myVector[(i+1)*mipp::N<float>()]);
By default the loads and stores work on unaligned memory.
It is possible to control this behavior with the -DMIPP_ALIGNED_LOADS
definition: when specified, the loads and stores work on aligned memory by
default. In the aligned memory mode, it is still possible to perform
unaligned memory operations with the mipp::loadu and mipp::storeu functions.
However, it is not possible to perform aligned loads and stores in the
unaligned memory mode.
To allocate aligned data you can use the MIPP aligned memory allocator wrapped
into the mipp::vector class. mipp::vector is fully compatible with the
standard std::vector class and it can be used everywhere you can use
std::vector.
mipp::vector<float> myVector(n);
Register Initialization
You can initialize a vector register from a scalar value:
mipp::Reg<float> r1; // r1 = | unknown | unknown | unknown | unknown |
r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
Or from an initializer list (std::initializer_list):
mipp::Reg<float> r1; // r1 = | unknown | unknown | unknown | unknown |
r1 = {1.0, 2.0, 3.0, 4.0}; // r1 = | +1.0 | +2.0 | +3.0 | +4.0 |
Computational Instructions
Add two vector registers:
mipp::Reg<float> r1, r2, r3;
r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |
r3 = r1 + r2; // r3 = | +3.0 | +3.0 | +3.0 | +3.0 |
Subtract two vector registers:
mipp::Reg<float> r1, r2, r3;
r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |
r3 = r1 - r2; // r3 = | -1.0 | -1.0 | -1.0 | -1.0 |
Multiply two vector registers:
mipp::Reg<float> r1, r2, r3;
r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |
r3 = r1 * r2; // r3 = | +2.0 | +2.0 | +2.0 | +2.0 |
Divide two vector registers:
mipp::Reg<float> r1, r2, r3;
r1 = 1.0; // r1 = | +1.0 | +1.0 | +1.0 | +1.0 |
r2 = 2.0; // r2 = | +2.0 | +2.0 | +2.0 | +2.0 |
r3 = r1 / r2; // r3 = | +0.5 | +0.5 | +0.5 | +0.5 |
Fused multiply and add of three vector registers:
mipp::Reg<float> r1, r2, r3, r4;
r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
r3 = 1.0; // r3 = | +1.0 | +1.0 | +1.0 | +1.0 |
// r4 = (r1 * r2) + r3
r4 = mipp::fmadd(r1, r2, r3); // r4 = | +7.0 | +7.0 | +7.0 | +7.0 |
Fused negative multiply and add of three vector registers:
mipp::Reg<float> r1, r2, r3, r4;
r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
r3 = 1.0; // r3 = | +1.0 | +1.0 | +1.0 | +1.0 |
// r4 = -(r1 * r2) + r3
r4 = mipp::fnmadd(r1, r2, r3); // r4 = | -5.0 | -5.0 | -5.0 | -5.0 |
Square root of a vector register:
mipp::Reg<float> r1, r2;
r1 = 9.0; // r1 = | +9.0 | +9.0 | +9.0 | +9.0 |
r2 = mipp::sqrt(r1); // r2 = | +3.0 | +3.0 | +3.0 | +3.0 |
Reciprocal square root of a vector register (be careful: this intrinsic exists only for single-precision floating-point numbers):
mipp::Reg<float> r1, r2;
r1 = 9.0; // r1 = | +9.0 | +9.0 | +9.0 | +9.0 |
r2 = mipp::rsqrt(r1); // r2 = | +0.3 | +0.3 | +0.3 | +0.3 |
Selections
Select the minimum between two vector registers:
mipp::Reg<float> r1, r2, r3;
r1 = 2.0; // r1 = | +2.0 | +2.0 | +2.0 | +2.0 |
r2 = 3.0; // r2 = | +3.0 | +3.0 | +3.0
