zimt - C++11 template library to process n-dimensional arrays with multi-threaded SIMD code

Zimt is the German word for cinnamon, and it sounds vaguely like SIMD. It should imply something spicy and yummy, and apart from the silly pun there is no deeper meaning.

The code is still evolving, but approaching a first release. Because it's in large parts from an extant library which has seen years of development, the original code was already at a reasonably good level, but I made quite a few design changes.

The code in this library is based on code from my library vspline. I found that the tools I developed there to process n-dimensional arrays with multi-threaded SIMD code would be useful outside of a b-spline library's context, and that I might be better off letting go of the dependency on the vigra library, which I use in vspline for data handling and small aggregates.

Initially I intended to modify vspline to use zimt, but now I've decided to go the other way. As zimt evolved, I ported more and more of the vspline code base to zimt, and with the last set of ported capabilities - digital filters and b-spline processing - zimt can now do everything which vspline can, but it does so without relying on vigra, using zimt data types for arrays and small aggregates. Since the zimt data types are quite close to vigra's corresponding types, the transition was not too hard, but there are some differences where I decided to do things differently.

Installation

Installation is simple: soft-link the folder containing the zimt headers to a location which the compiler will find, so that includes à la "#include <zimt/zimt.h>" will succeed. On my Linux system, I use:

sudo ln -s path-to-source/zimt/zimt /usr/local/include

Where "path-to-source" is the parent folder of the clone of the zimt repo. With this 'installation' the examples will compile out-of-the-box. If you don't like linking stuff into your /usr/local, you'll have to tell the compiler where the zimt headers are, e.g. with a -I command line argument.

The top-level header is called "zimt.h", it includes all of zimt except the code for using tiled storage.

To help with the examples, the 'examples' folder has a shell script 'examples.sh' which will compile all C++ files you pass to it with clang++ and g++, and with all available SIMD backends. Of course using all available backends will require that you have them installed as well - if you are missing some, make a copy of the shell script, comment out the parts in question, and use the copy instead of examples.sh. If you don't have any of the SIMD backends, zimt will provide it's own, which does a fairly good job but is usually not as fast as the other backends. But you can go ahead and start coding with zimt, and if you need more speed you can install one of the back-ends later.

Note that some of the examples have a .cc extension and some a .cpp extension. The files with the .cpp extension won't work with examples.sh, they require specific compilation (as described inside the code) because they demonstrate specific features needing 'special treatment'. The .cc files should all work with examples.sh, and a good test to see if everything is set up properly is to compile all examples and 'give them a spin':

cd examples
./examples.sh *.cc
for f in *++; do echo $f; ./$f >> /dev/null; done

zimt in action

I have created a branch 'zimt' in my project lux which uses zimt to implement some functionality of the vspline code base which is packaged with lux. This 'exercises' the zimt data types and transform family of functions in a real application, and it's helped me to do some debugging and tweaking, so that the 'zimt flavour' for lux now runs roughly as fast as the original. If you're interested to see zimt 'in action', check out lux' zimt branch and build as usual. 'to see' is meant quite literally: with an image processing software, if anything goes wrong, you're likely to see it.

zimt backends

On top of zimt's own 'goading' back-end, zimt currently supports three more SIMD backends:

highway
Vc
std::simd

My recommendation is to use highway - it's the most recent development, supports a wide variety of hardware, and the integration into zimt is well-used because it's now the default back-end for lux. highway is available from github, and it's also quite available via package managers, so it's a good idea to see if your package manager offers it. To use this backend with zimt, #define USE_HWY

Note that to get specific code for a given ISA with the highway back-end, you may need additional compiler flags beyond '-march=native' - please refer to highway's documentation, or have a look at lux' CMakeLists.txt, where these flags are used for the ISAs lux supports.

zimt can now exploit highway's code to detect the best supported SIMD ISA for the CPU on which the code is executed and dispatch to it. This requires a little extra coding effort, but that's well worth it. If you want your program to use this mechanism, you need to #define MULTI_SIMD_ISA and follow the pattern in the examples. This will work for the highway and goading back-ends. If you use this method, you don't have to pass SIMD-ISA-specific flags to the compiler - highway will do it for you. I've written a text on dispatching which you may find helpful in this context.

Vc is good for intel/AMD processors up to AVX2, and it has dedicated AVX support, which highway does not provide - there, the next-best thing is SSE4.2, which is quite similar but a bit slower. Apart from that, using Vc with zimt offers few advantages over using the highway backend. You may find Vc packages via your package manager - if you install from source, pick the 1.4 branch explicitly. To use this backend with zimt, #define USE_VC, and link with -lVc. Vc's 'spirit' is to provide a generic interface to SIMD programming, whereas highway tends to provide 'what is there', failing to compile code which requests features which can't be provided by the hardware, rather than falling back to using small loops to provide an implementation, which is Vc style. Using Vc, your code may compile just fine, but if it 'falls back to scalar' you may not notice the fact. highway will instead fail to compile the code. I prefer the generic approach and I've made great efforts to 'bend' my highway interface to behave more like Vc, but there's only so much I can do.

std::simd is part of the C++ standard and requires C++17. Some C++ libraries now provide an implementation of std::simd, but the implementation coming with g++ is - IMHO and of this writing - not very comprehensive, e.g. missing explict SIMD implementations of some math routines (like atan2). It is often a step up from no backend, though, and if you have a recent libstdc++ installed, it's definitely worth a try. To use this backend with zimt, #define USE_STDSIMD, and use -std=c++17.

For all compiles, it's crucial to use optimization (use -O3) and to specify use of the native CPU (unless you need portability). The latter is affected by '-march=native' on intel/AMD CPUs; for other platforms you may need to specify the actual CPU - e.g. -mcpu=apple-m1 for builds on Apple's M1. I prefer using clang++ over g++, but I'll refrain from a clear recommendation.

zimt components

In zimt, I provide a type for small aggregates, named 'xel_t' - xel is for things like pixels etc. - and a type for multidimensional arrays with arbitrary striding, named array_t/view_t . These stand in for the use of vigra::TinyVector and vigra::MultiArray(View) in vspline - vspline does not use much else from vigra, and the vigra library is very large, so I decided to replace these types with my own stripped-down versions to reduce a rather cumbersome dependency and make the code simple, focusing on what's needed in zimt. I also rely heavily on optimization, so I feel that I'm better off with simple constructs, because the chance that the optimizer will recognize optimizable patterns in the code should be higher this way. It's important to note that zimt is made to deal with 'xel' data and fundamentals only - you can create arrays of other data, but you can't use zimt's raison d'être, the transform family of functions, on them, because zimt has no notion of their 'simdized' equivalent. To use arrays of, say, structured types in zimt, you have to reformulate the data as several arrays of fundamentals (typically by creating views with appropriate striding, you don't have to copy the data).

The next important component is a type system for SIMD data. zimt provides a common interface to four different SIMD backends, which allows for easy prototyping, switching of backend library/code base, cherry-picking, etc. The interface is modelled after Vc::SimdArray, which was the first SIMD data type I used extensively. It's part of the Vc library and it's well-thought-out and pretty much just what I wanted, but Vc is now 'in maintenance mode' and newer ISAs and processors are no longer supported. Much of Vc's functionality made it into the C++ [std::simd](https://en.cppreference.com/w/cpp/experimental/simd/ "cppreference.com page: Data-p

Zimt

Install / Use

README

zimt - C++11 template library to process n-dimensional arrays with multi-threaded SIMD code

Installation

zimt in action

zimt backends

zimt components