# Fastor

A lightweight high performance tensor algebra framework for modern C++
Fastor is a high performance tensor (fixed multi-dimensional array) library for modern C++.
Fastor offers:
- High-level interface for manipulating multi-dimensional arrays in C++ that look and feel native to scientific programmers
- Bare metal performance for small matrix/tensor multiplications, contractions and tensor factorisations [LU, QR etc]. Refer to benchmarks to see how Fastor delivers performance on par with MKL JIT's dedicated API
- Compile time operation minimisation such as graph optimisation, greedy matrix-chain products and nearly symbolic manipulations to reduce the complexity of evaluation of BLAS or non-BLAS type expressions by orders of magnitude
- Explicit and configurable SIMD vectorisation supporting all numeric data types (`float32`, `float64`, `complex float32` and `complex float64`) as well as integral types
- Optional SIMD backends such as sleef, Vc or even `std::experimental::simd`
- Optional JIT backend using Intel's MKL-JIT and LIBXSMM for performance portable code
- Ability to wrap existing data and operate on them using Fastor's highly optimised kernels
- Suitable linear algebra library for FPGAs, micro-controllers and embedded systems due to absolutely no dynamic allocations and no RTTI
- Light weight header-only library with no external dependencies offering fast compilation times
- Well-tested on most compilers including GCC, Clang, Intel's ICC and MSVC
## Documentation

Documentation can be found under the Wiki pages.
## High-level interface

Fastor provides a high-level interface for tensor algebra. To get a glimpse, consider the following:
```cpp
Tensor<double> scalar = 3.5;                 // a scalar
Tensor<float,3> vector3 = {1,2,3};           // a vector
Tensor<int,3,2> matrix{{1,2},{3,4},{5,6}};   // a second order tensor
Tensor<double,3,3,3> tensor_3;               // a third order tensor of dimension 3x3x3
tensor_3.arange(0);                          // fill tensor with sequentially ascending numbers
tensor_3(0,2,1);                             // index the tensor
tensor_3(all,last,seq(0,2));                 // slice the tensor: tensor_3[:,-1,:2]
tensor_3.rank();                             // get the rank of the tensor, 3 in this case
Tensor<float,2,2,2,2,2,2,4,3,2,3,3,6> t_12;  // a 12th order tensor
```
<!-- a sample output of the above code would be
~~~bash
[0,:,:]
⎡ 0, 1, 2 ⎤
⎢ 3, 4, 5 ⎥
⎣ 6, 7, 8 ⎦
[1,:,:]
⎡ 9, 10, 11 ⎤
⎢ 12, 13, 14 ⎥
⎣ 15, 16, 17 ⎦
[2,:,:]
⎡ 18, 19, 20 ⎤
⎢ 21, 22, 23 ⎥
⎣ 24, 25, 26 ⎦
~~~ -->
## Tensor contraction

Einstein summation, as well as summing over multiple (i.e. more than two) indices, is supported. As a complete example, consider:
```cpp
#include <Fastor/Fastor.h>
using namespace Fastor;

enum {I,J,K,L,M,N};

int main() {
    // An example of Einstein summation
    Tensor<double,2,3,5> A; Tensor<double,3,5,2,4> B;
    // fill A and B with random values
    A.random(); B.random();
    auto C = einsum<Index<I,J,K>,Index<J,L,M,N>>(A,B);

    // An example of summing over three indices
    Tensor<double,5,5,5> D; D.random();
    auto E = inner(D);

    // An example of tensor permutation
    Tensor<float,3,4,5,2> F; F.random();
    auto G = permute<Index<J,K,I,L>>(F);

    // Output the results
    print("Our big tensors:",C,E,G);

    return 0;
}
```
You can compile this by passing the following flags to your compiler: `-std=c++14 -O3 -march=native -DNDEBUG`.
## Tensor views: a powerful indexing, slicing and broadcasting mechanism

Fastor provides powerful tensor views for block indexing, slicing and broadcasting, familiar to scientific programmers. Consider the following examples:
```cpp
Tensor<double,4,3,10> A, B;
A.random(); B.random();
Tensor<double,2,2,5> C; Tensor<double,4,3,1> D;

// Dynamic views -> seq(first,last,step)
C = A(seq(0,2),seq(0,2),seq(0,last,2));     // C = A[0:2,0:2,0::2]
D = B(all,all,0) + A(all,all,last);         // D = B[:,:,0] + A[:,:,-1]
A(2,all,3) = 5.0;                           // A[2,:,3] = 5.0

// Static views -> fseq<first,last,step>
C = A(fseq<0,2>(),fseq<0,2>(),fseq<0,last,2>());   // C = A[0:2,0:2,0::2]
D = B(all,all,fix<0>) + A(all,all,fix<last>());    // D = B[:,:,0] + A[:,:,-1]
A(2,all,3) = 5.0;                                  // A[2,:,3] = 5.0

// Overlapping views are also allowed without undefined/aliasing behaviour
A(seq(2,last),all,all).noalias() += A(seq(0,last-2),all,all);   // A[2:,:,:] += A[:-2,:,:]
// Note that in the case of perfect overlap, noalias is not required
A(seq(0,last-2),all,all) += A(seq(0,last-2),all,all);           // A[:-2,:,:] += A[:-2,:,:]

// If instead of a tensor view an actual tensor is needed, iseq can be used
// iseq<first,last,step>
C = A(iseq<0,2>(),iseq<0,2>(),iseq<0,last,2>());   // C = A[0:2,0:2,0::2]
// Note that iseq returns an immediate tensor rather than a tensor view and hence
// cannot appear on the left hand side, for instance
// A(iseq<0,2>(),iseq<0,2>(),iseq<0,last,2>()) = 2; // will not compile, as the left operand is an rvalue

// One can also index a tensor with other tensors
Tensor<float,10,10> E; E.fill(2);
Tensor<int,5> it = {0,1,3,6,8};
Tensor<size_t,10,10> t_it; t_it.arange();
E(it,0) = 2;
E(it,seq(0,last,3)) /= -1000.;
E(all,it) += E(all,it) * 15.;
E(t_it) -= 42 + E;

// Masked and filtered views are also supported
Tensor<double,2,2> F;
Tensor<bool,2,2> mask = {{true,false},{false,true}};
F(mask) += 10;
```
Any combination of slicing and broadcasting is possible. For instance, a more involved slicing and broadcasting example is given below:

```cpp
A(all,all) -= log(B(all,all,0)) + abs(B(all,all,1)) + sin(C(all,0,all,0)) - 102. - cos(B(all,all,0));
```
<!-- It should be mentioned that since tensor views work on a view of (a reference to) a tensor and do not copy any data in the background, the use of the keyword `auto` can be dangerous at times
~~~c++
auto B = A(all,all,seq(0,5),seq(0,3)); // the scope of the view expression ends with ; as the view is a reference to an rvalue
auto C = B + 2; // Hence this will segfault, as B refers to a non-existing piece of memory
~~~
To solve this issue, use immediate construction from a view
~~~c++
Tensor<double,2,2,5,3> B = A(all,all,seq(0,5),seq(0,3)); // B is now permanent
auto C = B + 2; // This will behave as expected
~~~ -->
<!-- From a performance point of view, Fastor tries very hard to vectorise (read: SIMD-vectorise) tensor views, but this heavily depends on the compiler's ability to inline multiple recursive functions [as is the case for all expression templates]. If a view appears on the right hand side of an assignment, but not on the left, Fastor automatically vectorises the expression. However, if a view appears on the left hand side of an assignment, Fastor does not by default vectorise the expression. -->