# Fastor

A lightweight high performance tensor algebra framework for modern C++
Fastor is a high performance tensor (fixed multi-dimensional array) library for modern C++.
Fastor offers:
- High-level interface for manipulating multi-dimensional arrays in C++ that look and feel native to scientific programmers
- Bare metal performance for small matrix/tensor multiplications, contractions and tensor factorisations [LU, QR etc]. Refer to benchmarks to see how Fastor delivers performance on par with MKL JIT's dedicated API
- Compile time operation minimisation such as graph optimisation, greedy matrix-chain products and nearly symbolic manipulations to reduce the complexity of evaluation of BLAS or non-BLAS type expressions by orders of magnitude
- Explicit and configurable SIMD vectorisation supporting all numeric data types (`float32`, `float64`, `complex float32` and `complex float64`) as well as integral types
- Optional SIMD backends such as sleef, Vc or even `std::experimental::simd`
- Optional JIT backend using Intel's MKL-JIT and LIBXSMM for performance portable code
- Ability to wrap existing data and operate on them using Fastor's highly optimised kernels
- Suitable linear algebra library for FPGAs, micro-controllers and embedded systems due to absolutely no dynamic allocations and no RTTI
- Light weight header-only library with no external dependencies offering fast compilation times
- Well-tested on most compilers including GCC, Clang, Intel's ICC and MSVC
## Documentation

Documentation can be found under the Wiki pages.
## High-level interface

Fastor provides a high-level interface for tensor algebra. To get a glimpse, consider the following:
```cpp
Tensor<double> scalar = 3.5;                 // a scalar
Tensor<float,3> vector3 = {1,2,3};           // a vector
Tensor<int,3,2> matrix{{1,2},{3,4},{5,6}};   // a second order tensor
Tensor<double,3,3,3> tensor_3;               // a third order tensor of dimension 3x3x3
tensor_3.arange(0);                          // fill tensor with sequentially ascending numbers
tensor_3(0,2,1);                             // index the tensor
tensor_3(all,last,seq(0,2));                 // slice the tensor: tensor_3[:,-1,:2]
tensor_3.rank();                             // get the rank of the tensor, 3 in this case
Tensor<float,2,2,2,2,2,2,4,3,2,3,3,6> t_12;  // a 12th order tensor
```
<!-- a sample output of the above code would be
~~~bash
[0,:,:]
⎡ 0, 1, 2 ⎤
⎢ 3, 4, 5 ⎥
⎣ 6, 7, 8 ⎦
[1,:,:]
⎡ 9, 10, 11 ⎤
⎢ 12, 13, 14 ⎥
⎣ 15, 16, 17 ⎦
[2,:,:]
⎡ 18, 19, 20 ⎤
⎢ 21, 22, 23 ⎥
⎣ 24, 25, 26 ⎦
~~~ -->
## Tensor contraction

Einstein summation, as well as summing over multiple (i.e. more than two) indices, is supported. As a complete example, consider:
```cpp
#include <Fastor/Fastor.h>
using namespace Fastor;

enum {I,J,K,L,M,N};

int main() {
    // An example of Einstein summation
    Tensor<double,2,3,5> A; Tensor<double,3,5,2,4> B;
    // fill A and B with random values
    A.random(); B.random();
    auto C = einsum<Index<I,J,K>,Index<J,L,M,N>>(A,B);

    // An example of summing over three indices
    Tensor<double,5,5,5> D; D.random();
    auto E = inner(D);

    // An example of tensor permutation
    Tensor<float,3,4,5,2> F; F.random();
    auto G = permute<Index<J,K,I,L>>(F);

    // Output the results
    print("Our big tensors:",C,E,G);

    return 0;
}
```
You can compile this by passing the following flags to your compiler: `-std=c++14 -O3 -march=native -DNDEBUG`.
## Tensor views: a powerful indexing, slicing and broadcasting mechanism

Fastor provides powerful tensor views for block indexing, slicing and broadcasting, familiar to scientific programmers. Consider the following examples:
```cpp
Tensor<double,4,3,10> A, B;
A.random(); B.random();
Tensor<double,2,2,5> C; Tensor<double,4,3,1> D;

// Dynamic views -> seq(first,last,step)
C = A(seq(0,2),seq(0,2),seq(0,last,2));     // C = A[0:2,0:2,0::2]
D = B(all,all,0) + A(all,all,last);         // D = B[:,:,0] + A[:,:,-1]
A(2,all,3) = 5.0;                           // A[2,:,3] = 5.0

// Static views -> fseq<first,last,step>
C = A(fseq<0,2>(),fseq<0,2>(),fseq<0,last,2>());   // C = A[0:2,0:2,0::2]
D = B(all,all,fix<0>) + A(all,all,fix<last>());    // D = B[:,:,0] + A[:,:,-1]
A(2,all,3) = 5.0;                                  // A[2,:,3] = 5.0

// Overlapping views are also allowed without undefined/aliasing behaviour
A(seq(2,last),all,all).noalias() += A(seq(0,last-2),all,all);   // A[2:,:,:] += A[:-2,:,:]
// Note that in the case of perfect overlap, noalias is not required
A(seq(0,last-2),all,all) += A(seq(0,last-2),all,all);           // A[:-2,:,:] += A[:-2,:,:]

// If instead of a tensor view an actual tensor is needed, iseq can be used
// iseq<first,last,step>
C = A(iseq<0,2>(),iseq<0,2>(),iseq<0,last,2>());   // C = A[0:2,0:2,0::2]
// Note that iseq returns an immediate tensor rather than a tensor view and hence
// cannot appear on the left hand side, for instance
// A(iseq<0,2>(),iseq<0,2>(),iseq<0,last,2>()) = 2; // will not compile, as the left operand is an rvalue

// One can also index a tensor with other tensors
Tensor<float,10,10> E; E.fill(2);
Tensor<int,5> it = {0,1,3,6,8};
Tensor<size_t,10,10> t_it; t_it.arange();
E(it,0) = 2;
E(it,seq(0,last,3)) /= -1000.;
E(all,it) += E(all,it) * 15.;
E(t_it) -= 42 + E;

// Masked and filtered views are also supported
Tensor<double,2,2> F;
Tensor<bool,2,2> mask = {{true,false},{false,true}};
F(mask) += 10;
```
Any combination of slicing and broadcasting is possible. For instance, a more involved slicing and broadcasting example is given below:

```cpp
A(all,all) -= log(B(all,all,0)) + abs(B(all,all,1)) + sin(C(all,0,all,0)) - 102. - cos(B(all,all,0));
```
<!-- It should be mentioned that since tensor views work on a view of (a reference to) a tensor and do not copy any data in the background, the use of the keyword `auto` can be dangerous at times
~~~c++
auto B = A(all,all,seq(0,5),seq(0,3)); // the scope of the view expression ends with ; as the view is a reference to an rvalue
auto C = B + 2; // Hence this will segfault, as B refers to a non-existing piece of memory
~~~
To solve this issue, use immediate construction from a view
~~~c++
Tensor<double,2,2,5,3> B = A(all,all,seq(0,5),seq(0,3)); // B is now permanent
auto C = B + 2; // This will behave as expected
~~~ -->
<!-- From a performance point of view, Fastor tries very hard to vectorise (read: SIMD-vectorise) tensor views, but this heavily depends on the compiler's ability to inline multiple recursive functions [as is the case for all expression templates]. If a view appears on the right hand side of an assignment, but not on the left, Fastor automatically vectorises the expression. However, if a view appears on the left hand side of an assignment, Fastor does not by default vectorise the expression. -->