Vek
SIMD Accelerated vector functions for Go
vek | SIMD Vector Functions
vek is a collection of SIMD accelerated vector functions for Go.
Most modern CPUs have special SIMD instructions (Single Instruction, Multiple Data) to
process data in parallel, but there is currently no way to use them in a pure Go program.
vek implements a large number of common vector operations in SIMD accelerated assembly
code and wraps them in a simple Go API. vek supports most modern x86 CPUs and falls
back to a pure Go implementation on unsupported platforms.
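For intuition, an element-wise operation written in plain Go handles one element per loop iteration, whereas a 256-bit AVX2 register holds four float64 or eight float32 values at once. The sketch below is a plain scalar reference for element-wise multiplication. The name `mulScalar` is hypothetical and the code is not taken from vek's fallback; it only illustrates the kind of loop that SIMD code replaces.

```go
package main

import "fmt"

// mulScalar is a hypothetical pure Go reference for element-wise
// multiplication. It is NOT vek's actual fallback code.
func mulScalar(x, y []float64) []float64 {
	if len(x) != len(y) {
		panic("slices must have equal length")
	}
	out := make([]float64, len(x))
	for i := range x {
		out[i] = x[i] * y[i] // one element per iteration
	}
	return out
}

func main() {
	x := []float64{0, 1, 2, 3, 4}
	fmt.Println(mulScalar(x, x)) // [0 1 4 9 16]
}
```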
Features
- Fast, average speedups of 10x for float32 vectors
- Fallback to pure Go on unsupported platforms
- Support for float64, float32 and bool vectors
- Zero allocation variations of each function
Installation
go get -u github.com/viterin/vek
Getting Started
Simple Arithmetic Example
Vectors are represented as plain old floating point slices; there are no special data
types in vek. All operations on float64 vectors reside in the vek package, which contains
all the basic arithmetic operations:
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Multiply a vector by itself element-wise
	y := vek.Mul(x, x)
	fmt.Println(x, y) // [0 1 2 3 4] [0 1 4 9 16]

	// Multiply each element by a number
	y = vek.MulNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [0 2 4 6 8]
}
Working With 32-Bit Vectors
The vek32 package contains float32 versions of each operation:
package main

import (
	"fmt"

	"github.com/viterin/vek/vek32"
)

func main() {
	// Add a float32 number to each element
	x := []float32{0, 1, 2, 3, 4}
	y := vek32.AddNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [2 3 4 5 6]
}
Comparisons and Selections
Floating point vectors can be compared to other vectors or numbers. The result is a bool vector
indicating where the comparison holds true. bool vectors can be used to select matching elements,
count matches and more:
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// []bool indicating where x < y (less than)
	m := vek.Lt(x, y)
	fmt.Println(m)            // [true true true false false false]
	fmt.Println(vek.Count(m)) // 3

	// []bool indicating where x >= 2 (greater than or equal)
	m = vek.GteNumber(x, 2)
	fmt.Println(m)          // [false false true true true true]
	fmt.Println(vek.Any(m)) // true

	// Selection of non-zero elements less than y
	z := vek.Select(x,
		vek.And(
			vek.Lt(x, y),
			vek.NeqNumber(x, 0),
		),
	)
	fmt.Println(z) // [1 2]
}
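To make the mask semantics concrete, here is a minimal pure Go sketch of what comparing, combining and selecting amount to. The helpers `ltMask`, `neqNumberMask`, `andMask` and `selectWhere` are hypothetical names written for this illustration; they are not vek functions and say nothing about how vek implements these operations. The sketch assumes all slices have equal length.

```go
package main

import "fmt"

// ltMask returns a mask that is true wherever x[i] < y[i].
// Hypothetical helper, assumes len(x) == len(y).
func ltMask(x, y []float64) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] < y[i]
	}
	return m
}

// neqNumberMask returns a mask that is true wherever x[i] != a.
func neqNumberMask(x []float64, a float64) []bool {
	m := make([]bool, len(x))
	for i := range x {
		m[i] = x[i] != a
	}
	return m
}

// andMask combines two masks element-wise.
func andMask(a, b []bool) []bool {
	m := make([]bool, len(a))
	for i := range a {
		m[i] = a[i] && b[i]
	}
	return m
}

// selectWhere keeps the elements of x whose mask entry is true.
func selectWhere(x []float64, m []bool) []float64 {
	out := make([]float64, 0, len(x))
	for i, keep := range m {
		if keep {
			out = append(out, x[i])
		}
	}
	return out
}

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}
	mask := andMask(ltMask(x, y), neqNumberMask(x, 0))
	fmt.Println(selectWhere(x, mask)) // [1 2]
}
```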
Creating and Converting Vectors
vek has a number of functions to construct new vectors and convert between vector types efficiently:
package main

import (
	"fmt"

	"github.com/viterin/vek"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Vector with number repeated n times
	x := vek.Repeat(2, 5)
	fmt.Println(x) // [2 2 2 2 2]

	// Vector ranging from a to b (excl.) in steps of 1
	x = vek.Range(-2, 3)
	fmt.Println(x) // [-2 -1 0 1 2]

	// Conversion from float64 to int32
	xi32 := vek.ToInt32(x)
	fmt.Println(xi32) // [-2 -1 0 1 2]

	// Conversion from int32 to float32
	x32 := vek32.FromInt32(xi32)
	fmt.Println(x32) // [-2 -1 0 1 2]
}
Avoiding Allocations
By default, functions allocate a new slice to store the result. Append _Inplace
to a function name to perform the operation in place, overwriting the data of the
first argument slice with the result. Append _Into to write the result into a target
slice.
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Overwrite x with x + 2
	vek.AddNumber_Inplace(x, 2)

	// Write x + 2 into the preallocated slice y
	y := make([]float64, len(x))
	vek.AddNumber_Into(y, x, 2)
	fmt.Println(x, y) // [2 3 4 5 6] [4 5 6 7 8]
}
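The difference between the allocating and the "into" style can be measured with the standard library's `testing.AllocsPerRun`. The sketch below uses two hypothetical pure Go stand-ins, `addNumber` and `addNumberInto`, rather than vek itself, so it only demonstrates the general pattern: reusing a caller-provided buffer removes the per-call heap allocation.

```go
package main

import (
	"fmt"
	"testing"
)

// addNumber is a hypothetical stand-in that allocates a new result slice.
func addNumber(x []float64, a float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v + a
	}
	return out
}

// addNumberInto is a hypothetical stand-in that writes into dst instead.
// It panics if dst is shorter than x.
func addNumberInto(dst, x []float64, a float64) {
	dst = dst[:len(x)]
	for i, v := range x {
		dst[i] = v + a
	}
}

// sink keeps the allocating result alive so the compiler cannot elide it.
var sink []float64

func main() {
	x := make([]float64, 1024)
	dst := make([]float64, len(x))

	allocating := testing.AllocsPerRun(100, func() { sink = addNumber(x, 2) })
	reusing := testing.AllocsPerRun(100, func() { addNumberInto(dst, x, 2) })

	fmt.Println("allocations per call (new slice):", allocating)
	fmt.Println("allocations per call (into dst): ", reusing)
}
```

In a hot loop, the second form lets one buffer be reused across iterations, which is the use case the _Into and _Inplace variants target.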
SIMD Acceleration
SIMD acceleration is enabled by default on supported platforms, that is, any x86/amd64 CPU with
the AVX2 and FMA extensions. Use vek.Info() to see if hardware acceleration is enabled. Turn
it off or on with vek.SetAcceleration(). Acceleration is currently disabled by default on
macOS as I have no machine to test it on.
package main

import (
	"fmt"

	"github.com/viterin/vek"
)

func main() {
	fmt.Printf("%+v", vek.Info())
	// {CPUArchitecture:amd64 CPUFeatures:[AVX2 FMA ..] Acceleration:true}
}
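The default described above can be read as a simple rule over the operating system, the architecture and two CPU feature flags. The sketch below encodes that reading in a hypothetical helper, `defaultAcceleration`, which is not part of vek. It assumes "x86/amd64" means `GOARCH == "amd64"` and that macOS corresponds to `GOOS == "darwin"`. The standard library does not expose CPU feature flags, so they are passed in as plain booleans here; in a real program, vek.Info() is the authoritative source.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultAcceleration is a hypothetical helper (not a vek function) that
// mirrors the documented rule: amd64 with AVX2 and FMA, and not macOS.
func defaultAcceleration(goos, goarch string, hasAVX2, hasFMA bool) bool {
	return goarch == "amd64" && goos != "darwin" && hasAVX2 && hasFMA
}

func main() {
	// The feature flags are hard-coded for illustration only.
	fmt.Println(defaultAcceleration(runtime.GOOS, runtime.GOARCH, true, true))
	fmt.Println(defaultAcceleration("darwin", "amd64", true, true)) // false
	fmt.Println(defaultAcceleration("linux", "arm64", false, false)) // false
}
```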
API
| function                        |                                   description |
|:--------------------------------|----------------------------------------------:|
| Arithmetic                      |                                               |
| vek.Add(x, y)                   |                         element-wise addition |
| vek.AddNumber(x, a)             |                    add number to each element |
| vek.Sub(x, y)                   |                      element-wise subtraction |
| vek.SubNumber(x, a)             |             subtract number from each element |
| vek.Mul(x, y)                   |                   element-wise multiplication |
| vek.MulNumber(x, a)             |               multiply each element by number |
| vek.Div(x, y)                   |                         element-wise division |
| vek.DivNumber(x, a)             |                 divide each element by number |
| vek.Abs(x)                      |                               absolute values |
| vek.Neg(x)                      |                             additive inverses |
| vek.Inv(x)                      |                       multiplicative inverses |
| Aggregates                      |                                               |
| vek.Sum(x)                      |                               sum of elements |
| vek.CumSum(x)                   |                                cumulative sum |
| vek.Prod(x)                     |                           product of elements |
| vek.CumProd(x)                  |                            cumulative product |
| vek.Mean(x)                     |                                          mean |
| vek.Median(x)                   |                                        median |
| vek.Quantile(x, q)              |                    q-th quantile, 0 <= q <= 1 |
| Distance                        |                                               |
| vek.Dot(x, y)                   |                                   dot product |
| vek.Norm(x)                     |                       euclidean norm (length) |
| vek.Distance(x, y)              |                            euclidean distance |
| vek.ManhattanNorm(x)            |                        sum of absolute values |
| vek.ManhattanDistance(x, y)     |                   sum of absolute differences |
| vek.CosineSimilarity(x, y)      |                             cosine similarity |
| Matrices                        |                                               |
| vek.MatMul(x, y, n)             | multiply m-by-n and n-by-p matrix (row-major) |
| vek.Mat4Mul(x, y)               |            specialization for 4 by 4 matrices |
| Special                         |                                               |
| vek.Sqrt(x)                     |                   square root of each element |
| vek.Pow(x, y)                   |                            element-wise power |
| vek.Round(x), Floor(x), Ceil(x) |   round to nearest, lesser or greater integer |
| Special (32-bit only)           |                                               |
| vek32.Sin(x)                    |                          sine of each element |
| vek32.Cos(x)                    |                        cosine of each element |
| vek32.Exp(x)                    |                          exponential function |
| vek32.Log(x), Log2(x), Log10(x) |        natural, base 2 and base 10 logarithms |
| Comparison                      |                                               |
| vek.Min(x)                      |                                 minimum value |
| vek.ArgMin(x)                   |              first index of the minimum value |
| vek.Minimum(x, y)               |                   element-wise minimum values |
| vek.MinimumNumber(x, a)         |            minimum of each element and number |
| vek.Max(x)                      |                                 maximum value |
| vek.ArgMax(x)                   |              first index of the maximum value |
| vek.Maximum(x, y)               |                   element-wise maximum values |
| vek.MaximumNumber(x, a)         |            maximum of each element and number |
| vek.Find(x, a)                  |        first index of number, -1 if not found |
| vek.Lt(x, y)                    |                        element-wise less than |
| vek.LtNumber(x, a)              |                              less than number |
| vek.Lte(x, y)                   |               element-wise less than or equal |
| vek.LteNumber(x, a)             |                  less than or equal to number |
| vek.Gt(x, y)                    |                     element-wise greater than |
| vek.GtNumber(x, a)              |                           greater than number |
| vek.Gte(x, y)                   |            element-wise greater than or equal |
| vek.GteNumber(x, a)             |               greater than or equal to number |
| vek.Eq(x, y)                    |                         element-wise equality |
| vek.EqNumber(x, a)              |                               equal to number |
| vek.Neq(x, y)                   |                     element-wise non-equality |
| vek.NeqNumber(x, a)             |                           not equal to number |
| Boolean                         |                                               |
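The MatMul entry takes both matrices as flat row-major slices and a single dimension n, the shared inner dimension. One way to read that, sketched here as an assumption rather than as vek's implementation, is that m and p follow from the slice lengths: m = len(x)/n and p = len(y)/n. The function `matMulRef` below is a hypothetical pure Go reference for that layout.

```go
package main

import "fmt"

// matMulRef is a hypothetical pure Go reference, not vek's implementation.
// x holds an m-by-n matrix and y an n-by-p matrix, both flattened row by row.
// The result is the m-by-p product, also flattened row by row.
func matMulRef(x, y []float64, n int) []float64 {
	if n <= 0 || len(x)%n != 0 || len(y)%n != 0 {
		panic("slice lengths must be multiples of n")
	}
	m := len(x) / n
	p := len(y) / n
	out := make([]float64, m*p)
	for i := 0; i < m; i++ {
		for k := 0; k < n; k++ {
			for j := 0; j < p; j++ {
				// Element (i, j) accumulates x(i, k) * y(k, j).
				out[i*p+j] += x[i*n+k] * y[k*p+j]
			}
		}
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4, 5, 6}    // 2-by-3: rows [1 2 3] and [4 5 6]
	y := []float64{7, 8, 9, 10, 11, 12} // 3-by-2: rows [7 8], [9 10], [11 12]
	fmt.Println(matMulRef(x, y, 3))     // [58 64 139 154]
}
```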
