# Silarray

Silicon Array: Numerical Computing Library for Apple Silicon
- Header-only C++23 library: just `#include <silarray.h>`
- Switchable CPU/GPU backend via `sil::use_cpu()` / `sil::use_mps()` (default: GPU)
- CPU: Accelerate framework (vDSP, CBLAS, NEON)
- GPU: Metal Shading Language (MSL) with STEEL SGEMM kernel, implicit GEMM conv2d, and optimized reduction/softmax/layer_norm kernels
- Lazy evaluation with expression templates and affine fusion for chained elementwise operations
- Data types: `float`, `int`, `bool`
## Requirements
- macOS with Apple Silicon
- Xcode Command Line Tools (clang++ with C++23 support)
- Frameworks: Metal, Accelerate, MetalPerformanceShaders, Foundation
## Example

```cpp
#include <silarray.h>

auto a = sil::ones<float>({1000, 1000});
auto b = sil::ones<float>({1000, 1000});

auto c = a + b;     // runs on GPU (default)
auto d = a.dot(b);

sil::use_cpu();     // switch to CPU backend
auto e = a + b;     // runs on CPU
```
## Operations

### CPU/GPU switchable
| Category | Operations |
|----------|-----------|
| Arithmetic | + - * / pow (elementwise, with broadcasting) |
| In-place | += -= *= /= |
| Linear algebra | dot (STEEL SGEMM on GPU, CBLAS on CPU) |
| Activations | sigmoid relu softmax layer_norm |
| Fused ops | linear (dot + bias), linear_sigmoid (dot + bias + sigmoid) |
| Reduction | sum sum(axis) min max argmax |
| Convolution | conv2d (implicit GEMM on GPU, NHWC layout) |
### CPU only
| Category | Operations |
|----------|-----------|
| Comparison | == != > < >= <= |
| Shape | clone transpose reshape broadcast |
| Creation | empty zeros ones random constants |
| Reduction | mean mean(axis) count all |
| NN utilities | mean_square_error one_hot sigmoid_backward |
| Selection | where(condition, x, y) |
| Testing | array_equal allclose |
## Performance
Competitive with MLX across most operations on Apple M1 Pro:
| Category | vs MLX |
|----------|--------|
| Elementwise (add, mul, div, pow) | Same speed |
| Reduction (sum, min, max) | 1.0–1.6x faster |
| Softmax, Layer Norm | 1.0–3.7x faster |
| SGEMM (square, 1024–4096) | Same speed |
| SGEMM (small-batch) | Up to 2.1x faster |
| Conv2d (ResNet mid) | Same speed |
| Transformer inference | 1.0–1.5x faster |
| MLP inference (batch=1024) | Same speed |
| Training (backward pass) | 2–5x slower (eager dispatch model) |
See bench/README.md for detailed results.
## Build and Run

### Unit tests

```sh
cd test
make
```

Tests can be run in different device modes:

```sh
./test        # GPU mode (default)
./test --gpu  # explicit GPU
./test --cpu  # CPU mode
```
### MNIST

```sh
cd test
make mnist
./mnist
```
### Benchmarks

Benchmarks compare against Eigen, MLX, libtorch, and ggml.

```sh
just bench-all    # all benchmarks
just bench-micro  # micro only

# or manually:
cd bench
make run    # all benchmarks
make table  # Markdown table output
```
See bench/README.md for setup instructions and full results.
## Architecture

```
include/
  silarray.h        Main header (includes all below)
  array.h           Core array class with expression templates
  cpu.h             CPU backend (Accelerate: vDSP, CBLAS, NEON)
  gpu.h             GPU backend (Metal/MSL kernels + MPS fallback)
  device.h          Device selection (CPU/MPS switch)
  types.h           Type concepts (float, int, bool)
  objc.h            Objective-C bridge for Metal API
  unified_memory.h  GPU shared memory management
```
## License
MIT license (c) 2026 Yuji Hirose