FABE

High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.

FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

FABE13-HX is a high-performance C math library that computes trigonometric functions (sin, cos, sincos) with SIMD vectorization. It is built on the Ψ-Hyperbasis rational approximation and runs up to 8.4× faster than standard libm in the benchmarks below, while keeping the maximum difference from libm at or below 2e-11.

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

FABE13-HX revolutionizes trigonometric computation for:

  • Machine Learning & AI Acceleration - Optimize neural network performance
  • Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
  • Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
  • Graphics & Visualization Systems - Improve rendering performance
  • Embedded Computing - Efficient performance on resource-constrained systems

💡 Key Features & Performance Benefits

  • Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
  • 🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
  • 🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
  • 🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
  • 🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
  • 🧩 Unified API for both scalar and vectorized operations
  • 🛡️ Robust Error Handling with proper NaN/Inf/0 behavior

FABE13-HX is designed for numerical computing, AI acceleration, and scientific simulation. It replaces traditional polynomial approximations with a fused rational + correction model that is more efficient and vectorization-friendly.


📂 Project Structure

fabe13/                 # Core source
├── fabe13.c            # HX implementation
├── fabe13.h            # Public API
└── benchmark_fabe13.c  # Benchmark main

tests/
└── test_fabe13.c       # Optional unit tests

CMakeLists.txt          # Cross-platform CMake
Makefile                # Minimalist legacy build
build.sh                # Recommended build script (cross-platform)

⚙️ Build Instructions

✅ Recommended: build.sh

./build.sh

This script:

  • Cleans and configures the build (Release mode)
  • Enables both benchmarking and testing
  • Compiles using aggressive -Ofast, -ffast-math, -march=native flags
  • Runs all unit tests and benchmarks automatically

🛠️ Manual CMake

mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark

🧱 Makefile (Legacy)

make all
make run-benchmark

🚀 FABE13-HX vs libm — Performance Benchmarks

FABE13-HX delivers consistent speedups over standard libm, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.

📊 Performance Overview

  • 🟨 FABE13-HX: SIMD-accelerated (AVX2+FMA, Ψ-core)
  • 🔴 libm: Standard C math (math.h)
  • 🧠 Input size: N ∈ [10 ... 1,000,000,000] doubles
  • ⚙️ Timing: Full-array sincos() throughput
  • 📐 Aligned memory: 64 bytes
  • 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)

🌐 Replit (Cloud / Linux, AVX2 Clang)

[Figure: FABE13-HX vs libm, Replit]

FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.

  • Platform: Replit Linux
  • SIMD: AVX2 + FMA
  • Compiler: Clang 14 (nix)
  • libm: GNU math.h

🍎 MacBook Pro (macOS AVX2, AppleClang)

[Figure: FABE13-HX vs libm, macOS]

🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).

  • Platform: macOS 14.x (MacBook Pro 16")
  • SIMD: AVX2 + FMA
  • Compiler: AppleClang 16.0
  • libm: macOS system math.h

📊 Performance Overview

FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
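The "SIMD Width: 2" in the banner above matches two doubles per 128-bit NEON register. As a rough illustration, a backend label and width could be chosen at compile time from the compiler's predefined macros. The `SKETCH_*` names and helper functions below are hypothetical, and the library's own selection logic may differ (for example, it could dispatch at run time).

```c
#include <assert.h>

/* Pick a backend label and the number of doubles per SIMD register
   from predefined compiler macros (GCC/Clang conventions). */
#if defined(__AVX512F__)
#  define SKETCH_BACKEND "AVX-512F"
#  define SKETCH_WIDTH 8            /* 512 bits / 64-bit double */
#elif defined(__AVX2__) && defined(__FMA__)
#  define SKETCH_BACKEND "AVX2+FMA"
#  define SKETCH_WIDTH 4            /* 256 bits / 64-bit double */
#elif defined(__aarch64__) && defined(__ARM_NEON)
#  define SKETCH_BACKEND "NEON (AArch64)"
#  define SKETCH_WIDTH 2            /* 128 bits / 64-bit double */
#else
#  define SKETCH_BACKEND "scalar"
#  define SKETCH_WIDTH 1
#endif

/* C11 compile-time check that the width is one of the expected values. */
static_assert(SKETCH_WIDTH == 1 || SKETCH_WIDTH == 2 ||
              SKETCH_WIDTH == 4 || SKETCH_WIDTH == 8,
              "unexpected SIMD width");

static const char *sketch_backend_name(void) { return SKETCH_BACKEND; }
static int sketch_simd_width(void) { return SKETCH_WIDTH; }
```

Which branch is taken depends on the target flags; with `-march=native` the compiler defines the macros for the build machine's instruction set.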

📈 Scaling with Array Size

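On the NEON run below, the speedup over libm is about 4.4× for N between 1,000,000 and 100,000,000 and reaches 5.82× at N = 1,000,000,000. The 8.4× figure quoted earlier comes from the AVX2 runs.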

ARM64/AArch64 Performance (NEON)

| Array Size    | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |
|---------------|--------------|------------|--------------------|------------------|---------|
| 10            | 0.0000       | 0.0000     | 50.00              | 50.00            | 1.00x   |
| 100           | 0.0000       | 0.0000     | 166.67             | 71.43            | 2.33x   |
| 1,000         | 0.0000       | 0.0000     | 185.19             | 72.46            | 2.56x   |
| 10,000        | 0.0001       | 0.0001     | 173.01             | 71.02            | 2.44x   |
| 100,000       | 0.0006       | 0.0009     | 177.12             | 115.82           | 1.53x   |
| 1,000,000     | 0.0016       | 0.0072     | 614.85             | 138.34           | 4.44x   |
| 10,000,000    | 0.0164       | 0.0720     | 611.30             | 138.95           | 4.40x   |
| 100,000,000   | 0.1673       | 0.7296     | 597.63             | 137.07           | 4.36x   |
| 1,000,000,000 | 1.8044       | 10.4989    | 554.19             | 95.25            | 5.82x   |

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

FABE13:  0.0016 sec  |  614.85 M ops/sec
libm:    0.0072 sec  |  138.34 M ops/sec
Speedup: 4.44x

Memory: Allocated 0.04 GB
        Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU:    100.0% utilization for both implementations

Max diff vs libm: sin=1.224e-11, cos=1.225e-11

🔬 Precision Analysis

  • All test cases stay within the 2e-11 maximum-difference bound relative to libm
  • Maximum difference observed at N = 1,000,000: 1.224e-11 for sin and 1.225e-11 for cos
  • Edge cases (0, inf, nan) are handled explicitly

🔬 Core Algorithm (Ψ-Hyperbasis)

// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)

// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)

// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶

Because both approximations are polynomials in Ψ², Ψ and Ψ² are computed once per input and shared between sin and cos, which reduces arithmetic and memory traffic.
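The formulas above can be evaluated with one shared Ψ and a Horner scheme in Ψ². The following is a minimal scalar sketch, not the library's implementation: the `psi_coeffs` struct and function names are hypothetical, and the coefficient values a1..a3 and b1..b3 are not listed in this README, so the caller has to supply them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficient container; values are NOT given in the README. */
typedef struct {
    double a1, a2, a3;   /* sin series coefficients */
    double b1, b2, b3;   /* cos series coefficients */
} psi_coeffs;

/* Core rational transformation: Psi(x) = x / (1 + (3/8) x^2). */
static double psi(double x)
{
    return x / (1.0 + 0.375 * x * x);
}

/* Evaluate both approximations from one shared Psi and p = Psi^2:
     sin ~ Psi * (1 - a1 p + a2 p^2 - a3 p^3)
     cos ~       1 - b1 p + b2 p^2 - b3 p^3      */
static void psi_sincos(double x, const psi_coeffs *k, double *s, double *c)
{
    assert(k != NULL && s != NULL && c != NULL);
    double w = psi(x);
    double p = w * w;
    *s = w * (1.0 - p * (k->a1 - p * (k->a2 - p * k->a3)));
    *c = 1.0 - p * (k->b1 - p * (k->b2 - p * k->b3));
}
```

The nested form needs three multiply-subtract steps per function after the single division, which maps well onto FMA instructions.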


📊 Public API

#include "fabe13/fabe13.h"

// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x);  // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x);  // [-1, 1]
double fabe13_acos(double x);  // [-1, 1]

// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);

🧠 Design Highlights

  • Branchless Quadrant Correction
  • NaN/Inf/0-safe logic
  • Prefetch-friendly & unrolled scalar fallback
  • SIMD-ready backend design (NEON / AVX2 / AVX512)
  • Precision-preserving range reduction

🔭 Future Development Roadmap

  • [ ] Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
  • [ ] Additional functions: cosm1, expm1, log1p with Ψ-Hyperbasis optimization
  • [ ] Single-precision float32 support (fabe13_sinf, etc.)
  • [ ] Ultra-fast LUT-based variants for performance-critical applications
  • [ ] Language bindings for Python, Rust, and C++
  • [ ] Documentation and examples for common use cases

📜 License

MIT License © 2025 Faruk Alpay
See LICENSE


🧬 Author

Faruk Alpay
https://Frontier2075.com
https://lightcap.ai

FABE13-HX is part of the Lightcap Initiative — building the most precise and elegant math primitives in open source.
