FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

FABE13-HX is a high-performance C math library that delivers ultra-fast trigonometric functions (sin, cos, sincos) using advanced SIMD vectorization. Powered by the innovative Ψ-Hyperbasis algorithm, it outperforms traditional math libraries by up to 8.4× while maintaining high precision.

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

FABE13-HX revolutionizes trigonometric computation for:

Machine Learning & AI Acceleration - Optimize neural network performance
Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
Graphics & Visualization Systems - Improve rendering performance
Embedded Computing - Efficient performance on resource-constrained systems

💡 Key Features & Performance Benefits

⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
🧩 Unified API for both scalar and vectorized operations
🛡️ Robust Error Handling with proper NaN/Inf/0 behavior

Designed for numerical computing, AI acceleration, and scientific simulation, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.

📂 Project Structure

fabe13/                 # Core source
├── fabe13.c            # HX implementation
├── fabe13.h            # Public API
├── benchmark_fabe13.c  # Benchmark main

tests/
└── test_fabe13.c       # Optional unit tests

CMakeLists.txt          # Cross-platform CMake
Makefile                # Minimalist legacy build
build.sh                # Recommended build script (cross-platform)

⚙️ Build Instructions

✅ Recommended: `build.sh`

./build.sh

This script:

Cleans and configures the build (Release mode)
Enables both benchmarking and testing
Compiles using aggressive -Ofast, -ffast-math, -march=native flags
Runs all unit tests and benchmarks automatically

🛠️ Manual CMake

mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark

🧱 Makefile (Legacy)

make all
make run-benchmark

🚀 FABE13-HX vs libm — Performance Benchmarks

FABE13-HX delivers consistent speedups over standard libm, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.

📊 Performance Overview

🟨 FABE13-HX: SIMD-accelerated (AVX2+FMA, Ψ-core)
🔴 libm: Standard C math (math.h)
🧠 Input size: N ∈ [10 ... 1,000,000,000] doubles
⚙️ Timing: Full-array sincos() throughput
📐 Aligned memory: 64 bytes
🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)

🌐 Replit (Cloud / Linux, AVX2 Clang)

FABE13-HX vs libm — Replit

✅ FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.

Platform: Replit Linux
SIMD: AVX2 + FMA
Compiler: Clang 14 (nix)
libm: GNU math.h

🍎 MacBook Pro (macOS AVX2, AppleClang)

FABE13-HX vs libm — macOS

🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).

Platform: macOS 14.x (MacBook Pro 16")
SIMD: AVX2 + FMA
Compiler: AppleClang 16.0
libm: macOS system math.h

📊 Performance Overview

FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes

📈 Scaling with Array Size

8.4× throughput improvement for large array processing compared to standard libm

ARM64/AArch64 Performance (NEON)

| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup | |------------|--------------|------------|-------------------|-----------------|---------| | 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x | | 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x | | 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x | | 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x | | 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x | | 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x | | 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x | | 100,000,000| 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x | | 1,000,000,000| 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

FABE13:  0.0016 sec  |  614.85 M ops/sec
libm:    0.0072 sec  |  138.34 M ops/sec
Speedup: 4.44x

Memory: Allocated 0.04 GB
        Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU:    100.0% utilization for both implementations

Max diff vs libm: sin=1.224e-11, cos=1.225e-11

🔬 Precision Analysis

All test cases maintain acceptable numerical accuracy compared to libm
Maximum difference observed: ~10⁻¹¹ for both sin and cos operations
Properly handles edge cases (0, inf, nan) with correct behavior

🔬 Core Algorithm (Ψ-Hyperbasis)

// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)

// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)

// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶

This allows both functions to share a unified base, optimizing performance and memory access.

📊 Public API

#include "fabe13/fabe13.h"

// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x);  // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x);  // [-1, 1]
double fabe13_acos(double x);  // [-1, 1]

// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);

🧠 Design Highlights

✅ Branchless Quadrant Correction
✅ NaN/Inf/0-safe logic
✅ Prefetch-friendly & unrolled scalar fallback
✅ SIMD-ready backend design (NEON / AVX2 / AVX512)
✅ Precision-preserving range reduction

🔭 Future Development Roadmap

[ ] Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
[ ] Additional functions: cosm1, expm1, log1p with Ψ-Hyperbasis optimization
[ ] Single-precision float32 support (fabe13_sinf, etc.)
[ ] Ultra-fast LUT-based variants for performance-critical applications
[ ] Language bindings for Python, Rust, and C++
[ ] Documentation and examples for common use cases

📜 License

🧬 Author

Faruk Alpay
https://Frontier2075.com
https://lightcap.ai

FABE13-HX is part of the Lightcap Initiative — building the most precise and elegant math primitives in open source.

FABE

Install / Use

README

FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

💡 Key Features & Performance Benefits

📂 Project Structure

⚙️ Build Instructions

✅ Recommended: `build.sh`

🛠️ Manual CMake

🧱 Makefile (Legacy)

🚀 FABE13-HX vs libm — Performance Benchmarks

📊 Performance Overview

🌐 Replit (Cloud / Linux, AVX2 Clang)

🍎 MacBook Pro (macOS AVX2, AppleClang)

📊 Performance Overview

📈 Scaling with Array Size

ARM64/AArch64 Performance (NEON)

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

🔬 Precision Analysis

🔬 Core Algorithm (Ψ-Hyperbasis)

📊 Public API

🧠 Design Highlights

🔭 Future Development Roadmap

📜 License

🧬 Author

FABE

Install / Use

README

FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

💡 Key Features & Performance Benefits

📂 Project Structure

⚙️ Build Instructions

✅ Recommended: build.sh

🛠️ Manual CMake

🧱 Makefile (Legacy)

🚀 FABE13-HX vs libm — Performance Benchmarks

📊 Performance Overview

🌐 Replit (Cloud / Linux, AVX2 Clang)

🍎 MacBook Pro (macOS AVX2, AppleClang)

📊 Performance Overview

📈 Scaling with Array Size

ARM64/AArch64 Performance (NEON)

🔍 Detailed Benchmark Snapshot (N = 1,000,000)

🔬 Precision Analysis

🔬 Core Algorithm (Ψ-Hyperbasis)

📊 Public API

🧠 Design Highlights

🔭 Future Development Roadmap

📜 License

🧬 Author

✅ Recommended: `build.sh`