FABE
High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.
Install / Use
/learn @farukalpay/FABEREADME
FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing
FABE13-HX is a high-performance C math library that delivers ultra-fast trigonometric functions (sin, cos, sincos) using advanced SIMD vectorization. Powered by the innovative Ψ-Hyperbasis algorithm, it outperforms traditional math libraries by up to 8.4× while maintaining high precision.
🚀 Why Choose FABE13-HX for Your Numerical Computing Needs
FABE13-HX revolutionizes trigonometric computation for:
- Machine Learning & AI Acceleration - Optimize neural network performance
- Scientific Simulations & HPC - Accelerate physics, engineering, and computational modeling
- Real-time Signal Processing - Enhance DSP, audio, and sensor data analysis
- Graphics & Visualization Systems - Improve rendering performance
- Embedded Computing - Efficient performance on resource-constrained systems
💡 Key Features & Performance Benefits
- ⚡ Up to 8.4× Faster Than Standard Math Libraries across various platforms and input sizes
- 🔄 Cross-Architecture Optimization with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
- 🎯 High Precision with maximum error ≤ 2e-11 compared to standard libm
- 🧠 Novel Rational-Function Architecture based on Ψ-Hyperbasis instead of traditional polynomials
- 🔢 Extreme-Range Support accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
- 🧩 Unified API for both scalar and vectorized operations
- 🛡️ Robust Error Handling with proper NaN/Inf/0 behavior
Designed for numerical computing, AI acceleration, and scientific simulation, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.
📂 Project Structure
fabe13/ # Core source
├── fabe13.c # HX implementation
├── fabe13.h # Public API
├── benchmark_fabe13.c # Benchmark main
tests/
└── test_fabe13.c # Optional unit tests
CMakeLists.txt # Cross-platform CMake
Makefile # Minimalist legacy build
build.sh # Recommended build script (cross-platform)
⚙️ Build Instructions
✅ Recommended: build.sh
./build.sh
This script:
- Cleans and configures the build (Release mode)
- Enables both benchmarking and testing
- Compiles using aggressive
-Ofast,-ffast-math,-march=nativeflags - Runs all unit tests and benchmarks automatically
🛠️ Manual CMake
mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark
🧱 Makefile (Legacy)
make all
make run-benchmark
🚀 FABE13-HX vs libm — Performance Benchmarks
FABE13-HX delivers consistent speedups over standard libm, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.
📊 Performance Overview
- 🟨 FABE13-HX: SIMD-accelerated (
AVX2+FMA, Ψ-core) - 🔴 libm: Standard C math (
math.h) - 🧠 Input size:
N ∈ [10 ... 1,000,000,000]doubles - ⚙️ Timing: Full-array
sincos()throughput - 📐 Aligned memory: 64 bytes
- 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)
🌐 Replit (Cloud / Linux, AVX2 Clang)
.png)
✅ FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.
- Platform: Replit Linux
- SIMD: AVX2 + FMA
- Compiler: Clang 14 (nix)
- libm: GNU
math.h
🍎 MacBook Pro (macOS AVX2, AppleClang)

🟨 FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).
- Platform: macOS 14.x (MacBook Pro 16")
- SIMD: AVX2 + FMA
- Compiler: AppleClang 16.0
- libm: macOS system
math.h
📊 Performance Overview
FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
📈 Scaling with Array Size
8.4× throughput improvement for large array processing compared to standard libm
ARM64/AArch64 Performance (NEON)
| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup | |------------|--------------|------------|-------------------|-----------------|---------| | 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x | | 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x | | 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x | | 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x | | 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x | | 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x | | 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x | | 100,000,000| 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x | | 1,000,000,000| 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |
🔍 Detailed Benchmark Snapshot (N = 1,000,000)
FABE13: 0.0016 sec | 614.85 M ops/sec
libm: 0.0072 sec | 138.34 M ops/sec
Speedup: 4.44x
Memory: Allocated 0.04 GB
Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU: 100.0% utilization for both implementations
Max diff vs libm: sin=1.224e-11, cos=1.225e-11
🔬 Precision Analysis
- All test cases maintain acceptable numerical accuracy compared to libm
- Maximum difference observed: ~10⁻¹¹ for both sin and cos operations
- Properly handles edge cases (0, inf, nan) with correct behavior
🔬 Core Algorithm (Ψ-Hyperbasis)
// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)
// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)
// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶
This allows both functions to share a unified base, optimizing performance and memory access.
📊 Public API
#include "fabe13/fabe13.h"
// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x); // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x); // [-1, 1]
double fabe13_acos(double x); // [-1, 1]
// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);
🧠 Design Highlights
- ✅ Branchless Quadrant Correction
- ✅ NaN/Inf/0-safe logic
- ✅ Prefetch-friendly & unrolled scalar fallback
- ✅ SIMD-ready backend design (NEON / AVX2 / AVX512)
- ✅ Precision-preserving range reduction
🔭 Future Development Roadmap
- [ ] Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
- [ ] Additional functions:
cosm1,expm1,log1pwith Ψ-Hyperbasis optimization - [ ] Single-precision
float32support (fabe13_sinf, etc.) - [ ] Ultra-fast LUT-based variants for performance-critical applications
- [ ] Language bindings for Python, Rust, and C++
- [ ] Documentation and examples for common use cases
📜 License
MIT License © 2025 Faruk Alpay
See LICENSE
🧬 Author
Faruk Alpay
https://Frontier2075.com
https://lightcap.ai
FABE13-HX is part of the Lightcap Initiative — building the most precise and elegant math primitives in open source.
