kofft

A high-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, Wavelet, STFT, and more. It provides stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.

Features

  • 🚀 Zero-allocation stack-only APIs for MCU/embedded systems
  • ⚡ SIMD acceleration (x86_64 AVX2 & SSE, AArch64 NEON, WebAssembly SIMD)
  • 🧮 Split-radix FFTs for power-of-two sizes, with radix-2/4 and mixed-radix support
  • 🔧 Multiple transform types and modules: FFT, NDFFT (n-dimensional), DCT (Types I-IV), DST (Types I-IV), Hartley, Hilbert transform, Cepstrum, Wavelet, STFT, CZT, Goertzel
  • 📊 Window functions: Hann, Hamming, Blackman, Kaiser
  • 🔄 Batch and multi-channel processing
  • 🌐 WebAssembly support
  • 📱 Parallel processing (optional)
  • 🎵 Hybrid song identification: fast metadata lookup with BLAKE3 fallback

Benchmarks

See benchmarks for detailed benchmark results and data.

Quick Start

Add to Cargo.toml

[dependencies]
kofft = { version = "0.1.5", features = [
    # "x86_64",             # AVX/SSE on x86_64
    # "sse",                # force SSE2-only backend
    # "aarch64",            # NEON on 64-bit ARM
    # "wasm",               # WebAssembly SIMD128
    # "avx2",               # AVX2-specific code paths
    # "avx512",             # AVX-512 code paths
    # "parallel",           # Rayon-based parallel helpers
    # "simd",               # portable SIMD FFT implementations
    # "soa",                # structure-of-arrays complex vectors
    # "precomputed-twiddles", # embed precomputed twiddle factors (requires std)
    # "compile-time-rfft",  # precompute real FFT tables at compile time
    # "slow",               # include naive reference algorithms
    # "internal-tests",     # enable proptest/rand for internal tests
] }

Basic Usage

For an overview of the Fast Fourier Transform (FFT), see Wikipedia.

use kofft::{Complex32, FftPlanner};
use kofft::fft::{ScalarFftImpl, FftImpl};

// Create FFT instance with planner (caches twiddle factors)
let planner = FftPlanner::<f32>::new();
let fft = ScalarFftImpl::with_planner(planner);

// Prepare data
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

// Compute FFT
fft.fft(&mut data)?;

// Compute inverse FFT
fft.ifft(&mut data)?;

Parallel FFT

Enable the parallel feature to automatically split large transforms across threads via Rayon. Use the fft_parallel and ifft_parallel helpers, which safely fall back to single-threaded execution when Rayon is not available.

By default, kofft parallelizes an FFT only when each CPU core would process at least max(L1_cache_bytes / size_of::<Complex32>(), per_core_work) elements. The defaults assume a 32 KiB L1 cache and require roughly 4,096 points per core. The heuristic scales with the number of detected cores (via num_cpus). Tune it with the KOFFT_PAR_FFT_THRESHOLD, KOFFT_PAR_FFT_CACHE_BYTES, or KOFFT_PAR_FFT_PER_CORE_WORK environment variables, or at runtime by calling kofft::fft::set_parallel_fft_threshold, set_parallel_fft_l1_cache, or set_parallel_fft_per_core_work.

use kofft::fft::{fft_parallel, ifft_parallel, Complex32};

let mut data = vec![Complex32::new(1.0, 0.0); 1 << 14];
fft_parallel(&mut data)?;
ifft_parallel(&mut data)?;
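
The parallelization heuristic described in this section can be illustrated with a small standalone sketch. It does not call kofft. The `Complex32` struct, `per_core_threshold`, and `should_parallelize` are hypothetical names for this illustration, and the comparison `len / cores >= threshold` is one reading of "each CPU core would process at least" that many elements. The KOFFT_PAR_FFT_THRESHOLD override is not modeled.

```rust
use std::mem::size_of;

/// Stand-in for a 32-bit complex number (two f32 values, 8 bytes).
/// Hypothetical type for this sketch; kofft's own Complex32 is not used here.
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Complex32 {
    re: f32,
    im: f32,
}

/// Minimum number of elements each core should process:
/// max(L1_cache_bytes / size_of::<Complex32>(), per_core_work).
#[allow(dead_code)]
fn per_core_threshold(l1_cache_bytes: usize, per_core_work: usize) -> usize {
    (l1_cache_bytes / size_of::<Complex32>()).max(per_core_work)
}

/// Returns true when an FFT of `len` points, divided evenly over `cores`,
/// gives every core at least the threshold number of elements.
/// Assumption: the library's real check may differ in detail.
#[allow(dead_code)]
fn should_parallelize(
    len: usize,
    cores: usize,
    l1_cache_bytes: usize,
    per_core_work: usize,
) -> bool {
    let cores = cores.max(1); // guard against a reported core count of zero
    len / cores >= per_core_threshold(l1_cache_bytes, per_core_work)
}
```

With a 32 KiB cache and 8-byte elements, 32,768 / 8 = 4,096, which matches the stated default of roughly 4,096 points per core. Under this reading, a 16,384-point transform would be split on 4 cores (4,096 points each) but not on 8 cores (2,048 points each).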

Cargo Feature Flags

The crate exposes several Cargo features. Refer to Cargo.toml for the canonical list and definitions.

  • std – enable the Rust standard library (default)
  • parallel – Rayon-based parallel helpers
  • Architecture backends:
    • x86_64 – AVX/SSE on x86_64 CPUs
    • sse – force SSE2-only backend
    • aarch64 – NEON on 64-bit ARM
    • wasm – WebAssembly SIMD128
    • avx2 – AVX2-specific code paths
    • avx512 – AVX-512 code paths
  • Miscellaneous:
    • simd – portable SIMD FFT implementations
    • soa – structure-of-arrays complex vectors for SIMD
    • precomputed-twiddles – embed precomputed FFT twiddle factors (requires std)
    • compile-time-rfft – generate real FFT tables at compile time
    • slow – include naive reference algorithms
    • internal-tests – enable proptest and rand for internal testing

Embedded/MCU Usage (No Heap)

All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.

FFT (Stack-Only)

use kofft::fft::{Complex32, fft_inplace_stack};

// 8-point FFT (power-of-two only for stack APIs)
let mut buf: [Complex32; 8] = [
    Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0),
    Complex32::new(5.0, 0.0), Complex32::new(6.0, 0.0),
    Complex32::new(7.0, 0.0), Complex32::new(8.0, 0.0),
];

fft_inplace_stack(&mut buf)?;

DCT-I (Stack-Only)

use kofft::dct::dct1_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dct1_inplace_stack(&input, &mut output);

DCT-II (Stack-Only)

use kofft::dct::dct2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dct2_inplace_stack(&input, &mut output);
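
As a reference point for what a DCT-II computes, here is a naive O(n²) sketch in plain Rust. It is not kofft's implementation. `naive_dct2` is a hypothetical helper, and it uses the unnormalized textbook definition; the scaling that kofft applies is not stated here and may differ.

```rust
use std::f32::consts::PI;

/// Unnormalized DCT-II: X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k).
/// Writes into a caller-provided buffer, so no heap allocation is needed.
#[allow(dead_code)]
fn naive_dct2(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len(), "output must match input length");
    let n = input.len() as f32;
    for (k, out) in output.iter_mut().enumerate() {
        *out = input
            .iter()
            .enumerate()
            .map(|(i, &x)| x * (PI / n * (i as f32 + 0.5) * k as f32).cos())
            .sum();
    }
}
```

A constant input puts all of its energy into coefficient 0: for four samples of 1.0, X[0] is 4 and the remaining coefficients are zero up to rounding error.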

DST-II (Stack-Only)

use kofft::dst::dst2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dst2_inplace_stack(&input, &mut output);

DST-IV (Stack-Only)

use kofft::dst::dst4_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dst4_inplace_stack(&input, &mut output);

Haar Wavelet (Stack-Only)

use kofft::wavelet::{haar_forward_inplace_stack, haar_inverse_inplace_stack};

// Forward transform
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut avg = [0.0; 4];
let mut diff = [0.0; 4];

haar_forward_inplace_stack(&input, &mut avg[..], &mut diff[..]);

// Inverse transform
let mut out = [0.0; 8];
haar_inverse_inplace_stack(&avg[..], &diff[..], &mut out[..]);
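
To show what the average and difference buffers hold, here is a one-level Haar transform sketched in plain Rust. `haar_forward` and `haar_inverse` are hypothetical helpers, not kofft functions, and the divide-by-two normalization is an assumption; kofft may scale differently (for example by 1/√2).

```rust
/// One level of a Haar transform: pairwise averages and half-differences.
/// Assumes an even-length input and output buffers of half that length.
#[allow(dead_code)]
fn haar_forward(input: &[f32], avg: &mut [f32], diff: &mut [f32]) {
    assert!(input.len() % 2 == 0, "input length must be even");
    assert!(avg.len() >= input.len() / 2 && diff.len() >= input.len() / 2);
    for (i, pair) in input.chunks_exact(2).enumerate() {
        avg[i] = (pair[0] + pair[1]) / 2.0;
        diff[i] = (pair[0] - pair[1]) / 2.0;
    }
}

/// Inverse of `haar_forward`: rebuilds each sample pair from its average
/// and half-difference.
#[allow(dead_code)]
fn haar_inverse(avg: &[f32], diff: &[f32], out: &mut [f32]) {
    assert!(diff.len() >= avg.len() && out.len() >= 2 * avg.len());
    for i in 0..avg.len() {
        out[2 * i] = avg[i] + diff[i];
        out[2 * i + 1] = avg[i] - diff[i];
    }
}
```

For the input 1.0 through 8.0, this sketch yields averages [1.5, 3.5, 5.5, 7.5] and differences of -0.5 in every slot, and the inverse reconstructs the original samples exactly.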

Window Functions (Stack-Only)

use kofft::window::{hann_inplace_stack, hamming_inplace_stack, blackman_inplace_stack};

let mut hann: [f32; 8] = [0.0; 8];
hann_inplace_stack(&mut hann);

let mut hamming: [f32; 8] = [0.0; 8];
hamming_inplace_stack(&mut hamming);

let mut blackman: [f32; 8] = [0.0; 8];
blackman_inplace_stack(&mut blackman);
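
For reference, the values a Hann window buffer might contain can be sketched without the library. `hann_symmetric` is a hypothetical helper that uses the symmetric textbook form w[n] = 0.5 − 0.5·cos(2πn / (N − 1)); whether kofft uses the symmetric or the periodic variant is not stated here, so treat this as an illustration only.

```rust
use std::f32::consts::PI;

/// Fills `out` with a symmetric Hann window:
/// w[n] = 0.5 - 0.5 * cos(2 * pi * n / (N - 1)).
#[allow(dead_code)]
fn hann_symmetric(out: &mut [f32]) {
    let n = out.len();
    if n == 1 {
        // A single-sample window is conventionally 1.0.
        out[0] = 1.0;
        return;
    }
    for (i, w) in out.iter_mut().enumerate() {
        *w = 0.5 - 0.5 * (2.0 * PI * i as f32 / (n - 1) as f32).cos();
    }
}
```

The window tapers to zero at both ends and is mirror-symmetric around its center, which is what reduces spectral leakage when it is multiplied into a frame before an FFT.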

Sanity Check Utility

The workspace provides a sanity-check binary for comparing spectrograms between kofft and rustfft. It can optionally emit an SVG file using --svg-output:

cargo run -r -p sanity-check -- input.flac --svg-output=spec.svg

Desktop/Standard Library Usage

With the std feature (enabled by default), you get heap-based APIs for more flexibility.

FFT with Standard Library

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};

let fft = ScalarFftImpl::<f32>::default();

// Heap-based FFT
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

fft.fft(&mut data)?;

// Or return the result in a newly allocated vector
let result = fft.fft_vec(&data)?;

Real FFT (Optimized for Real Input)

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};
use kofft::rfft::RealFftImpl;

let fft = ScalarFftImpl::<f32>::default();
let mut input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
// A real FFT of n samples yields n / 2 + 1 complex bins
let mut output = vec![Complex32::new(0.0, 0.0); input.len() / 2 + 1];

fft.rfft(&mut input, &mut output)?;

Stack-only helpers avoid heap allocation:

use kofft::rfft::{irfft_stack, rfft_stack};
use kofft::Complex32;

let input = [1.0f32, 2.0, 3.0, 4.0];
let mut freq = [Complex32::new(0.0, 0.0); 3];
rfft_stack(&input, &mut freq)?;
let mut time = [0.0f32; 4];
irfft_stack(&freq, &mut time)?;
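
The real-FFT examples size their output buffers as n / 2 + 1 because the spectrum of a real signal is conjugate-symmetric, so the upper half of the bins carries no new information. The naive O(n²) sketch below illustrates this without the library. `naive_rfft` is a hypothetical helper, complex values are plain `(re, im)` tuples, and the forward sign convention e^(−2πikt/n) is the common one, assumed here rather than confirmed for kofft.

```rust
use std::f32::consts::PI;

/// Naive DFT of a real signal that keeps only bins 0..=n/2.
/// Each bin is returned as a (real, imaginary) pair.
#[allow(dead_code)]
fn naive_rfft(input: &[f32]) -> Vec<(f32, f32)> {
    let n = input.len();
    (0..=n / 2)
        .map(|k| {
            let mut re = 0.0f32;
            let mut im = 0.0f32;
            for (t, &x) in input.iter().enumerate() {
                // Forward transform uses the negative exponent.
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re, im)
        })
        .collect()
}
```

For the four-sample input [1.0, 2.0, 3.0, 4.0], this produces three bins: approximately 10, −2 + 2i, and −2. The omitted fourth bin would be the complex conjugate of the second.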

STFT (Short-Time Fourier Transform)

For background on STFT, see Wikipedia.

use kofft::stft::{stft, istft};
use kofft::window::hann;
use kofft::fft::ScalarFftImpl;

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();

let mut frames = vec![vec![]; (signal.len() + hop_size - 1) / hop_size];
stft(&signal, &window, hop_size, &mut frames, &fft)?;

let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;
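
The frame-buffer sizing in the STFT example, `(signal.len() + hop_size - 1) / hop_size`, is a ceiling division: one frame starts at every multiple of the hop size. The sketch below spells that out and shows how a single frame could be cut and windowed. `frame_count` and `windowed_frame` are hypothetical helpers, and zero-padding past the end of the signal is an assumption about edge handling, not a documented kofft behavior.

```rust
/// Number of frames when a new frame starts every `hop_size` samples
/// (ceiling of signal_len / hop_size).
#[allow(dead_code)]
fn frame_count(signal_len: usize, hop_size: usize) -> usize {
    assert!(hop_size > 0, "hop size must be non-zero");
    (signal_len + hop_size - 1) / hop_size
}

/// Extracts the frame beginning at `start`, multiplies it by the window,
/// and treats samples past the end of the signal as zero (assumed padding).
#[allow(dead_code)]
fn windowed_frame(signal: &[f32], window: &[f32], start: usize) -> Vec<f32> {
    window
        .iter()
        .enumerate()
        .map(|(i, &w)| signal.get(start + i).copied().unwrap_or(0.0) * w)
        .collect()
}
```

With 8 samples and a hop of 2, frames start at samples 0, 2, 4, and 6, giving 4 frames. With a window of length 4, the last frame reaches two samples past the end of the signal, which this sketch fills with zeros.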

Streaming STFT/ISTFT

use kofft::stft::{StftStream, istft};
use kofft::window::hann;
use kofft::fft::{Complex32, ScalarFftImpl};

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();
let mut stream = StftStream::new(&signal, &window, hop_size, &fft)?;
let mut frames = Vec::new();
let mut frame = vec![Complex32::new(0.0, 0.0); window.len()];
while stream.next_frame(&mut frame)? {
    frames.push(frame.clone());
}
let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;

Batch Processing

use kofft::fft::{
