Kofft
High-performance no_std Rust DSP library with FFT, DCT, STFT, Wavelet & more. SIMD-optimized, zero-allocation, and MCU-friendly.
kofft
High-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, Wavelet, STFT, and more. It provides stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.
Features
- 🚀 Zero-allocation stack-only APIs for MCU/embedded systems
- ⚡ SIMD acceleration (x86_64 AVX2 & SSE, AArch64 NEON, WebAssembly SIMD)
- 🧮 Split-radix FFTs for power-of-two sizes, with radix-2/4 and mixed-radix support
- 🔧 Multiple transform types and modules: FFT, NDFFT (n-dimensional), DCT (Types I-IV), DST (Types I-IV), Hartley, Hilbert transform, Cepstrum, Wavelet, STFT, CZT, Goertzel
- 📊 Window functions: Hann, Hamming, Blackman, Kaiser
- 🔄 Batch and multi-channel processing
- 🌐 WebAssembly support
- 📱 Parallel processing (optional)
- 🎵 Hybrid song identification: fast metadata lookup with BLAKE3 fallback
Benchmarks
See benchmarks for detailed benchmark results and data.
Quick Start
Add to Cargo.toml
[dependencies]
kofft = { version = "0.1.5", features = [
# "x86_64", # AVX/SSE on x86_64
# "sse", # force SSE2-only backend
# "aarch64", # NEON on 64-bit ARM
# "wasm", # WebAssembly SIMD128
# "avx2", # AVX2-specific code paths
# "avx512", # AVX-512 code paths
# "parallel", # Rayon-based parallel helpers
# "simd", # portable SIMD FFT implementations
# "soa", # structure-of-arrays complex vectors
# "precomputed-twiddles", # embed precomputed twiddle factors (requires std)
# "compile-time-rfft", # precompute real FFT tables at compile time
# "slow", # include naive reference algorithms
# "internal-tests", # enable proptest/rand for internal tests
] }
Basic Usage
For an overview of the Fast Fourier Transform (FFT), see Wikipedia.
use kofft::{Complex32, FftPlanner};
use kofft::fft::{ScalarFftImpl, FftImpl};
// Create FFT instance with planner (caches twiddle factors)
let planner = FftPlanner::<f32>::new();
let fft = ScalarFftImpl::with_planner(planner);
// Prepare data
let mut data = vec![
Complex32::new(1.0, 0.0),
Complex32::new(2.0, 0.0),
Complex32::new(3.0, 0.0),
Complex32::new(4.0, 0.0),
];
// Compute FFT
fft.fft(&mut data)?;
// Compute inverse FFT
fft.ifft(&mut data)?;
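To sanity-check the output of a small transform such as the 4-point example above, a naive O(n²) DFT is a handy reference. The sketch below is independent of kofft: `naive_dft` is a hypothetical helper, complex values are plain `(re, im)` tuples, and it assumes the usual unnormalized forward convention with a negative exponent, which may or may not match kofft's scaling.

```rust
use std::f32::consts::PI;

/// Naive O(n^2) DFT over (re, im) pairs.
/// Illustrative reference only; this is not how kofft computes its FFT.
/// ASSUMPTION: unnormalized forward transform using e^{-2*pi*i*k*t/n}.
fn naive_dft(input: &[(f32, f32)]) -> Vec<(f32, f32)> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut acc = (0.0f32, 0.0f32);
            for (t, &(re, im)) in input.iter().enumerate() {
                // Twiddle factor for output bin k and input sample t.
                let angle = -2.0 * PI * (k * t) as f32 / n as f32;
                let (s, c) = angle.sin_cos();
                // Complex multiply (re + i*im) * (c + i*s), then accumulate.
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            acc
        })
        .collect()
}
```

For the input 1, 2, 3, 4 this yields approximately 10, -2+2i, -2, -2-2i, up to floating-point rounding.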
Parallel FFT
Enable the parallel feature to automatically split large transforms across
threads via Rayon. Use the fft_parallel and
ifft_parallel helpers, which safely fall back to single-threaded execution when
Rayon is not available.
By default, kofft parallelizes an FFT when each CPU core would process at least
max(L1_cache_bytes / size_of::<Complex32>(), per_core_work) elements. The
defaults assume a 32 KiB L1 cache and require roughly 4,096 points per core.
The heuristic scales with the number of detected cores (via
num_cpus) and can be tuned using the
KOFFT_PAR_FFT_THRESHOLD, KOFFT_PAR_FFT_CACHE_BYTES, or
KOFFT_PAR_FFT_PER_CORE_WORK environment variables, or by calling
kofft::fft::set_parallel_fft_threshold, set_parallel_fft_l1_cache, or
set_parallel_fft_per_core_work at runtime.
use kofft::fft::{fft_parallel, ifft_parallel, Complex32};
let mut data = vec![Complex32::new(1.0, 0.0); 1 << 14];
fft_parallel(&mut data)?;
ifft_parallel(&mut data)?;
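The threshold rule described above can be worked through numerically. The following is a minimal sketch of that rule as stated in the text, not kofft's actual implementation: `per_core_min` and `should_parallelize` are hypothetical names, the local `Complex32` is an 8-byte stand-in, and the `KOFFT_PAR_FFT_THRESHOLD` override is not modeled.

```rust
use std::mem::size_of;

/// Stand-in for a single-precision complex number (two f32 fields, 8 bytes).
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct Complex32 {
    re: f32,
    im: f32,
}

/// Minimum number of elements each core should process before splitting pays off:
/// max(L1_cache_bytes / size_of::<Complex32>(), per_core_work).
fn per_core_min(l1_cache_bytes: usize, per_core_work: usize) -> usize {
    (l1_cache_bytes / size_of::<Complex32>()).max(per_core_work)
}

/// Whether a transform of length `n` would be split across `cores` threads
/// under the rule sketched here (hypothetical helper).
fn should_parallelize(n: usize, cores: usize, l1_cache_bytes: usize, per_core_work: usize) -> bool {
    cores > 1 && n / cores >= per_core_min(l1_cache_bytes, per_core_work)
}
```

With the stated defaults, 32 KiB / 8 bytes gives 4,096 elements, which equals the per-core work default. Under this sketch a 16,384-point transform would be split on a 4-core machine (4,096 points per core) but not on an 8-core machine (2,048 points per core).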
Cargo Feature Flags
The crate exposes several Cargo features. Refer to Cargo.toml for the canonical list and definitions.
- std – enable the Rust standard library (default)
- parallel – Rayon-based parallel helpers
- Architecture backends:
  - x86_64 – AVX/SSE on x86_64 CPUs
  - sse – force SSE2-only backend
  - aarch64 – NEON on 64-bit ARM
  - wasm – WebAssembly SIMD128
  - avx2 – AVX2-specific code paths
  - avx512 – AVX-512 code paths
- Miscellaneous:
  - simd – portable SIMD FFT implementations
  - soa – structure-of-arrays complex vectors for SIMD
  - precomputed-twiddles – embed precomputed FFT twiddle factors (requires std)
  - compile-time-rfft – generate real FFT tables at compile time
  - slow – include naive reference algorithms
  - internal-tests – enable proptest and rand for internal testing
Embedded/MCU Usage (No Heap)
All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.
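Because std is listed as a default feature, a no_std build presumably needs default features switched off. A minimal Cargo.toml sketch, assuming no other default feature is required for the stack-only APIs (check the crate's Cargo.toml for the canonical feature list):

```toml
[dependencies]
# Disable the default `std` feature for no_std targets (assumed setup).
kofft = { version = "0.1.5", default-features = false }
```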
FFT (Stack-Only)
use kofft::fft::{Complex32, fft_inplace_stack};
// 8-point FFT (power-of-two only for stack APIs)
let mut buf: [Complex32; 8] = [
Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0),
Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0),
Complex32::new(5.0, 0.0), Complex32::new(6.0, 0.0),
Complex32::new(7.0, 0.0), Complex32::new(8.0, 0.0),
];
fft_inplace_stack(&mut buf)?;
DCT-I (Stack-Only)
use kofft::dct::dct1_inplace_stack;
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];
dct1_inplace_stack(&input, &mut output);
DCT-II (Stack-Only)
use kofft::dct::dct2_inplace_stack;
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];
dct2_inplace_stack(&input, &mut output);
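For checking results on small inputs, the textbook DCT-II definition can be evaluated directly. This sketch is independent of kofft: `naive_dct2` is a hypothetical helper, and it assumes the unnormalized form X[k] = Σ x[n]·cos(π/N·(n + ½)·k), so kofft's output may differ by a scaling factor.

```rust
use std::f32::consts::PI;

/// Naive O(n^2) DCT-II: X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k).
/// ASSUMPTION: unnormalized textbook definition; kofft's scaling may differ.
fn naive_dct2(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len());
    let n = input.len() as f32;
    for (k, out) in output.iter_mut().enumerate() {
        *out = input
            .iter()
            .enumerate()
            .map(|(i, &x)| x * (PI / n * (i as f32 + 0.5) * k as f32).cos())
            .sum();
    }
}
```

Like the stack-only APIs, the helper writes into a caller-provided buffer and allocates nothing. A constant input is a quick check: all energy lands in coefficient 0 and the remaining coefficients are zero up to rounding.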
DST-II (Stack-Only)
use kofft::dst::dst2_inplace_stack;
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];
dst2_inplace_stack(&input, &mut output);
DST-IV (Stack-Only)
use kofft::dst::dst4_inplace_stack;
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];
dst4_inplace_stack(&input, &mut output);
Haar Wavelet (Stack-Only)
use kofft::wavelet::{haar_forward_inplace_stack, haar_inverse_inplace_stack};
// Forward transform
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut avg = [0.0; 4];
let mut diff = [0.0; 4];
haar_forward_inplace_stack(&input, &mut avg[..], &mut diff[..]);
// Inverse transform
let mut out = [0.0; 8];
haar_inverse_inplace_stack(&avg[..], &diff[..], &mut out[..]);
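What the avg and diff buffers hold can be illustrated with a one-level Haar step on plain slices. This is a sketch with hypothetical helpers `haar_forward` and `haar_inverse`; it assumes pairwise averages and half-differences, whereas kofft may use a different normalization (for example a 1/√2 scaling).

```rust
/// One level of the Haar transform: pairwise averages and half-differences.
/// ASSUMPTION: (a + b) / 2 and (a - b) / 2 scaling, chosen for illustration.
fn haar_forward(input: &[f32], avg: &mut [f32], diff: &mut [f32]) {
    assert!(input.len() % 2 == 0);
    assert!(avg.len() == input.len() / 2 && diff.len() == input.len() / 2);
    for (i, pair) in input.chunks_exact(2).enumerate() {
        avg[i] = (pair[0] + pair[1]) / 2.0;
        diff[i] = (pair[0] - pair[1]) / 2.0;
    }
}

/// Inverse of `haar_forward`: rebuilds each pair from its average and half-difference.
fn haar_inverse(avg: &[f32], diff: &[f32], out: &mut [f32]) {
    assert!(avg.len() == diff.len() && out.len() == 2 * avg.len());
    for i in 0..avg.len() {
        out[2 * i] = avg[i] + diff[i];
        out[2 * i + 1] = avg[i] - diff[i];
    }
}
```

Under this convention the input 1 through 8 gives averages 1.5, 3.5, 5.5, 7.5 and differences of -0.5 each, and the inverse step recovers the input exactly.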
Window Functions (Stack-Only)
use kofft::window::{hann_inplace_stack, hamming_inplace_stack, blackman_inplace_stack};
let mut hann: [f32; 8] = [0.0; 8];
hann_inplace_stack(&mut hann);
let mut hamming: [f32; 8] = [0.0; 8];
hamming_inplace_stack(&mut hamming);
let mut blackman: [f32; 8] = [0.0; 8];
blackman_inplace_stack(&mut blackman);
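For reference, a Hann window can be filled into a caller-provided buffer with a few lines of arithmetic. The sketch below uses a hypothetical `hann_fill` helper and assumes the symmetric form w[n] = 0.5 − 0.5·cos(2πn/(N−1)); kofft's helpers may instead use the periodic form with N in the denominator.

```rust
use std::f32::consts::PI;

/// Fills `buf` with a symmetric Hann window without allocating.
/// ASSUMPTION: symmetric definition with N - 1 in the denominator.
fn hann_fill(buf: &mut [f32]) {
    let n = buf.len();
    if n == 1 {
        // A single-point window is conventionally 1.0; avoids dividing by zero.
        buf[0] = 1.0;
        return;
    }
    for (i, w) in buf.iter_mut().enumerate() {
        *w = 0.5 - 0.5 * (2.0 * PI * i as f32 / (n - 1) as f32).cos();
    }
}
```

With a 5-element buffer this produces approximately 0, 0.5, 1, 0.5, 0.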
Sanity Check Utility
The workspace provides a sanity-check binary for comparing spectrograms
between kofft and rustfft. It can optionally emit an SVG file using
--svg-output:
cargo run -r -p sanity-check -- input.flac --svg-output=spec.svg
Desktop/Standard Library Usage
With the std feature (enabled by default), you get heap-based APIs for more flexibility.
FFT with Standard Library
use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};
let fft = ScalarFftImpl::<f32>::default();
// Heap-based FFT
let mut data = vec![
Complex32::new(1.0, 0.0),
Complex32::new(2.0, 0.0),
Complex32::new(3.0, 0.0),
Complex32::new(4.0, 0.0),
];
fft.fft(&mut data)?;
// Or create new vector
let result = fft.fft_vec(&data)?;
Real FFT (Optimized for Real Input)
use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};
use kofft::rfft::RealFftImpl;
let fft = ScalarFftImpl::<f32>::default();
let mut input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output = vec![Complex32::zero(); input.len() / 2 + 1];
fft.rfft(&mut input, &mut output)?;
Stack-only helpers avoid heap allocation:
use kofft::rfft::{irfft_stack, rfft_stack};
use kofft::Complex32;
let input = [1.0f32, 2.0, 3.0, 4.0];
let mut freq = [Complex32::new(0.0, 0.0); 3];
rfft_stack(&input, &mut freq)?;
let mut time = [0.0f32; 4];
irfft_stack(&freq, &mut time)?;
STFT (Short-Time Fourier Transform)
For background on STFT, see Wikipedia.
use kofft::stft::{stft, istft};
use kofft::window::hann;
use kofft::fft::ScalarFftImpl;
let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();
let mut frames = vec![vec![]; (signal.len() + hop_size - 1) / hop_size];
stft(&signal, &window, hop_size, &mut frames, &fft)?;
let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;
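The frame count in the example above, (signal.len() + hop_size - 1) / hop_size, is a ceiling division: one frame per hop, including a partial one at the end. The sketch below shows one way such framing could work before each frame goes through an FFT. `frame_signal` is a hypothetical helper, and zero-padding past the end of the signal is an assumption; kofft's edge handling may differ.

```rust
/// Splits `signal` into windowed frames, one frame every `hop_size` samples.
/// ASSUMPTION: samples past the end of the signal are treated as zero.
fn frame_signal(signal: &[f32], window: &[f32], hop_size: usize) -> Vec<Vec<f32>> {
    assert!(hop_size > 0);
    // Ceiling division, matching the frame count used in the STFT example.
    let n_frames = (signal.len() + hop_size - 1) / hop_size;
    (0..n_frames)
        .map(|f| {
            let start = f * hop_size;
            window
                .iter()
                .enumerate()
                .map(|(i, &w)| signal.get(start + i).copied().unwrap_or(0.0) * w)
                .collect()
        })
        .collect()
}
```

For an 8-sample signal, a 4-sample window, and a hop of 2, this gives 4 frames starting at samples 0, 2, 4, and 6, with the last frame half zero-padded.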
Streaming STFT/ISTFT
use kofft::stft::{StftStream, istft};
use kofft::window::hann;
use kofft::fft::{Complex32, ScalarFftImpl};
let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();
let mut stream = StftStream::new(&signal, &window, hop_size, &fft)?;
let mut frames = Vec::new();
let mut frame = vec![Complex32::new(0.0, 0.0); window.len()];
while stream.next_frame(&mut frame)? {
frames.push(frame.clone());
}
let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;
Batch Processing
use kofft::fft::{
