Kofft

High-performance no_std Rust DSP library with FFT, DCT, STFT, Wavelet & more. SIMD-optimized, zero-allocation, and MCU-friendly.

High-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, Wavelet, STFT, and more. It provides stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.

Features

🚀 Zero-allocation stack-only APIs for MCU/embedded systems
⚡ SIMD acceleration (x86_64 AVX2 & SSE, AArch64 NEON, WebAssembly SIMD)
🧮 Split-radix FFTs for power-of-two sizes, with radix-2/4 and mixed-radix support
🔧 Multiple transform types and modules: FFT, NDFFT (n-dimensional), DCT (Types I-IV), DST (Types I-IV), Hartley, Hilbert transform, Cepstrum, Wavelet, STFT, CZT, Goertzel
📊 Window functions: Hann, Hamming, Blackman, Kaiser
🔄 Batch and multi-channel processing
🌐 WebAssembly support
📱 Parallel processing (optional)
🎵 Hybrid song identification: fast metadata lookup with BLAKE3 fallback

Benchmarks

See the benchmarks for detailed results and data.

Quick Start

Add to Cargo.toml

[dependencies]
kofft = { version = "0.1.5", features = [
    # "x86_64",             # AVX/SSE on x86_64
    # "sse",                # force SSE2-only backend
    # "aarch64",            # NEON on 64-bit ARM
    # "wasm",               # WebAssembly SIMD128
    # "avx2",               # AVX2-specific code paths
    # "avx512",             # AVX-512 code paths
    # "parallel",           # Rayon-based parallel helpers
    # "simd",               # portable SIMD FFT implementations
    # "soa",                # structure-of-arrays complex vectors
    # "precomputed-twiddles", # embed precomputed twiddle factors (requires std)
    # "compile-time-rfft",  # precompute real FFT tables at compile time
    # "slow",               # include naive reference algorithms
    # "internal-tests",     # enable proptest/rand for internal tests
] }

Basic Usage

For an overview of the Fast Fourier Transform (FFT), see Wikipedia.

use kofft::{Complex32, FftPlanner};
use kofft::fft::{ScalarFftImpl, FftImpl};

// Create FFT instance with planner (caches twiddle factors)
let planner = FftPlanner::<f32>::new();
let fft = ScalarFftImpl::with_planner(planner);

// Prepare data
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

// Compute FFT
fft.fft(&mut data)?;

// Compute inverse FFT
fft.ifft(&mut data)?;

Parallel FFT

Enable the parallel feature to automatically split large transforms across threads via Rayon. Use the fft_parallel and ifft_parallel helpers, which safely fall back to single-threaded execution when Rayon is not available.

By default, kofft parallelizes an FFT when each CPU core would process at least max(L1_cache_bytes / size_of::<Complex32>(), per_core_work) elements. The defaults assume a 32 KiB L1 cache and require roughly 4,096 points per core. The heuristic scales with the number of detected cores (via num_cpus) and can be tuned using the KOFFT_PAR_FFT_THRESHOLD, KOFFT_PAR_FFT_CACHE_BYTES, or KOFFT_PAR_FFT_PER_CORE_WORK environment variables, or by calling kofft::fft::set_parallel_fft_threshold, set_parallel_fft_l1_cache, or set_parallel_fft_per_core_work at runtime.

use kofft::fft::{fft_parallel, ifft_parallel, Complex32};

let mut data = vec![Complex32::new(1.0, 0.0); 1 << 14];
fft_parallel(&mut data)?;
ifft_parallel(&mut data)?;
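
The threshold heuristic described above can be sketched in plain Rust. This is an illustration only, not kofft's actual implementation: the names `per_core_min` and `should_parallelize` are hypothetical, and the sketch assumes the decision compares the per-core share of the input against the per-core minimum. With the stated defaults (32 KiB L1 cache, 8-byte `Complex32`, 4,096 points per core), the per-core minimum works out to 4,096 elements.

```rust
/// Hypothetical helper: the minimum number of elements each core should
/// process, i.e. max(L1_cache_bytes / element_size, per_core_work).
fn per_core_min(l1_cache_bytes: usize, elem_size: usize, per_core_work: usize) -> usize {
    (l1_cache_bytes / elem_size).max(per_core_work)
}

/// Hypothetical decision rule (an assumption, not the library's code):
/// parallelize only when more than one core is available and every core
/// would receive at least `min_per_core` elements.
fn should_parallelize(len: usize, cores: usize, min_per_core: usize) -> bool {
    cores > 1 && len / cores >= min_per_core
}
```

Under these assumptions, the 16,384-point transform from the example above would be split across 4 cores but not across 8, since 8 cores would leave only 2,048 points per core.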

Cargo Feature Flags

The crate exposes several Cargo features. Refer to Cargo.toml for the canonical list and definitions.

std – enable the Rust standard library (default)
parallel – Rayon-based parallel helpers
Architecture backends:
- x86_64 – AVX/SSE on x86_64 CPUs
- sse – force SSE2-only backend
- aarch64 – NEON on 64-bit ARM
- wasm – WebAssembly SIMD128
- avx2 – AVX2-specific code paths
- avx512 – AVX-512 code paths
Miscellaneous:
- simd – portable SIMD FFT implementations
- soa – structure-of-arrays complex vectors for SIMD
- precomputed-twiddles – embed precomputed FFT twiddle factors (requires std)
- compile-time-rfft – generate real FFT tables at compile time
- slow – include naive reference algorithms
- internal-tests – enable proptest and rand for internal testing

Embedded/MCU Usage (No Heap)

All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.

FFT (Stack-Only)

use kofft::fft::{Complex32, fft_inplace_stack};

// 8-point FFT (power-of-two only for stack APIs)
let mut buf: [Complex32; 8] = [
    Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0),
    Complex32::new(5.0, 0.0), Complex32::new(6.0, 0.0),
    Complex32::new(7.0, 0.0), Complex32::new(8.0, 0.0),
];

fft_inplace_stack(&mut buf)?;

DCT-I (Stack-Only)

use kofft::dct::dct1_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dct1_inplace_stack(&input, &mut output);

DCT-II (Stack-Only)

use kofft::dct::dct2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dct2_inplace_stack(&input, &mut output);
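
For reference, the textbook unnormalized DCT-II can be written directly from its definition. This is a naive O(N²) sketch with the same input/output buffer shape as the call above; the function name `naive_dct2` is hypothetical, and kofft's scaling convention may differ from the unnormalized form assumed here.

```rust
use std::f32::consts::PI;

/// Naive unnormalized DCT-II:
/// X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k).
/// The scaling is an assumption; a library may apply a different factor.
fn naive_dct2(input: &[f32], output: &mut [f32]) {
    assert_eq!(input.len(), output.len());
    let n = input.len() as f32;
    for (k, out) in output.iter_mut().enumerate() {
        *out = input
            .iter()
            .enumerate()
            .map(|(i, &x)| x * (PI / n * (i as f32 + 0.5) * k as f32).cos())
            .sum();
    }
}
```

A constant input puts all of its energy in the first coefficient, which makes a quick check when comparing against an optimized implementation.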

DST-II (Stack-Only)

use kofft::dst::dst2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dst2_inplace_stack(&input, &mut output);

DST-IV (Stack-Only)

use kofft::dst::dst4_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dst4_inplace_stack(&input, &mut output);

Haar Wavelet (Stack-Only)

use kofft::wavelet::{haar_forward_inplace_stack, haar_inverse_inplace_stack};

// Forward transform
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut avg = [0.0; 4];
let mut diff = [0.0; 4];

haar_forward_inplace_stack(&input, &mut avg[..], &mut diff[..]);

// Inverse transform
let mut out = [0.0; 8];
haar_inverse_inplace_stack(&avg[..], &diff[..], &mut out[..]);
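
To show what the average and difference buffers hold, here is a minimal single-level Haar sketch in plain Rust. The function names are hypothetical, and the normalization (dividing by 2) is an assumption; other conventions divide by √2, so kofft's outputs may be scaled differently.

```rust
/// One Haar level: pairwise average and half-difference.
/// Dividing by 2 is an assumed normalization, not necessarily kofft's.
fn haar_forward(input: &[f32], avg: &mut [f32], diff: &mut [f32]) {
    assert!(input.len() % 2 == 0);
    assert!(avg.len() == input.len() / 2 && diff.len() == input.len() / 2);
    for (i, pair) in input.chunks_exact(2).enumerate() {
        avg[i] = (pair[0] + pair[1]) / 2.0;
        diff[i] = (pair[0] - pair[1]) / 2.0;
    }
}

/// Inverse of `haar_forward`: rebuild each sample pair from its
/// average and half-difference.
fn haar_inverse(avg: &[f32], diff: &[f32], out: &mut [f32]) {
    assert!(avg.len() == diff.len() && out.len() == 2 * avg.len());
    for i in 0..avg.len() {
        out[2 * i] = avg[i] + diff[i];
        out[2 * i + 1] = avg[i] - diff[i];
    }
}
```

Like the stack-only API, both functions write into caller-provided buffers and allocate nothing.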

Window Functions (Stack-Only)

use kofft::window::{hann_inplace_stack, hamming_inplace_stack, blackman_inplace_stack};

let mut hann: [f32; 8] = [0.0; 8];
hann_inplace_stack(&mut hann);

let mut hamming: [f32; 8] = [0.0; 8];
hamming_inplace_stack(&mut hamming);

let mut blackman: [f32; 8] = [0.0; 8];
blackman_inplace_stack(&mut blackman);
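
As a point of comparison, a Hann window can be filled into a caller-provided buffer with a few lines of plain Rust. This sketch assumes the symmetric form w[n] = 0.5 − 0.5·cos(2πn / (N − 1)); whether kofft uses the symmetric or periodic variant is not stated here, and the name `hann_fill` is hypothetical. It uses `std` for `cos`, so it is meant for host-side checking rather than no_std targets.

```rust
use std::f32::consts::PI;

/// Fill `out` with a symmetric Hann window:
/// w[n] = 0.5 - 0.5 * cos(2 * pi * n / (N - 1)).
/// The symmetric form is an assumption; a periodic variant divides by N.
fn hann_fill(out: &mut [f32]) {
    let n = out.len();
    if n < 2 {
        // Degenerate lengths: a single-sample window is just 1.0.
        for w in out.iter_mut() {
            *w = 1.0;
        }
        return;
    }
    let denom = (n - 1) as f32;
    for (i, w) in out.iter_mut().enumerate() {
        *w = 0.5 - 0.5 * (2.0 * PI * i as f32 / denom).cos();
    }
}
```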

Sanity Check Utility

The workspace provides a sanity-check binary for comparing spectrograms between kofft and rustfft. It can optionally emit an SVG file using --svg-output:

cargo run -r -p sanity-check -- input.flac --svg-output=spec.svg

Desktop/Standard Library Usage

With the std feature (enabled by default), you get heap-based APIs for more flexibility.

FFT with Standard Library

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};

let fft = ScalarFftImpl::<f32>::default();

// Heap-based FFT
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

fft.fft(&mut data)?;

// Or return the result in a newly allocated vector
let result = fft.fft_vec(&data)?;

Real FFT (Optimized for Real Input)

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};
use kofft::rfft::RealFftImpl;

let fft = ScalarFftImpl::<f32>::default();
let mut input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output = vec![Complex32::zero(); input.len() / 2 + 1];

fft.rfft(&mut input, &mut output)?;

Stack-only helpers avoid heap allocation:

use kofft::rfft::{irfft_stack, rfft_stack};
use kofft::Complex32;

let input = [1.0f32, 2.0, 3.0, 4.0];
let mut freq = [Complex32::new(0.0, 0.0); 3];
rfft_stack(&input, &mut freq)?;
let mut time = [0.0f32; 4];
irfft_stack(&freq, &mut time)?;
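
The output buffers above hold N/2 + 1 bins because the spectrum of a real signal is conjugate-symmetric, so the remaining bins carry no new information. A naive O(N²) sketch makes this concrete; `naive_rdft` is a hypothetical name, the result is returned as plain `(re, im)` tuples rather than kofft's `Complex32`, and the sign convention (negative exponent for the forward transform) is the usual one but is assumed here.

```rust
/// Naive DFT of a real signal, returning only bins 0..=N/2 as (re, im).
/// Uses the common forward convention X[k] = sum x[t] * exp(-2*pi*i*k*t/N).
fn naive_rdft(input: &[f32]) -> Vec<(f32, f32)> {
    let n = input.len();
    (0..=n / 2)
        .map(|k| {
            let mut re = 0.0f64;
            let mut im = 0.0f64;
            for (t, &x) in input.iter().enumerate() {
                let angle = -2.0 * std::f64::consts::PI * (k * t) as f64 / n as f64;
                re += x as f64 * angle.cos();
                im += x as f64 * angle.sin();
            }
            (re as f32, im as f32)
        })
        .collect()
}
```

For the 4-sample input used above this yields 3 bins, matching the length of the `freq` buffer.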

STFT (Short-Time Fourier Transform)

For background on STFT, see Wikipedia.

use kofft::stft::{stft, istft};
use kofft::window::hann;
use kofft::fft::ScalarFftImpl;

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();

let mut frames = vec![vec![]; (signal.len() + hop_size - 1) / hop_size];
stft(&signal, &window, hop_size, &mut frames, &fft)?;

let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;
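
The frame count in the example, `(signal.len() + hop_size - 1) / hop_size`, is a ceiling division: one frame starts at every multiple of the hop size. The sketch below illustrates that arithmetic and one way a frame could be windowed before the FFT. Both function names are hypothetical, and zero-padding frames that run past the end of the signal is an assumption, not a statement about how kofft handles the tail.

```rust
/// Number of frames when one frame starts every `hop_size` samples
/// (ceiling division, as in the example above).
fn frame_count(signal_len: usize, hop_size: usize) -> usize {
    assert!(hop_size > 0);
    (signal_len + hop_size - 1) / hop_size
}

/// Hypothetical framing step: multiply the samples starting at
/// `index * hop_size` by the window, treating samples past the end of
/// the signal as zero (an assumed padding policy).
fn windowed_frame(signal: &[f32], window: &[f32], hop_size: usize, index: usize) -> Vec<f32> {
    let start = index * hop_size;
    window
        .iter()
        .enumerate()
        .map(|(i, &w)| signal.get(start + i).copied().unwrap_or(0.0) * w)
        .collect()
}
```

With an 8-sample signal and a hop size of 2 this gives 4 frames, and the last frame covers only two real samples.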

Streaming STFT/ISTFT

use kofft::stft::{StftStream, istft};
use kofft::window::hann;
use kofft::fft::{Complex32, ScalarFftImpl};

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;
let fft = ScalarFftImpl::<f32>::default();
let mut stream = StftStream::new(&signal, &window, hop_size, &fft)?;
let mut frames = Vec::new();
let mut frame = vec![Complex32::new(0.0, 0.0); window.len()];
while stream.next_frame(&mut frame)? {
    frames.push(frame.clone());
}
let mut output = vec![0.0; signal.len()];
let mut scratch = vec![0.0; output.len()];
istft(&mut frames, &window, hop_size, &mut output, &mut scratch, &fft)?;

Batch Processing

use kofft::fft::{
