SkillAgentSearch skills...

PhastFT

A high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure and safe Rust.

Install / Use

/learn @QuState/PhastFT
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Build codecov unsafe forbidden

PhastFT

PhastFT is a high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in safe Rust.

Designed for large FFTs (gigabytes of data) common in scientific workloads, e.g. in quantum computer simulators.

Features

  • Performance competitive with other Rust implementations in a single thread, and much faster when multi-threaded
  • In-place algorithm for low memory usage, allows running large FFTs on cheaper hardware
  • Simple implementation using the Cooley-Tukey FFT algorithm and CO-BRAVO bit reversal
  • No unsafe code
  • Optional multi-threading
  • SIMD acceleration on SSE4.2, AVX2, NEON and WASM thanks to fearless_simd
  • Selects the fastest SIMD implementation at runtime. No need for -C target-cpu=native!
  • Coming soon: Python bindings (via PyO3)

Limitations

  • Only supports input with a length of 2^n (i.e., a power of 2) -- input should be padded with zeros to the next power of 2

Planned features

  • Real-to-complex FFT
  • Additional agorithms for non-power-of-2 FFTs
  • Even more work on performance

Quickstart

Rust

use phastft::{fft_64_dit, fft_64_dit_with_planner, planner::{Direction, PlannerDit64}};

let big_n = 1 << 20;
let mut reals: Vec<f64> = (1..=big_n).map(|i| i as f64).collect();
let mut imags: Vec<f64> = (1..=big_n).map(|i| i as f64).collect();

// Simple API
fft_64_dit(&mut reals, &mut imags, Direction::Forward);

// Or with a reusable planner for better performance with multiple FFTs
let planner = PlannerDit64::new(big_n, Direction::Forward);
fft_64_dit_with_planner(&mut reals, &mut imags, &planner);

Complex Number Support (Interleaved Format)

When the complex-nums feature is enabled, you can also use the interleaved format with the num_complex::Complex type:

use phastft::{
    planner::Direction,
    fft_64_interleaved
};
use num_complex::Complex;

let big_n = 1 << 10;
let mut signal: Vec<Complex<f64>> = (1..=big_n)
    .map(|i| Complex::new(i as f64, i as f64))
    .collect();
fft_64_interleaved(&mut signal, Direction::Forward);

Both fft_32_interleaved and fft_64_interleaved are available for f32 and f64 precision respectively.

Python (coming soon)

Follow the instructions at https://rustup.rs/ to install Rust.

Then you can install PhastFT itself:

pip install numpy
pip install git+https://github.com/QuState/PhastFT#subdirectory=pyphastft
import numpy as np
from pyphastft import fft

sig_re = np.asarray(sig_re, dtype=np.float64)
sig_im = np.asarray(sig_im, dtype=np.float64)

fft(a_re, a_im)

Normalization

phastft only scales the output of the inverse FFT. Namely, running IFFT(x) will scale each element by 1/N, where N is the number of data points, and IFFT(FFT(x)) == x. If your use case(s) require(s) something different, please don't hesitate to create an issue.

Bit Reversal and Output Order

  • Input: Normal order (bit-reversed internally)
  • Output: Normal order
  • Bit Reversal: Always performed on input (required for correctness)

Performance Notes

Reuse planners. If you're doing multiple FFTs of the same size, create the planner once and reuse it.

Threading. If the parallel feature is enabled, the library uses 2 threads for bit reversal and all available threads for FFT calculation when it seems profitable.

Complex numbers. The separate real/imaginary array API is faster and uses less memory than the interleaved complex API which allocates temporary buffers.

Benchmarks

PhastFT is benchmarked against several other FFT libraries. Scripts and instructions to reproduce benchmark results and plots are available here.

<p align="center"> <img src="https://raw.githubusercontent.com/QuState/PhastFT/main/assets/benchmarks_bar_plot_4_12.png" width="400" title="PhastFT vs. RustFFT vs. FFTW3" alt="PhastFT vs. RustFFT vs. FFTW3"> <img src="https://raw.githubusercontent.com/QuState/PhastFT/main/assets/benchmarks_bar_plot_13_29.png" width="400" title="PhastFT vs. RustFFT vs. FFTW3" alt="PhastFT vs. RustFFT vs. FFTW3"> </p> <p align="center"> <img src="https://raw.githubusercontent.com/QuState/PhastFT/main/assets/py_benchmarks_bar_plot_0_8.png" width="400" title="PhastFT vs. NumPy FFT vs. pyFFTW" alt="PhastFT vs. NumPy FFT vs. pyFFTW"> <img src="https://raw.githubusercontent.com/QuState/PhastFT/main/assets/py_benchmarks_bar_plot_9_28.png" width="400" title="PhastFT vs. NumPy FFT vs. pyFFTW" alt="PhastFT vs. NumPy FFT vs. pyFFTW"> </p>

How is it so fast?

PhastFT is designed around the capabilities and limitations of modern hardware (that is, anything made in the last 10 years or so).

The two major bottlenecks in FFT are the CPU cycles and memory accesses.

Most literature on FFT focuses on reducing the amount of math operations, but today's CPUs are heavily memory-bottlenecked for any amount of data that doesn't fit into the cache. It doesn't matter how much or how little CPU instructions you need to execute if the CPU spends most of the time just waiting on memory anyway!

Notes on FFTs for implementers is a good read if you want to understand the trade-offs on modern hardware. Its author is not affiliated with PhastFT.

The trade-offs we chose are:

  • In-place FFT with a separate bit-reversal step reduces memory traffic and peak memory usage compared to out-of-place and auto-sorter FFTs
  • Radix-2 Cooley-Turkey FFT: radix-4 and split-radix do less math, but require complex and slow bit reversals.
    • We still need to experiment with fusing multiple radix-2 passes to reduce memory traffic in single-threaded scenarios
  • CO-BRAVO cache-optimal, SIMD-accelerated bit reversal trounces other algorithms.
  • Decimation in time maps better to SIMD fused multiply-adds than decimation-in-frequency, and CO-BRAVO makes skipping bit reversal less appealing.
  • Recursive formulation enables cache-oblivious FFT and easy parallelism. We switch over to a loop when reaching L1 cache size.

All of this combined results in a fast and efficient FFT implementation competitive with the performance of existing Rust FFT crates on medium to large sizes, while using significantly less memory.

Contributing

Contributions to PhastFT are welcome! If you find any issues or have improvements to suggest, please open an issue or submit a pull request. Follow the contribution guidelines outlined in the CONTRIBUTING.md file.

License

PhastFT is licensed under MIT or Apache 2.0 license, at your option.

PhastFT vs. RustFFT

RustFFT is another excellent FFT implementation in pure Rust. RustFFT and PhastFT make different trade-offs.

PhastFT advantages

  • Up to 2x lower memory usage, letting you use laptops or cheaper cloud instances for large FFTs
  • Multi-threading support, much higher performance on large sizes when using multi-threading
  • No unsafe code

RustFFT advantages

  • Higher performance for small sizes thanks to dedicated handwritten kernels for each size
  • Supports FFT sizes that aren't powers of 2 (with a large performance penalty)

What's with the name?

The name, PhastFT, is derived from the implementation of the Quantum Fourier Transform (QFT). Namely, the quantum circuit implementation of QFT consists of the Phase gates and Hadamard gates. Hence, PhastFT.

View on GitHub
GitHub Stars257
CategoryDevelopment
Updated8h ago
Forks13

Languages

Rust

Security Score

100/100

Audited on Mar 24, 2026

No findings