DSPLIB

A C++ DSP library with MATLAB-like syntax for readable and safe signal processing.

This is a library for those who hate this kind of code:

std::transform(x.begin(), x.end(), x.begin(), [](auto& v){
    return v * 0.3;
});

auto power = std::accumulate(x.begin() + lb, x.begin() + rb, 0.0, [](double accum, const std::complex<double>& v) {
    return accum + (v.real() * v.real() + v.imag() * v.imag());
});

auto r = std::vector<double>(x1.size());
for (int i=0; i < r.size(); ++i) {
    r[i] = x1[i] * x2[i];
}

auto p = fftw_plan_dft_1d(N, x.data(), spec.data(), FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p);

and who likes this:

using namespace dsplib;
x *= 0.3;
auto power = sum(abs2(x.slice(lb, rb)));
auto r = x1 * x2;
auto spec = fft(x);

Motivation

dsplib is, essentially, my personal collection of DSP screwdrivers.

It has been slowly assembled over several years and used across many projects. The reason is simple: a lot of C++ DSP code is painful to read, easy to misuse, and surprisingly good at reproducing the same bugs in slightly different forms.

Raw pointers in public APIs, implicit assumptions about buffer sizes, and hand-written loops tend to age poorly.

dsplib is an attempt to build a 1D DSP library with a human face: modern C++ abstractions, explicit intent, and a syntax that does not fight the person writing the code.

The library favors clarity and correctness first — and then works very hard to make it fast.

Usage

Quick Taste

// windowed FFT magnitude spectrum (dB)
auto x = randn(1024);
x *= window::hann(x.size());
auto spec = pow2db(abs2(fft(x)));

Conceptual Differences

The table below highlights key conceptual differences between dsplib and MATLAB / NumPy, beyond simple syntax similarities.

| dsplib | matlab | numpy | | ----------- | ----------- | ----------- | | x * x | x .* x | x * x | | zeros(20) | zeros(20, 1) | zeros(20) | | x.slice(0,10) = 1 | x(1:10) = 1 | x[0:10] = 1 | | x.slice(2,end) = 1 | x(3:end) = 1 | x[2:] = 1 | | x.slice(2, 4) = {1, 2} | x(3:4) = [1, 2] | x[2:4] = [1, 2] | | x.slice(0, -1) | x(1:end-1) | x[0:-1] |

Only 1-D arrays with element-wise operations are currently supported.

Scalar operations:

dsplib::real_t v1;
dsplib::cmplx_t v2;
v1 = 10;
v2 = v1;
v2 = 10-10i;
v2 = {10, -10};
v2.re = 10;
v2.im = -10;
v2 = std::complex<double>(10, -10);

Vector operations:

using namespace dsplib;
arr_real x1 = {0, 1, 2, 3, 4};
arr_cmplx x2 = {1i, 1+2i, 4, -5-5i};
x2.slice(2, 4) = {1i, 2i};
arr_real y1 = x1 * x1;
arr_real y2 = x1 * 1000;
arr_cmplx y3 = x1 * x2;
arr_cmplx y4 = x2 * 1000;
arr_cmplx y5 = x2.slice(0, 2);
arr_cmplx y6 = x1 * 2i;
arr_cmplx y7 = complex(y1); // only explicit conversion

Slicing

The behavior of slices is as close as possible to numpy. Except for cases with invalid indexes, in which case numpy does not throw an exception.

arr_real x = {0, 1, 2, 3, 4, 5, 6};
x.slice(0, 2) ///{0, 1}
x.slice(2, -1) ///{2, 3, 4, 5}
x.slice(-1, 0, -1) ///{6, 5, 4, 3, 2, 1}
x.slice(-1, 0) ///OUT_OF_RANGE, but numpy returns []
x.slice(0, -1, -1) ///OUT_OF_RANGE, but numpy returns []
x.slice(-8, 7) ///OUT_OF_RANGE, but numpy returns [0 1 2 3 4 5 6]

Fast Fourier Transform:

The FFT implementation has no radix size limitations. It supports power-of-two, prime, and semiprime radices.

The tables for the FFT are stored in the LRU cache and can be recalculated (if the pipeline uses many different bases). Use the FftPlan object to avoid this.

If your platform has a faster implementation, you can set the DSPLIB_EXCLUDE_FFT=ON option and implement the get_fft_plan functions (see the lib/fft/fftw.cpp example). You can also select the type of FFT backend via the DSPLIB_FFT_BACKEND option (dsplib, fftw, ne10[float]).

//FFT fn
arr_real x = randn(500);
arr_cmplx y1 = fft(x);  // real fft, n=500
arr_cmplx y2 = fft(x, 1024); // real fft, n=1024, zero padding
arr_cmplx y3 = fft(complex(y1)); // cmplx fft, n=500
arr_cmplx y4 = rfft(y1); // real fft, equal `fft(x)`

//FFT Plan
const int n = 512;
const arr_real x = randn(n);
std::shared_ptr<FftPlanR> plan = fft_plan_r(n);

// real fft, n=512
arr_cmplx y1 = plan->solve(x); 
//or
arr_cmplx y2 = plan->solve(x.slice(0, 512));
//or
arr_cmplx y3 = plan->solve(make_span(x.data(), n));

//real fft, n=512, result copy to `r`
arr_cmplx r(n);
plan->solve(x, r); 
//or
plan->solve(make_span(x.data(), n), make_span(r.data(), n))

//IFFT fn
const int n = 512;
arr_cmplx x = complex(ones(n));
arr_cmplx y1 = ifft(x);
//or
arr_real y2 = irfft(x.slice(0, n/2+1), n);
//or
arr_real y3 = irfft(x);

//IFFT Plan
const int n = 512;
const auto x = complex(ones(n));
auto plan = ifft_plan_r(n);

arr_real y1 = plan->solve(x);
//or
arr_real y2;
plan->solve(make_span(x.data(), n/2+1), make_span(y2.data(), n));

FIR filter:

const auto h = fir1(100, 0.1, FilterType::Low);
auto flt = FftFilter(h);
arr_real x = randn(10000);
arr_real y = flt(x);

Hilbert filter:

auto flt = HilbertFilter();
arr_real x = randn(10000);
arr_cmplx y1 = flt(x);
//or
arr_cmplx y2 = hilbert(x);

Add White Gaussian Noise:

arr_real x = randn(10000);
arr_real y = awgn(x, 10);   //snr=10dB

Cross-correlation:

arr_real x1 = randn(500);
arr_real x2 = awgn(x1, 10);
arr_real y = xcorr(x1, x2);

Delay estimation:

arr_real x1 = randn(500);
arr_real x2 = delayseq(x1, 100);
auto d1 = finddelay(x1, x2);    ///d1=100
auto [d2, _] = gccphat(x2, x1);   ///d2=100.0

Spectrum Analyze:

const int n = 1024;
arr_real x = randn(n);
x *= window::hann(n);
arr_cmplx y = fft(x) / (n/2);
y = y.slice(0, n/2+1);
auto z = pow2db(abs2(y));   //20*log10(..)

FIR filter design:

auto x = randn(10000);
auto h = fir1(99, 0.1, 0.2, FilterType::Bandstop);
auto flt = FirFilter(h);
auto y = flt(x);

Adaptive filters:

//simulate room impulse response
int M = 50;
auto rir = fir1(M-1, 0.1);
auto flt = FirFilter(rir);

int L = 100000;
auto x = randn(L);      //ref signal
auto n = 0.01 * randn(L);
auto d = flt(x) + n;    //mic signal

auto rls = RlsFilterR(M, 0.98, 1000);
auto [y, e] = rls(x, d);

ASSERT_LE(nmse(flt.coeffs(), rir), 0.01);

Resampling:

To process signal in batches (for example, in real time), use modules FIRDecimator, FIRInterpolator, FIRRateConverter or FIRResampler. The output signal will be delayed, but there will be no gap between frames.

To process the entire signal (for example, from a file), use function resample(x, p, q). The processing result will be aligned.

//44100 -> 16000
auto rsmp = dsplib::FIRResampler(16000, 44100);
auto out = rsmp.process(in);
//or
auto rsmp = dsplib::FIRRateConverter(160, 441);
auto out = rsmp.process(in);
//or
auto out = dsplib::resample(in, 16000, 44100);
//or
auto out = dsplib::resample(in, 160, 441);

//32000 -> 16000
auto decim = dsplib::FIRDecimator(2);
auto out = decim.process(in);
//or
auto out = dsplib::resample(in, 1, 2);
//or
auto out = dsplib::resample(in, 16000, 32000);

//16000 -> 32000
auto interp = dsplib::FIRInterpolator(2);
auto out = interp.process(in);
//or
auto out = dsplib::resample(in, 2, 1);
//or
auto out = dsplib::resample(in, 32000, 16000);

⚠️ Thread Safety & Memory Notice

The standard implementation is thread-safe because all caches (primarily FFT-related) use thread_local storage.
Memory warning: This may increase memory consumption if used carelessly – please avoid spreading processing across hundreds of threads.

The FFTW3 backend is wrapped with a static mutex (excluding fftw_execute calls) and is also thread-safe.

Documentation

Full API documentation is available here:

📖 https://vitalsong.github.io/dsplib/

The documentation is generated using Doxygen and reflects the current public API.

Build

Requires:

CMake (>=3.15)
C++17 compiler (exceptions can be disabled)

dsplib is designed with portability in mind and avoids platform-specific dependencies wherever possible.

If your platform has a reasonably complete C++17 compiler and CMake support, there is a good chance dsplib will build there as well.

In practice, it is often easier to list platforms where dsplib does not build. So far, I haven't found many :)

Known to work in production on:

Android (API 27, 29, ARMv7 / ARMv8)
Linux (GCC, Clang, MinGW)
Windows (MSVC, MinGW)
macOS (Clang)
WebAssembly (Emscripten)
Custom Buildroot-based ARM toolchains

Build and install:

# set DSPLIB_USE_FLOAT32=ON to enable float base type (double by default)
# set DSPLIB_NO_EXCEPTIONS=ON to disable exceptions
# set BUILD_SHARED_LIBS=ON to build shared lib
cmake . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target=install

Use CPM manager:

CPMAddPackage(NAME dsplib
    GIT_REPOSITORY 
        "https://github.com/vitalsong/dsplib.git"
    VERSION 
        1.0.0
    OPTIONS
        "DSPLIB_USE_FLOAT32 OFF"
        "DSPLIB_NO_EXCEPTIONS OFF"
    EXCLUDE_FROM_ALL ON
)
target_link_libraries(${PROJECT_NAME} dsplib)

Performance

To build and run benchmarks:

cmake . -B build -DCMAKE_BUILD_TYPE=Release -DDSPLIB_BUILD_BENCHS=ON
cmake --build build
./build/benchs/dsplib-benchs

FFT

The implementation of non-power-of-two FFT is based on the general factorization algorithm. It is usually slower, but not critical.

For prime and semi-prime n

Dsplib

Install / Use

README