<p align='center'> K<sub>Sig</sub> </p>
<p align='center'> GPU-accelerated computation of popular time series kernels </p>
A scikit-learn compatible Python package that provides GPU-accelerated implementations of the most powerful and popular time series kernels and features using CuPy.
GPU Acceleration: All kernel computations are performed on the GPU via CuPy, providing major speedups on modern NVIDIA hardware.
The time series kernels included in this package are:
- Signature Kernel using dynamic programming, which computes truncated signature kernels exactly, see Algorithms 3 and 6;
- Signature-PDE Kernel, which approximates the untruncated signature kernel by solving a PDE;
- Global Alignment Kernel, which computes a similarity score as a sum over all pairwise alignments between sequences, see Equation 1.
Available time series features are:
- Vanilla Path Signatures computing iterated integrals of a sequence lifted to a path by piecewise linear interpolation;
- Low-Rank Signature Features computing signature features using a low-rank algorithm that iteratively approximates outer products, see Algorithm 5;
- Random Fourier Signature Features using Random Fourier Features and random projection approaches for tensors, see Algorithms 2 and 3;
- Random Warping Series that computes Dynamic Time Warping alignments between input sequences and random time series, see Algorithm 1.
Introduction
The signature kernel is one of the most powerful similarity measures for sequences, which lifts a kernel on a given domain to a kernel for sequences in that domain with strong theoretical guarantees:
- It is a universal nonlinearity for time series, which means that it is flexible enough to approximate any continuous function on compact sets of sequences.
- It is invariant to a natural transformation of time series called reparametrization (in the discrete setting, often called time warping), but can be made sensitive to it by including the time parametrization as an additional channel.
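The time-augmentation mentioned above amounts to concatenating a normalized time channel to each sequence. A minimal NumPy sketch (array names are illustrative):

```python
import numpy as np

# A batch of 10 sequences of length 50 with 5 channels.
X = np.random.randn(10, 50, 5)
# Normalized time grid in [0, 1], broadcast across the batch.
t = np.broadcast_to(np.linspace(0., 1., X.shape[1])[None, :, None],
                    (X.shape[0], X.shape[1], 1))
# Append time as an extra channel, making the kernel parametrization-sensitive.
X_t = np.concatenate([X, t], axis=-1)  # shape (10, 50, 6)
```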
The signature kernel between sequences can be computed by instantiating the ksig.kernels.SignatureKernel class (see also ksig.kernels.SignaturePDEKernel), which lifts a static kernel (here, an RBF kernel for vector-valued data from the ksig.static.kernels.RBFKernel class) to a kernel for sequences:
```python
import numpy as np
import ksig

# Number of signature levels to use.
n_levels = 5
# Use the RBF kernel for vector-valued data as the static (base) kernel.
static_kernel = ksig.static.kernels.RBFKernel()
# Instantiate the signature kernel, which takes the static kernel as input.
sig_kernel = ksig.kernels.SignatureKernel(n_levels, static_kernel=static_kernel)
# Generate 10 sequences of length 50 with 5 channels.
n_seq, l_seq, n_feat = 10, 50, 5
X = np.random.randn(n_seq, l_seq, n_feat)
# Sequence kernels take as input an array of sequences with ndim == 3,
# and work as a callable for computing the kernel matrix.
K_XX = sig_kernel(X)  # K_XX has shape (10, 10).
# The diagonal kernel entries can also be computed.
K_X = sig_kernel(X, diag=True)  # K_X has shape (10,).
# Generate another array of 8 sequences of length 20 with 5 features.
n_seq2, l_seq2 = 8, 20
Y = np.random.randn(n_seq2, l_seq2, n_feat)
# Compute the kernel matrix between arrays X and Y.
K_XY = sig_kernel(X, Y)  # K_XY has shape (10, 8).
```
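Since the package is scikit-learn compatible, a precomputed Gram matrix such as K_XX can be plugged into any estimator that accepts precomputed kernels (the provided ksig.models.PrecomputedKernelSVC plays an analogous role via cuml). A minimal CPU-side sketch with scikit-learn's SVC, using a synthetic stand-in Gram matrix and labels:

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in Gram matrix for sig_kernel(X): any symmetric PSD matrix works here.
rng = np.random.default_rng(0)
F = rng.standard_normal((10, 3))
K_XX = F @ F.T  # shape (10, 10), symmetric positive semi-definite
y = np.array([0, 1] * 5)  # synthetic binary labels

# An SVM on the precomputed kernel matrix; for unseen sequences, the
# test-vs-train kernel matrix (e.g. K_XY) would be passed to predict.
clf = SVC(kernel='precomputed')
clf.fit(K_XX, y)
preds = clf.predict(K_XX)
```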
Installation
We recommend setting up a fresh conda environment with Python 3.9 or 3.10 using

```
conda create -n ksig python=3.10
conda activate ksig
```
In order to build cupy when installing the package, the CUDA_PATH environment variable must be set correctly. The default CUDA installation path is /usr/local/cuda-${CUDA_VERSION}/ on Linux and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v${CUDA_VERSION}\ on Windows, where ${CUDA_VERSION} is, for example, 12.3. On Linux, this can be done by (substituting ${CUDA_VERSION}):

```
export CUDA_PATH=/usr/local/cuda-${CUDA_VERSION}/
```
Once this is done, the package can be installed by

```
pip install git+https://github.com/tgcsaba/ksig.git
```
Alternatively, if you are still unable to build cupy, you may be able to download a precompiled wheel for certain CUDA versions from here. Once that is installed, try installing the ksig package again using the previous command.
Finally, to make use of the models provided under ksig.models, i.e. PrecomputedKernelSVC and PrecomputedFeatureLinSVC, you also need to install the cuml library, which can only be installed manually from the RAPIDS repository. There, scroll down to the section titled "Install RAPIDS", select the pip install method, your CUDA version and your Python version, then select "Choose Specific Packages", unselect every package besides cuml, and copy-paste the given command into the terminal where the conda environment is activated.
More Examples
Scaling to large datasets
A computational bottleneck associated with the full-rank signature kernel is a joint quadratic complexity in the number of training examples (N) and the length of the sequences (L), i.e. O(N<sup>2</sup>·L<sup>2</sup>). Feature-based formulations of the signature kernel get around this issue by being able to represent the kernel using a finite-dimensional feature set.
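As a rough back-of-envelope illustration (constants and signature-level counts are ignored; the numbers are only indicative), compare the pairwise cost with a feature-based one for N = 1000 sequences of length L = 200 and M = 100 feature components:

```python
# Indicative operation counts only; constants are ignored.
N, L, M = 1000, 200, 100
full_rank = N**2 * L**2               # pairwise kernel: quadratic in N and L
feature_based = N * L * M + N**2 * M  # compute features, then inner products
print(full_rank)                   # 40_000_000_000
print(feature_based)               # 120_000_000
print(full_rank // feature_based)  # roughly a 333x reduction
```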
One such example is Random Fourier Signature Features, which provides an unbiased approximation to the signature kernel that converges in probability $(1/M)$-subexponentially fast, where $M \in \mathbb{Z}_+$ is the truncation level.
This can be constructed using the ksig.kernels.SignatureFeatures class with two ingredients:
- setting the static_features argument to an instance of the ksig.static.features.RandomFourierFeatures class,
- and the projection argument to an instance of ksig.projections.TensorizedRandomProjection (for RFSF-TRP) or ksig.projections.DiagonalProjection (for RFSF-DP).
```python
import numpy as np
import ksig

# Number of signature levels to use.
n_levels = 5
# Use 100 components in RFF and projection.
n_components = 100
# Instantiate the RFF feature map.
static_feat = ksig.static.features.RandomFourierFeatures(n_components=n_components)
# Instantiate tensor random projections.
proj = ksig.projections.TensorizedRandomProjection(n_components=n_components)
# The RFSF-TRP feature map and kernel. In addition to working as a callable for
# computing a kernel, it implements fit and transform methods.
rfsf_trp_kernel = ksig.kernels.SignatureFeatures(
    n_levels=n_levels, static_features=static_feat, projection=proj)
# Generate 1000 sequences of length 200 with 100 features.
n_seq, l_seq, n_feat = 1000, 200, 100
X = np.random.randn(n_seq, l_seq, n_feat)
# Fit the kernel to the data.
rfsf_trp_kernel.fit(X)
# Compute the kernel matrix as before.
K_XX = rfsf_trp_kernel(X)  # K_XX has shape (1000, 1000).
# Generate another array of 800 sequences of length 250 with 100 features.
n_seq2, l_seq2 = 800, 250
Y = np.random.randn(n_seq2, l_seq2, n_feat)
# Compute the kernel matrix between X and Y.
# The kernel does not have to be fitted a second time.
K_XY = rfsf_trp_kernel(X, Y)  # K_XY has shape (1000, 800).
# Alternatively, features can be computed separately for X and Y. Under the hood,
# this is what the call method does: compute features and take their inner product.
P_X = rfsf_trp_kernel.transform(X)  # P_X has shape (1000, 501).
P_Y = rfsf_trp_kernel.transform(Y)  # P_Y has shape (800, 501).
# Check that the results match.
print(np.linalg.norm(K_XX - P_X @ P_X.T))
print(np.linalg.norm(K_XY - P_X @ P_Y.T))
```
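Because the kernel now factorizes through finite-dimensional features, downstream models can operate directly on the feature arrays with linear methods (this is what the provided PrecomputedFeatureLinSVC exploits via cuml). A CPU-side sketch with scikit-learn's LinearSVC, using synthetic arrays standing in for rfsf_trp_kernel.transform(X):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
# Stand-in for rfsf_trp_kernel.transform(X): 1000 examples, 501 features.
P_X = rng.standard_normal((1000, 501))
y = rng.integers(0, 2, size=1000)  # synthetic binary labels

# Training a linear model on the features scales linearly in the number of
# examples, avoiding the quadratic-size Gram matrix altogether.
clf = LinearSVC(max_iter=5000)
clf.fit(P_X, y)
acc = clf.score(P_X, y)
```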
Results
The experiments can be run using the code provided in the experiments directory. The results displayed here are the ones reported here. We have made our best effort to aid reproducibility by using run-dependent seeds; however, we have also observed minor differences in the produced results across CUDA and cupy versions.
Results on Multivariate UEA Datasets with ($N \leq 1000$):

| | RFSF-DP | RFSF-TRP | KSig | KSigPDE | RWS | GAK | RBF | RFF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ArticularyWordRecognition | 0.984 | 0.981 | 0.990 | 0.983 | 0.987 | 0.977 | 0.977 | 0.978 |
| AtrialFibrillation | 0.373 | 0.320 | 0.400 | 0.333 | 0.427 | 0.333 | 0.267 | 0.373 |
| BasicMotions | 1.000 | 1.000 | 1.000 | 1.000 | 0.995 | 1.000 | 0.975 | 0.860 |
| Cricket | 0.964 | 0.964 | 0.958 | 0.972 | 0.978 | 0.944 | 0.917 | 0.886 |
| DuckDuckGeese | 0.636 | 0.664 | 0.700 | 0.480 | 0.492 | 0.500 | 0.420 | 0.372 |
| ERing | 0.921 | 0.936 | 0.841 | 0.941 | 0.945 | 0.926 | 0.937 | 0.915 |
| EigenWorms | 0.817 | 0.837 | 0.809 | 0.794 | 0.623 | 0.511 | 0.496 | 0.443 |
| Epilepsy | 0.949 | 0.942 | 0.949 | 0.891 | 0.925 | 0.870 | 0.891 | 0.777 |
| EthanolConcentration | 0.457 | 0.439 | 0.479 | 0.460 | 0.284 | 0.361 | 0.346 | 0.325 |
| FingerMovements | 0.608 | 0.624 | 0.640 | 0.630 | 0.612 | 0.500 | 0.620 | 0.570 |
| HandMovementDirection | 0.573 | 0.568 | 0.595 | 0.527 | 0.403 | 0.595 | 0.541 | 0.454 |
| Handwriting | 0.434 | 0.400 | 0.479 | 0.409 | 0.591 | 0.481 | 0.307 | 0.249 |
| Heartbea