<h1 align="center">
  <img src="https://raw.githubusercontent.com/eonu/sequentia/master/docs/source/_static/images/logo.png" width="75px">
  Sequentia
</h1>

Scikit-Learn compatible HMM and DTW based sequence machine learning algorithms in Python.

<div align="center">
  <a href="https://pypi.org/project/sequentia">
    <img src="https://img.shields.io/pypi/v/sequentia?logo=pypi&style=flat-square" alt="PyPI"/>
  </a>
  <a href="https://pypi.org/project/sequentia">
    <img src="https://img.shields.io/pypi/pyversions/sequentia?logo=python&style=flat-square" alt="PyPI - Python Version"/>
  </a>
  <a href="https://sequentia.readthedocs.io/en/latest">
    <img src="https://img.shields.io/readthedocs/sequentia.svg?logo=read-the-docs&style=flat-square" alt="Read The Docs - Documentation">
  </a>
  <a href="https://coveralls.io/github/eonu/sequentia">
    <img src="https://img.shields.io/coverallsCoverage/github/eonu/sequentia?logo=coveralls&style=flat-square" alt="Coveralls - Coverage"/>
  </a>
  <a href="https://raw.githubusercontent.com/eonu/sequentia/master/LICENSE">
    <img src="https://img.shields.io/pypi/l/sequentia?style=flat-square" alt="PyPI - License"/>
  </a>
</div>

<a href="#about">About</a> · <a href="#build-status">Build Status</a> · <a href="#features">Features</a> · <a href="#installation">Installation</a> · <a href="#documentation">Documentation</a> · <a href="#examples">Examples</a> · <a href="#acknowledgments">Acknowledgments</a> · <a href="#references">References</a> · <a href="#contributors">Contributors</a> · <a href="#licensing">Licensing</a>

About

Sequentia is a Python package that provides various classification and regression algorithms for sequential data, including methods based on hidden Markov models and dynamic time warping.

Some examples of how Sequentia can be used on sequence data include:

- determining a spoken word based on its audio signal or alternative representations such as MFCCs,
- predicting motion intent for gesture control from sEMG signals,
- classifying hand-written characters according to their pen-tip trajectories.

Why Sequentia?

- Simplicity and interpretability: Sequentia offers a limited set of machine learning algorithms, chosen specifically to be more interpretable and easier to configure than more complex alternatives such as recurrent neural networks and transformers, while maintaining a high level of effectiveness.
- Familiar and user-friendly: To fit more seamlessly into the workflow of data science practitioners, Sequentia follows the ubiquitous Scikit-Learn API, providing a familiar model development process for many, as well as enabling wider access to the rapidly growing Scikit-Learn ecosystem.
- Speed: Some algorithms offered by Sequentia, such as k-nearest neighbors, naturally scale poorly in runtime. However, our implementation is optimized to the point of being multiple orders of magnitude faster than similar packages (see the Benchmarks section for more information).

Build Status

| master | dev |
| ------ | --- |
|        |     |

Features

Models

Dynamic Time Warping + k-Nearest Neighbors (via `dtaidistance`)

Dynamic Time Warping (DTW) is a distance measure that can be applied to two sequences of different lengths. Using it as the distance measure for the k-Nearest Neighbors (kNN) algorithm yields a simple yet effective inference algorithm.

- [x] Classification
- [x] Regression
- [x] Variable length sequences
- [x] Multivariate real-valued observations
- [x] Sakoe–Chiba band global warping constraint
- [x] Dependent and independent feature warping (DTWD/DTWI)
- [x] Custom distance-weighted predictions
- [x] Multi-processed prediction
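
To make the idea concrete, here is a minimal pure-Python sketch of DTW with an optional Sakoe–Chiba band, followed by a k-nearest-neighbors vote. It is an illustration only and not Sequentia's implementation (which delegates to `dtaidistance`). The function names, the `band` parameter (a half-width in time steps) and the squared-difference local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b, band=None):
    """DTW distance between two univariate sequences of possibly different length.

    `band` is a hypothetical Sakoe-Chiba half-width in time steps;
    None means the warping path is unconstrained.
    """
    n, m = len(a), len(b)
    # Widen the band if needed so that the end point (n, m) stays reachable.
    w = max(n, m) if band is None else max(band, abs(n - m))
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


def knn_predict(query, train_seqs, train_labels, k=1, band=None):
    """Majority vote among the k training sequences closest to `query` under DTW."""
    ranked = sorted(
        ((dtw_distance(query, seq, band), label)
         for seq, label in zip(train_seqs, train_labels)),
        key=lambda pair: pair[0],
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy data: the query is a time-stretched version of the "peak" template.
train_seqs = [[0, 1, 2, 1, 0], [2, 1, 0, 1, 2]]
train_labels = ["peak", "valley"]
print(knn_predict([0, 0, 1, 2, 2, 1, 0], train_seqs, train_labels))
```

Because the warping path may repeat elements of either sequence, the stretched query aligns with the shorter "peak" template at zero cost, which is why no padding or resampling is needed.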

Hidden Markov Models (via `hmmlearn`)

A Hidden Markov Model (HMM) is a state-based statistical model which represents a sequence as a series of observations that are emitted from a collection of latent hidden states which form an underlying Markov chain. Each hidden state has an emission distribution that models its observations.

Expectation-maximization via the Baum-Welch algorithm (which relies on the forward-backward algorithm) [1] is used to derive a maximum likelihood estimate of the Markov chain probabilities and emission distribution parameters based on the provided training sequence data.

- [x] Classification
- [x] Variable length sequences
- [x] Multivariate real-valued observations (modeled with Gaussian mixture emissions)
- [x] Univariate categorical observations (modeled with discrete emissions)
- [x] Linear, left-right and ergodic topologies
- [x] Multi-processed training and prediction
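
As a rough illustration of how an HMM scores a sequence, the sketch below implements the scaled forward algorithm for categorical observations and classifies a sequence by picking the model with the highest log-likelihood. It is pure Python and is not how Sequentia or `hmmlearn` implement it. The two toy models, their parameters, and the assumption of equal class priors are all hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a categorical sequence under an HMM (scaled forward algorithm).

    obs:   list of symbol indices
    start: start[s] = P(first state is s)
    trans: trans[r][s] = P(next state is s | current state is r)
    emit:  emit[s][o] = P(symbol o | state s)
    """
    n_states = len(start)
    # Joint probability of the first symbol and each initial state.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t, symbol in enumerate(obs):
        if t > 0:
            # Propagate through the Markov chain, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][symbol]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the model assigns zero probability to this sequence
        # Rescale to avoid underflow; the logs of the scales sum to log P(obs).
        log_lik += math.log(scale)
        alpha = [value / scale for value in alpha]
    return log_lik


# Two hypothetical 2-state left-right models over the symbols {0, 1}.
left_right = [[0.7, 0.3], [0.0, 1.0]]  # a state can only stay or move forward
start = [1.0, 0.0]                     # always begin in the first state
models = {
    "zeros_then_ones": (start, left_right, [[0.9, 0.1], [0.1, 0.9]]),
    "ones_then_zeros": (start, left_right, [[0.1, 0.9], [0.9, 0.1]]),
}


def classify(obs):
    """Pick the class whose HMM gives the sequence the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


print(classify([0, 0, 0, 1, 1]))
```

The zero in the lower-left corner of `left_right` is what makes the topology left-right: once the chain leaves the first state it cannot return. An ergodic topology would allow every transition.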

Scikit-Learn compatibility

Sequentia (≥2.0) is compatible with the Scikit-Learn API (≥1.4), enabling rapid development and prototyping of sequential models.

The integration relies on metadata routing, which means that in most cases the only necessary change is to add a `lengths` keyword argument that provides sequence length information, e.g. `fit(X, y, lengths=lengths)` instead of `fit(X, y)`.
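
The sketch below shows one way to build `X` and `lengths` from a list of variable-length sequences. It assumes that `X` holds all sequences concatenated row-wise and that `lengths` records how many rows belong to each one (the convention used by `hmmlearn`). The helper names are hypothetical, and plain lists are used here instead of arrays to keep the example dependency-free.

```python
def pack_sequences(sequences):
    """Concatenate sequences into one list of observations plus their lengths."""
    X = [row for seq in sequences for row in seq]
    lengths = [len(seq) for seq in sequences]
    return X, lengths


def unpack_sequences(X, lengths):
    """Recover the individual sequences from the concatenated form."""
    sequences, start = [], 0
    for n in lengths:
        sequences.append(X[start:start + n])
        start += n
    return sequences


sequences = [
    [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2]],  # 3 time steps, 2 features
    [[0.5, 0.9]],                          # 1 time step
    [[0.7, 0.8], [0.6, 0.7]],              # 2 time steps
]
X, lengths = pack_sequences(sequences)

# X now has sum(lengths) rows, and there is one entry in `lengths`
# (and one label in y) per sequence. An estimator would then be
# trained with fit(X, y, lengths=lengths).
```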

Similar libraries

As DTW k-nearest neighbors is the core algorithm offered by Sequentia, below is a comparison of the DTW k-nearest neighbors algorithm features supported by Sequentia and similar libraries.

| | sequentia | aeon | tslearn | sktime | pyts |
|-|:-:|:-:|:-:|:-:|:-:|
| Scikit-Learn compatible | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multivariate sequences | ✅ | ✅ | ✅ | ✅ | ❌ |
| Variable length sequences | ✅ | ✅ | ➖<sup>1</sup> | ❌<sup>2</sup> | ❌<sup>3</sup> |
| No padding required | ✅ | ❌ | ➖<sup>1</sup> | ❌<sup>2</sup> | ❌<sup>3</sup> |
| Classification | ✅ | ✅ | ✅ | ✅ | ✅ |
| Regression | ✅ | ✅ | ✅ | ✅ | ❌ |
| Preprocessing | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multiprocessing | ✅ | ✅ | ✅ | ✅ | ✅ |
| Custom weighting | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sakoe-Chiba band constraint | ✅ | ✅ | ✅ | ✅ | ✅ |
| Itakura parallelogram constraint | ❌ | ✅ | ✅ | ✅ | ✅ |
| Dependent DTW (DTWD) | ✅ | ✅ | ✅ | ✅ | ❌ |
| Independent DTW (DTWI) | ✅ | ❌ | ❌ | ❌ | ✅ |
| Custom DTW measures | ❌<sup>4</sup> | ✅ | ❌ | ✅ | ✅ |

<sup>1</sup> tslearn supports variable length sequences with padding, but doesn't seem to mask the padding.

<sup>2</sup> sktime does not support variable length sequences, so they are padded (and padding is not masked).

<sup>3</sup> pyts does not support variable length sequences, so they are padded (and padding is not masked).

<sup>4</sup> sequentia only supports dtaidistance, which is one of the fastest DTW libraries as it is written in C.
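
To see why unmasked padding matters, the toy sketch below compares the DTW distance between two sequences before and after zero-padding them to a common length. It is a generic pure-Python illustration and does not reproduce the behavior of any of the libraries above. The helper names, the zero pad value and the squared-difference local cost are assumptions.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two univariate sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def pad(seq, length, value=0):
    """Right-pad `seq` with `value` up to `length` (hypothetical helper)."""
    return list(seq) + [value] * (length - len(seq))


short = [1, 2, 3]
long = [1, 2, 2, 3]  # same shape as `short`, with one repeated step

unpadded = dtw_distance(short, long)
padded = dtw_distance(pad(short, 4), pad(long, 4))
print(unpadded, padded)
```

Without padding, the repeated step is absorbed by warping and the distance is zero. After padding, the trailing zero of the shorter sequence has to be matched against a real observation of the longer one, so the distance becomes nonzero even though the underlying sequences have the same shape.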

Benchmarks

To compare the above libraries in runtime performance on dynamic time warping k-nearest neighbors classification tasks, a simple benchmark was performed on a univariate sequence dataset.

The Free Spoken Digit Dataset was used for benchmarking and consists of:

- 3000 recordings of 10 spoken digits (0-9)
  - 50 recordings of each digit for each of 6 speakers
  - 1500 used for training, 1500 used for testing (split via label stratification)
- 13 features (MFCCs)
  - Only the first feature was used as not all of the above libraries support multivariate sequences
- Sequence length statistics: (min 6, median 17, max 92)
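
For readers unfamiliar with label stratification, the following is a minimal sketch of a stratified 50/50 split over sequence indices. It is not the code used for the benchmark, and the function name, seed and `test_fraction` parameter are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.5, seed=0):
    """Split indices so that each label keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = round(len(indices) * test_fraction)
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(test)


# 10 digits with 300 recordings each, mirroring the dataset sizes above.
labels = [digit for digit in range(10) for _ in range(300)]
train_idx, test_idx = stratified_split(labels)
```

With these sizes, each digit contributes 150 recordings to the training set and 150 to the test set.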

Each result measures the total time taken to complete training and prediction repeated 10 times.
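
A measurement of this kind could be taken with a small harness like the one below. This is a sketch rather than the actual benchmark script. `fit` and `predict` are hypothetical callables standing in for a library's training and prediction steps.

```python
import time


def time_fit_predict(fit, predict, repeats=10):
    """Total wall-clock seconds for `repeats` rounds of training plus prediction."""
    start = time.perf_counter()
    for _ in range(repeats):
        model = fit()
        predict(model)
    return time.perf_counter() - start


# Stand-in callables; a real run would wrap a classifier's fit and predict.
calls = {"fit": 0, "predict": 0}


def fake_fit():
    calls["fit"] += 1
    return object()


def fake_predict(model):
    calls["predict"] += 1


elapsed = time_fit_predict(fake_fit, fake_predict)
```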

All of the above libraries support multiprocessing, and prediction was performed using 16 workers.

*: sktime, tslearn and pyts do not appear to mask padding, which may result in incorrect predictions.

Device information:

- Product: Lenovo ThinkPad T14s (Gen 6)
- Processor: AMD Ryzen™ AI 7 PRO 360 (8 cores, 16 threads, 2-5GHz)
- Memory: 64 GB LPDDR5X-7500MHz
- Solid State Drive: 1 TB SSD M.2 2280 PCIe Gen4 Performance TLC Opal
- Operating system: Fedora Linux 41 (Workstation Edition)

Installation

The latest stable version of Sequentia can be installed with the following command:

pip install sequentia

C libraries

For optimal performance when using any of the k-NN based models, it is important that the correct dtaidistance C libraries are ac
