pyFAST: Flexible, Advanced Framework for Multi-source and Sparse Time Series Analysis in PyTorch
pyFAST (Forecasting And Sparse Time-Series) is a research-driven, modular Python framework built for advanced and efficient time series analysis, especially excelling in multi-source and sparse data scenarios. Leveraging PyTorch, pyFAST provides a unified and flexible platform for forecasting, imputation, and generative modeling, integrating cutting-edge LLM-inspired architectures, Variational Autoencoders, and classical time series models.
Update logs:
- 2025-12-05: Updated the overview figure to reflect the latest module structure.
- 2025-10-20: All models are categorized for better navigation and usability.
- 2025-09-15: `SMTDataset` and `SSTDataset` now support both CSV file(s) in directories and zipped files at the same time.
- 2025-08-26: Released the software, together with benchmarking results and a link to the datasets.
Unlock the Power of pyFAST for:
- Alignment-Free Multi-source Time Series Analysis: Process and fuse data from diverse sources without the need for strict temporal alignment, inspired by Large Language Model principles.
- Native Sparse Time Series Forecasting: Effectively handle and forecast sparse time series data with specialized metrics and loss functions, addressing a critical gap in existing libraries.
- Rapid Research Prototyping: Experiment and prototype novel time series models and techniques with unparalleled flexibility and modularity.
- Seamless Customization and Extensibility: Tailor and extend the library to your specific research or application needs with its component-based modular design.
- High Performance and Scalability: Benefit from optimized PyTorch implementations and multi-device acceleration for efficient handling of large datasets and complex models.
Key Capabilities:
- Pioneering LLM-Inspired Models: First-of-its-kind adaptations of Large Language Models specifically for alignment-free multi-source time series forecasting.
- Native Sparse Data Support: Comprehensive support for sparse time series, including specialized metrics, loss functions, and efficient data handling.
- Flexible Multi-source Data Fusion: Integrate and analyze time series data from diverse, potentially misaligned sources.
- Extensive Model Library: Includes a broad range of classical, deep learning (Transformers, RNNs, CNNs, GNNs), and generative time series models for both multivariate (MTS) and univariate (UTS) data.
- Modular and Extensible Architecture: Component-based design enables easy customization, extension, and combination of modules.
- Streamlined Training Pipeline: the `Trainer` class simplifies model training with built-in validation, early stopping, checkpointing, and multi-device support.
- Comprehensive Evaluation Suite: includes a wide array of standard and sparse-specific evaluation metrics via the `Evaluator` class.
- Built-in Generative Modeling: dedicated module for time series Variational Autoencoders (VAEs), including Transformer-based VAEs.
- Reproducibility Focus: utilities like `initial_seed()` ensure experiment reproducibility.
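To make the sparse-metric idea concrete, here is a minimal sketch of a mask-aware mean absolute error in plain PyTorch. The `masked_mae` helper is hypothetical, written only to illustrate the principle of computing metrics over observed entries; it is not pyFAST's `Evaluator` API.

```python
import torch

def masked_mae(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean absolute error computed only over observed (mask == 1) entries."""
    abs_err = (pred - target).abs() * mask
    # Guard against division by zero when a batch contains no observed values.
    return abs_err.sum() / mask.sum().clamp(min=1)

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [0.0, 5.0]])
mask = torch.tensor([[1.0, 1.0], [0.0, 1.0]])  # the zero marks a missing observation
print(masked_mae(pred, target, mask))  # tensor(0.5000)
```

The key point is that the large error at the masked position (`3.0` vs. `0.0`) does not pollute the score: only the three observed entries contribute.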
Explore the Core Modules (See Figure Above):
As depicted in the Software Overview Diagram above (Figure 1), pyFAST's fast/ library is structured into five core modules, ensuring a cohesive and versatile framework:
- `data/` package: handles data loading, preprocessing, and dataset creation for SST, SMT, MMT, and BDP data scenarios. Key features include efficient sparse data handling, multi-source data integration, scaling methods, patching, and data splitting utilities.
- `model/` package: houses a diverse collection of time series models, categorized into `uts/` (univariate), `mts/` (multivariate), and `base/` (building blocks) submodules. Includes classical models, deep learning architectures (CNNs, RNNs, Transformers, GNNs), fusion models, and generative models.
- `train.py` module: provides the `Trainer` class to streamline the entire model training pipeline. Features include device management, model compilation, optimizer and scheduler management, the training loop, validation, early stopping, checkpointing, and visualization integration.
- `metric/` package: offers a comprehensive suite of evaluation metrics for time series tasks, managed by the `Evaluator` class. Includes standard metrics (MSE, MAE, etc.) and specialized sparse metrics for masked data.
- `generative/` package: focuses on generative time series modeling, providing implementations of time series VAEs and Transformer-based VAEs.
Installation
Ensure you have Python installed. Then, to install pyFAST and its dependencies, run:

```shell
pip install -r requirements.txt
```
Getting Started
Basic Usage Example
Jumpstart your time series projects with pyFAST using this basic example:
```python
import torch

from fast import initial_seed, initial_logger, get_device
from fast.data import SSTDataset
from fast.train import Trainer
from fast.metric import Evaluator
from fast.model.mts.ar import ANN  # Example: using a simple ANN model

# Fix random seeds for experiment reproducibility
initial_seed(2025)

# Initialize logger for tracking training progress
logger = initial_logger()

# Prepare your time series data: replace with actual data loading.
ts = torch.sin(torch.arange(0, 100, 0.1)).unsqueeze(1)  # Shape: (1000, 1)
train_ds = SSTDataset(ts, input_window_size=10, output_window_size=1).split(0, 0.8, mark='train')
val_ds = SSTDataset(ts, input_window_size=10, output_window_size=1).split(0.8, 1., mark='val')

# Initialize the model (e.g., ANN)
model = ANN(
    input_window_size=train_ds.window_size,          # Input window size taken from the dataset
    output_window_size=train_ds.output_window_size,  # Output window size, a.k.a. prediction steps
    hidden_sizes=32                                  # Hidden layer size
)

# Set up the Trainer for model training and evaluation
device = get_device('cpu')  # Use 'cuda', 'cpu', or 'mps'
evaluator = Evaluator(['MAE', 'RMSE'])  # Evaluation metrics
trainer = Trainer(device, model, evaluator=evaluator)

# Train the model on the prepared datasets
history = trainer.fit(train_ds, val_ds, epoch_range=(1, 10))  # Train for 10 epochs
logger.info(str(history))

# After training, evaluate the model (here on the validation set;
# substitute a held-out test set if you have one)
val_results = trainer.evaluate(val_ds)
logger.info(str(val_results))
```
Data Structures Overview
pyFAST is designed to handle various time series data structures:
- Multiple Time Series (MTS):
  - Shape: `[batch_size, window_size, n_vars]`
  - For datasets with multiple variables recorded over time (e.g., sensor readings, stock prices of multiple companies).
- Univariate Time Series (UTS):
  - Shape: `[batch_size * n_vars, window_size, 1]`
  - For datasets focusing on single-variable sequences, often processed in batches for efficiency.
- Advanced Data Handling:
  - Sparse Data Ready: models and metrics are designed to work effectively with sparse time series and missing values, using masks for accurate computations.
  - Exogenous Variable Integration: seamlessly incorporate external factors (exogenous variables) to enrich your time series models.
  - Variable-Length Sequence Support: dynamic padding efficiently processes time series of varying lengths within batches, optimizing training and inference.
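The MTS and UTS layouts above are reshapes of one another. The following plain-PyTorch snippet (independent of pyFAST's API) shows how a multivariate batch folds into the univariate layout by moving the variable axis into the batch axis, and that the transformation is lossless:

```python
import torch

batch_size, window_size, n_vars = 8, 24, 3

# MTS layout: all variables of a window travel together in one sample.
mts_batch = torch.randn(batch_size, window_size, n_vars)
print(mts_batch.shape)  # torch.Size([8, 24, 3])

# UTS layout: each variable becomes its own single-channel sequence,
# folding the variable axis into the batch axis.
uts_batch = mts_batch.permute(0, 2, 1).reshape(batch_size * n_vars, window_size, 1)
print(uts_batch.shape)  # torch.Size([24, 24, 1])

# Unfolding recovers the original multivariate batch exactly.
recovered = uts_batch.reshape(batch_size, n_vars, window_size).permute(0, 2, 1)
assert torch.equal(recovered, mts_batch)
```

This is why UTS models can be trained on multivariate data "channel-independently": the batch simply grows by a factor of `n_vars` while each sample carries a single variable.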
Supporting Models
pyFAST offers a wide range of time series models, categorized as follows:
- Multivariate Time Series Forecasting:
  - `AR`, `GAR`, `VAR`: Autoregressive models.
  - `ANN`: Artificial Neural Networks.
  - `NLinear`: Normalization-Linear models.
  - `DLinear`: Decomposition-Linear models.
  - `RLinear`: Revisiting Long-term Time Series Forecasting.
  - `STD`: Seasonal-Trend Decomposition.
  - `TimeSeriesRNN`, `EncoderDecoder`: RNN-based forecasting architectures, such as RNN, GRU, LSTM, and miniLSTM.
  - `TemporalConvNet`: Temporal Convolutional Network.
  - `CNN1D`, `CNNRNN`, `CNNRNNRes`: Convolutional sequence models.
  - `LSTNet`: LSTM + CNN hybrid forecasting model.
  - `TSMixer`: Time Series Mixer.
  - `PatchMLP`: Patch-based MLP forecaster.
  - `KAN`: Kolmogorov-Arnold Networks.
  - `DeepResidualNetwork`: Deep residual forecasting network.
  - `Amplifier`: Feature amplification forecasting model.
  - `Transformer`: Attention Is All You Need.
  - `Informer`: Efficient long-sequence forecasting.
  - `Autoformer`: Decomposition Transformer.
  - `FEDformer`: Frequency-enhanced Transformer.
  - `FiLM`: Frequency improved Legendre Memory model.
  - `Triformer`: Tri-level Transformer.
  - `Crossformer`: Cross-dimension attention.
  - `TimesNet`: Multi-periodicity modeling.
  - `PatchTST`: Patch-based Transformer.
  - `STAEformer`: Spatio-Temporal Adaptive Embedding Transformer.