
ImputeGAP

ImputeGAP is a comprehensive Python library for imputation of missing values in time series data. It implements user-friendly APIs to easily visualize, analyze, and repair incomplete time series datasets.


<img align="right" width="140" height="140" src="https://www.naterscreations.com/imputegap/logo_imputegab.png" > <br /> <br />

Welcome to ImputeGAP

ImputeGAP is a comprehensive Python library for imputation of missing values in time series data. It implements user-friendly APIs to easily visualize, analyze, and repair incomplete time series datasets. The library supports a diverse range of imputation algorithms and modular missing data simulation catering to datasets with varying characteristics. ImputeGAP includes extensive customization options, such as automated hyperparameter tuning, benchmarking, explainability, and downstream evaluation.

In detail, the package provides:

  • Over 40 state-of-the-art time series imputation algorithms from six different families (Algorithms)
  • Several univariate time series datasets for imputation, and utilities to handle multivariate ones (Datasets)
  • Configurable contamination module that simulates real-world missingness patterns (Patterns)
  • AutoML techniques to parameterize the imputation algorithms (AutoML)
  • Unified benchmarking pipeline to evaluate the performance of imputation algorithms (Benchmark)
  • Modular analysis tools to assess the impact of imputation on time series downstream tasks (Downstream)
  • Explainability module to understand the impact of time series features on the imputation results (Explainer)
  • Adjustable wrappers to integrate new algorithms in different languages: Python, C++, Matlab, Java, and R (Contributing)
<br>


<i>If you like our library, please add a ⭐ in our GitHub repository.</i>

<br>

| Tools | URL |
|------------------|-------------------------------------|
| 📚 Documentation | https://imputegap.readthedocs.io/ |
| 📦 PyPI | https://pypi.org/project/imputegap/ |
| 📁 Datasets | Description |


Available Imputation Algorithms

| Algorithm | Family | Venue -- Year |
|---------------------------|-------------------|------------------------------|
| NuwaTS [35] | LLMs | Arxiv -- 2024 |
| GPT4TS [36] | LLMs | NeurIPS -- 2023 |
| 🚧 MOMENT [39] | LLMs | ICLR -- 2025 |
| MissNet [27] | Deep Learning | KDD -- 2024 |
| MPIN [25] | Deep Learning | PVLDB -- 2024 |
| BayOTIDE [30] | Deep Learning | PMLR -- 2024 |
| BitGraph [32] | Deep Learning | ICLR -- 2024 |
| TimesNet [37] | Deep Learning | ICLR -- 2023 |
| SAITS [41] | Deep Learning | ESWA -- 2023 |
| PRISTI [26] | Deep Learning | ICDE -- 2023 |
| GRIN [29] | Deep Learning | ICLR -- 2022 |
| CSDI [38] | Deep Learning | NeurIPS -- 2021 |
| HKMFT [31] | Deep Learning | TKDE -- 2021 |
| DeepMVI [24] | Deep Learning | PVLDB -- 2021 |
| MRNN [22] | Deep Learning | IEEE Trans on BE -- 2019 |
| BRITS [23] | Deep Learning | NeurIPS -- 2018 |
| GAIN [28] | Deep Learning | ICML -- 2018 |
| 🚧 SSGAN [40] | Deep Learning | AAAI -- 2021 |
| 🚧 GP-VAE [42] | Deep Learning | AISTATS -- 2020 |
| 🚧 NAOMI [43] | Deep Learning | NeurIPS -- 2019 |
| CDRec [1] | Matrix Completion | KAIS -- 2020 |
| TRMF [8] | Matrix Completion | NeurIPS -- 2016 |
| GROUSE [3] | Matrix Completion | PMLR -- 2016 |
| ROSL [4] | Matrix Completion | CVPR -- 2014 |
| SoftImpute [6] | Matrix Completion | JMLR -- 2010 |
| SVT [7] | Matrix Completion | SIAM J. OPTIM -- 2010 |
| SPIRIT [5] | Matrix Completion | VLDB -- 2005 |
| IterativeSVD [2] | Matrix Completion | BIOINFORMATICS -- 2001 |
| TKCM [11] | Pattern Search | EDBT -- 2017 |
| STMVL [9] | Pattern Search | IJCAI -- 2016 |
| DynaMMo [10] | Pattern Search | KDD -- 2009 |
| IIM [12] | Machine Learning | ICDE -- 2019 |
| XGBOOST [13] | Machine Learning | KDD -- 2016 |
| MICE [14] | Machine Learning | Statistical Software -- 2011 |
| MissForest [15] | Machine Learning | BioInformatics -- 2011 |
| KNNImpute | Statistics | - |
| Interpolation | Statistics | - |
| MinImpute | Statistics | - |
| ZeroImpute | Statistics | - |
| MeanImpute | Statistics | - |
| MeanImputeBySeries | Statistics | - |
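
The statistical baselines at the bottom of the table are simple enough to sketch by hand. The snippet below is an illustration only, not ImputeGAP's implementation. It assumes that MeanImputeBySeries fills each gap with the mean of the observed values of the same series; the function name `mean_impute_by_series` and the zero fallback for fully missing series are hypothetical choices made for this example.

```python
import math


def mean_impute_by_series(matrix):
    """Replace NaNs in each series (row) with the mean of that row's observed values.

    Hypothetical helper for illustration; not part of the ImputeGAP API.
    """
    imputed = []
    for series in matrix:
        observed = [v for v in series if not math.isnan(v)]
        # Assumption: fall back to 0.0 when a series has no observed values.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if math.isnan(v) else v for v in series])
    return imputed


# Two short series, each with one missing value encoded as NaN.
data = [
    [1.0, math.nan, 3.0],
    [10.0, 20.0, math.nan],
]
result = mean_impute_by_series(data)
print(result)  # [[1.0, 2.0, 3.0], [10.0, 20.0, 15.0]]
```

A baseline like this is mainly useful as a reference point: any of the more elaborate algorithms in the table should be expected to beat it on data with temporal structure.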

<br> <br>

Getting Started

System Requirements

ImputeGAP requires Python >= 3.11 and a Unix-compatible environment.

<i>To create and set up an environment with Python 3.12, please refer to the installation guide.</i>
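
Before installing, it can help to verify that the current interpreter meets these requirements. The following is a minimal sketch using only the standard library. The helper name `check_requirements` is hypothetical, and treating `os.name == "posix"` as a proxy for "Unix-compatible" is an assumption of this example, not a check performed by ImputeGAP.

```python
import os
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the requirements above


def check_requirements(version=None, os_name=None):
    """Return a list of human-readable problems; an empty list means compatible.

    Hypothetical helper: defaults to inspecting the running interpreter.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    os_name = os.name if os_name is None else os_name

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.11")
    # Assumption: "posix" (Linux, macOS, WSL) stands in for Unix-compatible.
    if os_name != "posix":
        problems.append(f"OS family '{os_name}' is not Unix-compatible")
    return problems


issues = check_requirements()
print(issues if issues else "Environment looks compatible")
```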

<br>

Installation

pip

To install the latest version of ImputeGAP, run the following command (add --upgrade to update an existing installation):

pip install imputegap
<br>

Source

If you would like to extend the library, you can install from source:

git clone https://github.com/eXascaleInfolab/ImputeGAP
cd ./ImputeGAP
pip install -e .
<br>

Docker

Alternatively, you can download the latest version of ImputeGAP with all dependencies pre-installed using Docker.

Launch Docker and make sure it is running:

docker version

Pull the ImputeGAP Docker image (on macOS, add --platform linux/x86_64 to the command):

docker pull qnater/imputegap:1.1.21

Run the Docker container:

docker run -p 8888:8888 qnater/imputegap:1.1.21

<br> <br>

Tutorials

Dataset Loading

ImputeGAP comes with several time series datasets. The list of datasets is described here.

As an example, we use the eeg-alcohol dataset, which contains EEG recordings of individuals with a genetic predisposition to alcoholism. The measurements come from 64 electrodes placed on the subjects' scalps, sampled at 256 Hz. The dataset consists of 64 series, each containing 256 values.

Example Loading

You can find this loading example in the file runner_loading.py.

To load and plot the eeg-alcohol dataset from the library:

from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# initialize the time series object
ts = TimeSeries()

# load and normalize the dataset from the library
ts.load_series(utils.search_path("eeg-alcohol"), normalizer="z_score")

# print and plot a subset of time series
ts.print(nbr_series=4, nbr_val=4)
ts.plot(input_data=ts.data, nbr_series=6, nbr_val=100, save_path="./imputegap_assets")
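
For readers unfamiliar with the `normalizer="z_score"` option used above, the generic z-score transform rescales values to zero mean and unit standard deviation. The sketch below shows that transform on a single series using only the standard library. Whether ImputeGAP normalizes per series or over the whole matrix, and whether it uses the population or sample standard deviation, is not stated here, so both choices in this example are assumptions; the function name `z_score` is hypothetical.

```python
import statistics


def z_score(series):
    """Rescale a series to zero mean and unit (population) standard deviation.

    Hypothetical helper for illustration; not ImputeGAP's normalizer.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # A constant series has zero spread; return zeros instead of dividing by zero.
    if std == 0:
        return [0.0 for _ in series]
    return [(v - mean) / std for v in series]


normalized = z_score([2.0, 4.0, 6.0])
print([round(v, 4) for v in normalized])  # [-1.2247, 0.0, 1.2247]
```

Normalizing before imputation puts series with different units on a comparable scale, which matters for algorithms that exploit correlations across series.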

The attribute ts.datasets contains all the publicly available datasets provided by the library, which can be listed as follows:

from imputegap.recovery.manager import TimeSeries
ts = TimeSeries()
print(f"ImputeGAP datasets : {ts.datasets}")

Contamination

We now describe how to simulate missing values in the loaded dataset. ImputeGAP implements eight different missingness patterns. For more details about the patterns, please refer
