ImputeGAP
Welcome to ImputeGAP
ImputeGAP is a comprehensive Python library for imputation of missing values in time series data. It implements user-friendly APIs to easily visualize, analyze, and repair incomplete time series datasets. The library supports a diverse range of imputation algorithms and modular missing data simulation catering to datasets with varying characteristics. ImputeGAP includes extensive customization options, such as automated hyperparameter tuning, benchmarking, explainability, and downstream evaluation.
In detail, the package provides:
- Over 40 state-of-the-art time series imputation algorithms from six different families (Algorithms)
- Several univariate time series datasets for imputation, plus utilities to handle multivariate ones (Datasets)
- Configurable contamination module that simulates real-world missingness patterns (Patterns)
- AutoML techniques to parameterize the imputation algorithms (AutoML)
- Unified benchmarking pipeline to evaluate the performance of imputation algorithms (Benchmark)
- Modular analysis tools to assess the impact of imputation on time series downstream tasks (Downstream)
- Explainability module to understand the impact of time series features on the imputation results (Explainer)
- Adjustable wrappers to integrate new algorithms in different languages: Python, C++, Matlab, Java, and R (Contributing)
<i>If you like our library, please add a ⭐ in our GitHub repository.</i>
<br>

| Tools | URL |
|------------------|-------------------------------------|
| 📚 Documentation | https://imputegap.readthedocs.io/ |
| 📦 PyPI | https://pypi.org/project/imputegap/ |
| 📁 Datasets | Description |
Available Imputation Algorithms
| Algorithm | Family | Venue -- Year |
|---------------------------|-------------------|------------------------------|
| NuwaTS [35] | LLMs | Arxiv -- 2024 |
| GPT4TS [36] | LLMs | NeurIPS -- 2023 |
| 🚧 MOMENT [39] | LLMs | ICLR -- 2025 |
| MissNet [27] | Deep Learning | KDD -- 2024 |
| MPIN [25] | Deep Learning | PVLDB -- 2024 |
| BayOTIDE [30] | Deep Learning | PMLR -- 2024 |
| BitGraph [32] | Deep Learning | ICLR -- 2024 |
| TimesNet [37] | Deep Learning | ICLR -- 2023 |
| SAITS [41] | Deep Learning | ESWA -- 2023 |
| PRISTI [26] | Deep Learning | ICDE -- 2023 |
| GRIN [29] | Deep Learning | ICLR -- 2022 |
| CSDI [38] | Deep Learning | NeurIPS -- 2021 |
| HKMFT [31] | Deep Learning | TKDE -- 2021 |
| DeepMVI [24] | Deep Learning | PVLDB -- 2021 |
| MRNN [22] | Deep Learning | IEEE Trans on BE -- 2019 |
| BRITS [23] | Deep Learning | NeurIPS -- 2018 |
| GAIN [28] | Deep Learning | ICML -- 2018 |
| 🚧 SSGAN [40] | Deep Learning | AAAI -- 2021 |
| 🚧 GP-VAE [42] | Deep Learning | AISTATS -- 2020 |
| 🚧 NAOMI [43] | Deep Learning | NeurIPS -- 2019 |
| CDRec [1] | Matrix Completion | KAIS -- 2020 |
| TRMF [8] | Matrix Completion | NeurIPS -- 2016 |
| GROUSE [3] | Matrix Completion | PMLR -- 2016 |
| ROSL [4] | Matrix Completion | CVPR -- 2014 |
| SoftImpute [6] | Matrix Completion | JMLR -- 2010 |
| SVT [7] | Matrix Completion | SIAM J. OPTIM -- 2010 |
| SPIRIT [5] | Matrix Completion | VLDB -- 2005 |
| IterativeSVD [2] | Matrix Completion | BIOINFORMATICS -- 2001 |
| TKCM [11] | Pattern Search | EDBT -- 2017 |
| STMVL [9] | Pattern Search | IJCAI -- 2016 |
| DynaMMo [10] | Pattern Search | KDD -- 2009 |
| IIM [12] | Machine Learning | ICDE -- 2019 |
| XGBOOST [13] | Machine Learning | KDD -- 2016 |
| MICE [14] | Machine Learning | Statistical Software -- 2011 |
| MissForest [15] | Machine Learning | BioInformatics -- 2011 |
| KNNImpute | Statistics | - |
| Interpolation | Statistics | - |
| MinImpute | Statistics | - |
| ZeroImpute | Statistics | - |
| MeanImpute | Statistics | - |
| MeanImputeBySeries | Statistics | - |
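
To give a feel for what the simplest entries in the Statistics family do, here is a minimal NumPy sketch of per-series mean imputation and linear interpolation. It is an illustration only and not ImputeGAP's implementation: the function names are hypothetical, missing values are assumed to be encoded as NaN, and each row is assumed to hold one series.

```python
import numpy as np


def mean_impute_by_series(data):
    """Fill each NaN with the mean of the observed values in its own row (series)."""
    filled = data.copy()
    row_means = np.nanmean(filled, axis=1)  # one mean per series, NaNs ignored
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = row_means[rows]
    return filled


def linear_interpolate(data):
    """Fill each NaN by linear interpolation between observed neighbors in its row.

    Assumes every row has at least one observed value; np.interp holds the
    nearest observed value constant at the edges.
    """
    filled = data.copy()
    positions = np.arange(filled.shape[1])
    for row in filled:  # each row is a view, so the assignment edits `filled`
        missing = np.isnan(row)
        row[missing] = np.interp(positions[missing], positions[~missing], row[~missing])
    return filled


# Toy input (hypothetical data): 2 series, 4 values each, one gap per series
series = np.array([[1.0, np.nan, 3.0, 4.0],
                   [10.0, 20.0, np.nan, 40.0]])

mean_filled = mean_impute_by_series(series)
interp_filled = linear_interpolate(series)
print(mean_filled)
print(interp_filled)
```

Baselines like these are cheap to compute and make a useful reference point when judging whether a more complex algorithm is worth its cost.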
Quick Navigation
- Getting Started
- Code Snippets
- Contribute
- Additional Information
<br> <br>
Getting Started
System Requirements
ImputeGAP is compatible with Python>=3.11 and requires a Unix-compatible environment.
<i>To create and set up an environment with Python 3.12, please refer to the installation guide.</i>
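
The requirements above can be checked before installing with a short standard-library script. This is a sketch, not part of ImputeGAP: the helper name `check_environment` is hypothetical, and treating `os.name == "posix"` as "Unix-compatible" is an assumption (it holds on Linux, macOS, and WSL).

```python
import os
import sys


def check_environment(min_version=(3, 11)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ is required, found {found}.")
    # Assumption: a POSIX platform counts as a Unix-compatible environment
    if os.name != "posix":
        problems.append(f"A Unix-compatible environment is expected, found os.name={os.name!r}.")
    return problems


issues = check_environment()
print("Environment looks compatible." if not issues else "\n".join(issues))
```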
<br>Installation
pip
To install/update the latest version of ImputeGAP, run the following command:
pip install imputegap
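
To confirm that the install succeeded, one option is to ask Python's packaging metadata which version is present. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `imputegap` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="imputegap"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("imputegap is not installed in this environment.")
else:
    print(f"imputegap {found} is installed.")
```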
<br>
Source
If you would like to extend the library, you can install from source:
git init
git clone https://github.com/eXascaleInfolab/ImputeGAP
cd ./ImputeGAP
pip install -e .
<br>
Docker
Alternatively, you can download the latest version of ImputeGAP with all dependencies pre-installed using Docker.
Launch Docker and make sure it is running:
docker version
Pull the ImputeGAP Docker image (on macOS, add --platform linux/x86_64 to the command):
docker pull qnater/imputegap:1.1.21
Run the Docker container:
docker run -p 8888:8888 qnater/imputegap:1.1.21
<br> <br>
Tutorials
Dataset Loading
ImputeGAP comes with several time series datasets. The list of datasets is described here.
As an example, we use the eeg-alcohol dataset, which contains EEG recordings from individuals with a genetic predisposition to alcoholism. The measurements come from 64 electrodes placed on the subjects' scalps, sampled at 256 Hz. The dataset has 64 series, each containing 256 values.
Example Loading
You can find this loading and normalization example in the file runner_loading.py.
To load and plot the eeg-alcohol dataset from the library:
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils
# initialize the time series object
ts = TimeSeries()
# load and normalize the dataset from the library
ts.load_series(utils.search_path("eeg-alcohol"), normalizer="z_score")
# print and plot a subset of time series
ts.print(nbr_series=4, nbr_val=4)
ts.plot(input_data=ts.data, nbr_series=6, nbr_val=100, save_path="./imputegap_assets")
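
For readers unfamiliar with the term, z-score normalization rescales values to zero mean and unit standard deviation. The NumPy sketch below illustrates the idea and is not ImputeGAP's code: the function `z_score` and its `axis` parameter are hypothetical, and whether the library normalizes globally or per series is not stated here, so both variants are shown.

```python
import numpy as np


def z_score(data, axis=None):
    """Subtract the mean and divide by the standard deviation, ignoring NaNs.

    axis=None normalizes over the whole matrix; axis=1 normalizes each row
    (series) independently. Which variant ImputeGAP applies is an open
    assumption of this sketch.
    """
    mean = np.nanmean(data, axis=axis, keepdims=True)
    std = np.nanstd(data, axis=axis, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # constant data: avoid division by zero
    return (data - mean) / std


# Toy input (hypothetical data): 2 series on very different scales, one missing value
raw = np.array([[1.0, 2.0, 3.0],
                [10.0, np.nan, 30.0]])

global_scaled = z_score(raw)          # one mean and std for the whole matrix
per_series = z_score(raw, axis=1)     # one mean and std per series
print(global_scaled)
print(per_series)
```

Missing values stay NaN after scaling, so normalization can be applied before or after contamination without filling any gaps.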
The attribute ts.datasets lists all the publicly available datasets provided by the library. It can be printed as follows:
from imputegap.recovery.manager import TimeSeries
ts = TimeSeries()
print(f"ImputeGAP datasets : {ts.datasets}")
Contamination
We now describe how to simulate missing values in the loaded dataset. ImputeGAP implements eight different missingness patterns. For more details about the patterns, please refer
