# COUTA - time series anomaly detection

Implementation of "Calibrated One-class classification-based Unsupervised Time series Anomaly detection" (COUTA for short).
The full paper is available at link.

Please consider citing our paper if you use this repository. :wink:
```bibtex
@article{xu2024calibrated,
  title={Calibrated one-class classification for unsupervised time series anomaly detection},
  author={Xu, Hongzuo and Wang, Yijie and Jian, Songlei and Liao, Qing and Wang, Yongjun and Pang, Guansong},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  volume={36},
  number={11},
  pages={5723--5736},
  year={2024},
  publisher={IEEE}
}
```
## Environment

Main packages:

```
torch==1.10.1+cu113
numpy==1.20.3
pandas==1.3.3
scipy==1.4.1
scikit-learn==1.1.1
```

We also provide a requirements.txt file in our repository.
## Takeaways

### APIs

COUTA provides easy-to-use APIs in the sklearn/pyod style: we first instantiate the model class with the desired parameters,

```python
from src.algorithms.couta_algo import COUTA

model_configs = {'sequence_length': 50, 'stride': 1}
model = COUTA(**model_configs)
```
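The `sequence_length` and `stride` parameters control how the input series is carved into overlapping sub-sequences. The sketch below is a generic illustration of sliding-window extraction under these two parameters, not COUTA's internal implementation:

```python
import numpy as np

def sliding_windows(x, sequence_length=50, stride=1):
    # Generic sliding-window sketch; COUTA's internal windowing may differ.
    starts = range(0, len(x) - sequence_length + 1, stride)
    return np.stack([x[s:s + sequence_length] for s in starts])

# A series of length 100 with sequence_length=50 and stride=1 yields 51 windows.
windows = sliding_windows(np.arange(100), sequence_length=50, stride=1)
print(windows.shape)  # (51, 50)
```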
and then the instantiated model can be used to fit and predict data. Please use pandas DataFrames as input:

```python
model.fit(train_df)
score_dic = model.predict(test_df)
score = score_dic['score_t']
```
We use a dictionary as the prediction output for the sake of consistency with an evaluation work on time series anomaly detection (link).
`score_t` is a vector of anomaly scores for each time observation in the testing DataFrame; a higher value indicates a higher likelihood of being an anomaly.
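If binary anomaly labels are needed, the continuous `score_t` values have to be thresholded. COUTA does not prescribe a threshold; the helper below is a hypothetical illustration of one common heuristic (flagging the top fraction of scores), not part of the COUTA API:

```python
import numpy as np

def scores_to_labels(score_t, quantile=0.99):
    # Hypothetical helper (not part of COUTA): mark the scores above the
    # given quantile as anomalies (1) and the rest as normal (0).
    threshold = np.quantile(score_t, quantile)
    return (score_t >= threshold).astype(int)

labels = scores_to_labels(np.array([0.1] * 99 + [5.0]))
print(labels.sum())  # 1 — only the single outlying score is flagged
```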
### Model save and load

By passing the save_model_path parameter, the model will be saved to this path during training:

```python
from src.algorithms.couta_algo import COUTA

path = 'saved_models/couta.pth'
model_configs = {'sequence_length': 50, 'stride': 1, 'save_model_path': path}
model = COUTA(**model_configs)
model.fit(train_df)
```
The saved COUTA model can then be used without fitting:

```python
from src.algorithms.couta_algo import COUTA

path = 'saved_models/couta.pth'
model_configs = {'load_model_path': path}
model = COUTA(**model_configs)
model.predict(test_df)
```
## Datasets used in our paper

Due to license issues, we cannot redistribute these datasets and instead provide download links below. We also offer a preprocessing script in data_preprocessing.ipynb; by downloading the original data and running this notebook, you can easily generate processed datasets that can be directly fed into our pipeline.

The used datasets can be downloaded from:
- ASD https://github.com/zhhlee/InterFusion
- SMD https://github.com/NetManAIOps/OmniAnomaly
- SWAT https://itrust.sutd.edu.sg/itrust-labs_datasets
- WaQ https://www.spotseven.de/gecco/gecco-challenge
- DSADS https://github.com/zhangyuxin621/AMSL
- Epilepsy https://github.com/boschresearch/NeuTraL-AD/
## Reproduction of experiment results

### Effectiveness (4.2)

After preparing the datasets, you can use main.py to run COUTA on different time series datasets. We use six datasets in our paper, and --data can be chosen from [ASD, SMD, SWaT, WaQ, Epilepsy, DSADS].
For example, run COUTA on the ASD dataset by

```shell
python main.py --data ASD --algo COUTA
```

or directly use script_effectiveness.sh.
### Generalization test (4.3)

We include the used synthetic datasets in data_processed/.

```shell
python main_showcase.py --type point
python main_showcase.py --type pattern
```

Two anomaly-score npy files are generated; you can use experiment_generalization_ability.ipynb to visualize the data and our results.
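Before opening the notebook, a generated score file can be sanity-checked from plain Python. The helper below is an illustrative sketch (the exact output filenames are produced by main_showcase.py, and the real visualization lives in the notebook):

```python
import numpy as np

def summarize_scores(npy_path):
    # Illustrative helper: load a saved 1-D anomaly-score array and report
    # basic statistics about it.
    s = np.load(npy_path)
    return {'n': int(s.size), 'mean': float(s.mean()), 'max': float(s.max())}
```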
### Robustness (4.4)

Use src/experiments/data_contaminated_generator_dsads.py and src/experiments/data_contaminated_generator_ep.py to generate datasets with various contamination ratios.
Then use main.py to run COUTA on these datasets, or directly execute script_robustness.sh.
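Conceptually, contaminating a training set at ratio r means replacing a fraction r of the (assumed-normal) training samples with anomalous ones. The sketch below only illustrates this idea; the repository's generator scripts above are the authoritative implementation:

```python
import numpy as np

def contaminate(normal, anomalies, ratio, seed=0):
    # Illustrative sketch, not the repository's generator: swap a fraction
    # `ratio` of normal training samples for randomly drawn anomalies.
    rng = np.random.default_rng(seed)
    n_anom = int(len(normal) * ratio)
    out = normal.copy()
    idx = rng.choice(len(normal), size=n_anom, replace=False)
    out[idx] = anomalies[rng.choice(len(anomalies), size=n_anom)]
    return out

# 10% of 100 normal samples (zeros) are replaced by anomalies (ones).
train = contaminate(np.zeros(100), np.ones(5), ratio=0.1)
print(int(train.sum()))  # 10
```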
### Ablation study (4.5)

Change the --algo argument to COUTA_wto_umc, COUTA_wto_nac, or Canonical, e.g.,

```shell
python main.py --algo COUTA_wto_umc --data ASD
```

Running script_effectiveness.sh also produces the detection results of the ablated variants.
### Others

For the sensitivity test (4.6), please adjust the parameters in the yaml file.
For the scalability test (4.7), the produced result files also contain execution time.
## Competing methods

All of the anomaly detectors in our paper are implemented in Python. We list their publicly available implementations below.

- OCSVM and ECOD: we directly use pyod (a Python library of anomaly detection approaches)
- GOAD: https://github.com/lironber/GOAD
- DSVDD: https://github.com/lukasruff/Deep-SVDD-PyTorch
- USAD: https://github.com/hoo2257/USAD-Anomaly-Detecting-Algorithm
- GDN: https://github.com/d-ailin/GDN
- NeuTraL: https://github.com/boschresearch/NeuTraL-AD
- TranAD: https://github.com/imperial-qore/TranAD
- LSTM-ED, Tcn-ED, MSCRED and Omni: https://github.com/astha-chem/mvts-ano-eval/
