BOTL
Bi-directional Online Transfer Learning (BOTL) framework and data generators for concept drifting data streams.
This git repo has data, data generators and code for the Bi-directional Online Transfer Learning framework (BOTL).
Running code
To run the code, use either:
- python3 controller2.py ....
- python3 runResults.py .... (this runs controller2.py for multiple repeat iterations)

Use python3 runResults.py --help to display run options.
Common Options
- --domain: type of dataset used: Following, Heating, Sudden, Gradual
- --type: concept drift detection strategy: RePro, ADWIN and AWPro have been implemented
- --window: window size of the concept drift detector
- --ReProAcc, --ReProProb and --ADWINDelta: parameters required by RePro, ADWIN and AWPro
- --runid: used when debugging (change output in controller2.py to something other than dev/null)
- --numStreams: number of domains in the framework
- --ensemble: version of BOTL/how models are combined - see below for options
- --perfCull: predictive performance culling threshold parameter used by P-Thresh (BOTL-C.I), MI-Thresh (BOTL-C.II) and CS-Thresh
- --miCull: Mutual Information culling threshold parameter used by MI-Thresh (BOTL-C.II)
- --paCull: Principal Angle/conceptual similarity culling threshold parameter used by CS-Thresh
- --variance: total variance captured by the PCs used to represent base models, used by CS-Thresh and CS-Clust
- --learner: list of types of models to be used, so far SVRs and RRs can be used
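As a rough illustration of how these options fit together, the sketch below assembles a runResults.py command line in Python. The flag names are the ones listed in this section. The helper build_command and every value shown (for example 50 for --window or 3 for --numStreams) are hypothetical placeholders, so check python3 runResults.py --help for the accepted values and formats before running anything.

```python
import shlex


def build_command(script, options):
    """Turn a dict of option names and values into an argument list."""
    command = ["python3", script]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


# Placeholder values, chosen only for illustration (assumptions, not defaults).
example_options = {
    "domain": "Sudden",
    "type": "ADWIN",
    "window": 50,
    "ADWINDelta": 0.02,
    "numStreams": 3,
    "ensemble": "OLSFEPA",
    "paCull": 0.5,
}

command = build_command("runResults.py", example_options)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) from the directory that contains runResults.py, once the values have been replaced with ones the script accepts.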
BOTL variants
Different variants of BOTL have been implemented and are specified by the --ensemble parameter
- BOTL:
- P-Thresh:
- MI-Thresh:
- CS-Thresh:
  - --ensemble OLSFEPA
  - use --paCull to set the conceptual similarity culling threshold parameter
  - BOTL with conceptual similarity and predictive performance thresholding to select base models
  - introduced in [3]
- CS-Clust:
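
The thresholding variants select base models by comparing a score against a culling threshold. The sketch below shows only the general idea for predictive performance thresholding. It assumes R² over a recent window of labelled data as the performance measure, which this README does not state, and the names r_squared and cull_by_performance are hypothetical rather than taken from the BOTL code.

```python
def r_squared(targets, predictions):
    """Coefficient of determination; assumes the targets are not all identical."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def cull_by_performance(targets, model_predictions, perf_cull):
    """Keep the models whose R² on the window is at least perf_cull."""
    scores = {
        name: r_squared(targets, preds)
        for name, preds in model_predictions.items()
    }
    kept = [name for name, score in scores.items() if score >= perf_cull]
    return kept, scores


# Toy window: one transferred model fits the local data, the other does not.
targets = [1.0, 2.0, 3.0, 4.0]
model_predictions = {
    "source_A": [1.1, 1.9, 3.2, 3.8],
    "source_B": [4.0, 3.0, 2.0, 1.0],
}
kept, scores = cull_by_performance(targets, model_predictions, perf_cull=0.2)
print(kept, scores)
```

In this toy window only source_A survives the cull, because source_B predicts the opposite trend and scores a negative R².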
Available data and data generators and BOTL implementations:
- Following distance data for 6 journeys (2 drivers).
- Drifting hyperplane data generator
- Smart home heating simulation (with real world weather data)
Note that the underlying framework is the same for all three implementations; all three versions have been added for ease of reproducibility.
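
For readers who have not used a drifting hyperplane generator, the following minimal sketch shows the general technique: targets are a noisy linear function of the features, and the weight vector changes either abruptly or over a transition period. It is not the generator shipped in this repo; the function name, parameters and value ranges are all assumptions made for illustration.

```python
import random


def drifting_hyperplane(n_samples, n_features, drift_at,
                        drift_width=1, noise=0.1, seed=0):
    """Yield (x, y) pairs where y = w . x + noise and w changes around drift_at.

    drift_width=1 gives a sudden drift; larger values blend the old and new
    weights linearly over that many samples (a gradual drift).
    """
    rng = random.Random(seed)
    old_w = [rng.uniform(-1, 1) for _ in range(n_features)]
    new_w = [rng.uniform(-1, 1) for _ in range(n_features)]
    for t in range(n_samples):
        # Fraction of the new concept in effect at time t, clipped to [0, 1].
        mix = min(1.0, max(0.0, (t - drift_at + 1) / drift_width))
        w = [(1 - mix) * o + mix * n for o, n in zip(old_w, new_w)]
        x = [rng.uniform(0, 1) for _ in range(n_features)]
        y = sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0, noise)
        yield x, y


# A sudden drift at sample 50 and a gradual drift spread over 30 samples.
sudden = list(drifting_hyperplane(100, 3, drift_at=50))
gradual = list(drifting_hyperplane(100, 3, drift_at=50, drift_width=30))
```

Fixing the seed makes a stream reproducible, which helps when comparing drift detectors on the same data.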
AWPro
AWPro is a concept drift detection algorithm that combines aspects of RePro [5] and ADWIN [6] that better suit the BOTL framework. AWPro was first introduced in [2].
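
As background, the windowing test at the heart of ADWIN [6] can be sketched as follows. This is a simplified illustration, not AWPro and not the pyAdwin-based code used in this repo: it stores the raw window rather than ADWIN's compressed buckets, assumes inputs lie in [0, 1], and drops everything before the first split that fails the test. The class name SimpleADWIN and the delta value are assumptions; delta plays the role that --ADWINDelta is described as having above.

```python
import math


class SimpleADWIN:
    """Simplified ADWIN: keeps the raw window instead of compressed buckets."""

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter; smaller means fewer detections
        self.window = []    # recent values, each assumed to lie in [0, 1]

    def _find_cut(self):
        """Return the first split whose sub-window means differ significantly."""
        n = len(self.window)
        total = sum(self.window)
        head = 0.0
        for i in range(1, n):
            head += self.window[i - 1]
            n0, n1 = i, n - i
            mean0, mean1 = head / n0, (total - head) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            # Hoeffding-style bound with delta' = delta / n.
            eps_cut = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return i
        return None

    def update(self, value):
        """Add a value; return True if the window was shrunk (drift detected)."""
        self.window.append(value)
        drift = False
        cut = self._find_cut()
        while cut is not None:
            self.window = self.window[cut:]  # forget the older sub-window
            drift = True
            cut = self._find_cut()
        return drift


detector = SimpleADWIN(delta=0.002)
stream = [0.2] * 200 + [0.8] * 200  # the mean shifts at index 200
detections = [t for t, value in enumerate(stream) if detector.update(value)]
print(detections[:1])
```

On this toy stream the detector stays silent during the first 200 values and raises its first detection shortly after the mean shifts.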
Parameter Analysis
Parameter analysis (see parameterAnalysis.pdf) has been carried out to consider the impact of the parameter values of the underlying concept drift detection strategies, and how they affect the BOTL framework.
Source Code
The BOTL framework has been created using code from other sources. The ADWIN and AWPro implementations (AWPro uses ADWIN as a basis for drift detection) are based upon the implementation available at https://github.com/rsdevigo/pyAdwin. This code is included in datasetBOTL/BiDirTansfer/pyadwin/
Other work relating to future variations of BOTL uses Self-Tuning Spectral Clustering [4], and is based on the implementation available at https://github.com/wOOL/STSC. This code is used in datasetBOTL/BiDirTransfer/Models/stsc*.py
References
<a id="1">[1]</a> McKay, H., Griffiths, N., Taylor, P., Damoulas, T. and Xu, Z., 2019. Online Transfer Learning for Concept Drifting Data Streams. In BigMine@ KDD.
<a id="2">[2]</a> McKay, H., Griffiths, N., Taylor, P., Damoulas, T. and Xu, Z., 2020. Bi-directional online transfer learning: a framework. Annals of Telecommunications, 75(9), pp.523-547.
<a id="3">[3]</a> McKay, H., Griffiths, N. and Taylor, P., 2021. Conceptually Diverse Base Model Selection for Meta-Learners in Concept Drifting Data Streams. arXiv preprint arXiv:2111.14520.
<a id="4">[4]</a> Zelnik-Manor, L. and Perona, P., 2004. Self-tuning spectral clustering. Advances in Neural Information Processing Systems, pp.1601-1608.
<a id="5">[5]</a> Yang, Y., Wu, X. and Zhu, X., 2005, August. Combining proactive and reactive predictions for data streams. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 710-715).
<a id="6">[6]</a> Bifet, A. and Gavalda, R., 2007, April. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM international conference on data mining (pp. 443-448). Society for Industrial and Applied Mathematics.
