
<!-- README.md is generated from README.Rmd. Please edit that file -->

SaMUraiS: StAtistical Models for the UnsupeRvised segmentAtIon of time-Series

<!-- badges: start -->

<!-- badges: end -->

samurais is an open-source toolbox (available in R and in Matlab) providing original, flexible, and user-friendly statistical latent variable models and unsupervised algorithms to segment and represent time-series data (univariate or multivariate) and, more generally, longitudinal data that include regime changes.

Our samurais rely mainly on the following efficient "sword" models to segment data: Regression with Hidden Logistic Process (RHLP), Hidden Markov Model Regression (HMMR), Piece-Wise regression (PWR), Multivariate RHLP (MRHLP), and Multivariate HMMR (MHMMR).
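For orientation, each "sword" model has a matching fitting function in the R package. The following is a minimal sketch, not a definitive usage guide: the `emRHLP` call mirrors the examples later in this README, while the `emHMMR` and `fitPWRFisher` calls and their default arguments are assumptions based on the package's naming scheme, so check the reference manual for the exact signatures.

```r
library(samurais)

# Toy univariate series shipped with the package
data("univtoydataset")
x <- univtoydataset$x
y <- univtoydataset$y

# One fitting function per model (defaults assumed; see ?emHMMR etc.)
rhlp <- emRHLP(X = x, Y = y, K = 5, p = 3)        # RHLP
hmmr <- emHMMR(X = x, Y = y, K = 5, p = 3)        # HMMR
pwr  <- fitPWRFisher(X = x, Y = y, K = 5, p = 3)  # piece-wise regression
```

The multivariate variants (MRHLP, MHMMR) follow the same pattern with a multivariate response.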

The models and algorithms were developed and written in Matlab by Faicel Chamroukhi, and translated into R and packaged by Florian Lecocq, Marius Bartcus and Faicel Chamroukhi.

Installation

You can install the samurais package from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("fchamroukhi/SaMUraiS")
```

To also build the vignettes, which contain usage examples, run this command instead:

```r
# install.packages("devtools")
devtools::install_github("fchamroukhi/SaMUraiS",
                         build_opts = c("--no-resave-data", "--no-manual"),
                         build_vignettes = TRUE)
```

Use the following command to display the vignettes:

```r
browseVignettes("samurais")
```

Usage

```r
library(samurais)
```
<details> <summary>RHLP</summary>
```r
# Application to a toy data set
data("univtoydataset")
x <- univtoydataset$x
y <- univtoydataset$y

K <- 5 # Number of regimes (mixture components)
p <- 3 # Dimension of beta (order of the polynomial regressors)
q <- 1 # Dimension of w (order of the logistic regression: to be set to 1 for segmentation)
variance_type <- "heteroskedastic" # "heteroskedastic" or "homoskedastic" model

n_tries <- 1
max_iter <- 1500
threshold <- 1e-6
verbose <- TRUE
verbose_IRLS <- FALSE

rhlp <- emRHLP(X = x, Y = y, K, p, q, variance_type, n_tries,
               max_iter, threshold, verbose, verbose_IRLS)
#> EM: Iteration : 1 || log-likelihood : -2119.27308534609
#> EM: Iteration : 2 || log-likelihood : -1149.01040321999
#> EM: Iteration : 3 || log-likelihood : -1118.20384281234
#> EM: Iteration : 4 || log-likelihood : -1096.88260636121
#> EM: Iteration : 5 || log-likelihood : -1067.55719357295
#> EM: Iteration : 6 || log-likelihood : -1037.26620122646
#> EM: Iteration : 7 || log-likelihood : -1022.71743069484
#> EM: Iteration : 8 || log-likelihood : -1006.11825447077
#> EM: Iteration : 9 || log-likelihood : -1001.18491883952
#> EM: Iteration : 10 || log-likelihood : -1000.91250763556
#> EM: Iteration : 11 || log-likelihood : -1000.62280600209
#> EM: Iteration : 12 || log-likelihood : -1000.3030988811
#> EM: Iteration : 13 || log-likelihood : -999.932334880131
#> EM: Iteration : 14 || log-likelihood : -999.484219706691
#> EM: Iteration : 15 || log-likelihood : -998.928118038989
#> EM: Iteration : 16 || log-likelihood : -998.234244664472
#> EM: Iteration : 17 || log-likelihood : -997.359536276056
#> EM: Iteration : 18 || log-likelihood : -996.152654857298
#> EM: Iteration : 19 || log-likelihood : -994.697863447307
#> EM: Iteration : 20 || log-likelihood : -993.186583974542
#> EM: Iteration : 21 || log-likelihood : -991.81352379631
#> EM: Iteration : 22 || log-likelihood : -990.611295217008
#> EM: Iteration : 23 || log-likelihood : -989.539226273251
#> EM: Iteration : 24 || log-likelihood : -988.55311887915
#> EM: Iteration : 25 || log-likelihood : -987.539963690533
#> EM: Iteration : 26 || log-likelihood : -986.073920116541
#> EM: Iteration : 27 || log-likelihood : -983.263549878169
#> EM: Iteration : 28 || log-likelihood : -979.340492188909
#> EM: Iteration : 29 || log-likelihood : -977.468559852711
#> EM: Iteration : 30 || log-likelihood : -976.653534236095
#> EM: Iteration : 31 || log-likelihood : -976.5893387433
#> EM: Iteration : 32 || log-likelihood : -976.589338067237
```

```r
rhlp$summary()
#> ---------------------
#> Fitted RHLP model
#> ---------------------
#> 
#> RHLP model with K = 5 components:
#> 
#>  log-likelihood nu       AIC       BIC       ICL
#>       -976.5893 33 -1009.589 -1083.959 -1083.176
#> 
#> Clustering table (Number of observations in each regimes):
#> 
#>   1   2   3   4   5 
#> 100 120 200 100 150 
#> 
#> Regression coefficients:
#> 
#>       Beta(K = 1) Beta(K = 2) Beta(K = 3) Beta(K = 4) Beta(K = 5)
#> 1    6.031875e-02   -5.434903   -2.770416    120.7699    4.027542
#> X^1 -7.424718e+00  158.705091   43.879453   -474.5888   13.194261
#> X^2  2.931652e+02 -650.592347  -94.194780    597.7948  -33.760603
#> X^3 -1.823560e+03  865.329795   67.197059   -244.2386   20.402153
#> 
#> Variances:
#> 
#>  Sigma2(K = 1) Sigma2(K = 2) Sigma2(K = 3) Sigma2(K = 4) Sigma2(K = 5)
#>       1.220624      1.110243      1.079394     0.9779734      1.028332
```

```r
rhlp$plot()
```

<img src="man/figures/README-unnamed-chunk-6-1.png" style="display: block; margin: auto;" /><img src="man/figures/README-unnamed-chunk-6-2.png" style="display: block; margin: auto;" /><img src="man/figures/README-unnamed-chunk-6-3.png" style="display: block; margin: auto;" />
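The summary above reports AIC, BIC and ICL values, so the number of regimes need not be fixed at K = 5 in advance: the criteria can drive model selection over a grid of candidate models. Below is a hedged sketch of that idea; it assumes the package exports a selection helper named `selectRHLP` with the bounds-style signature shown, so verify the name and arguments against the reference manual before use.

```r
library(samurais)

data("univtoydataset")

# Pick K (number of regimes) and p (polynomial order) by information
# criterion over a small grid (helper name and signature assumed)
selectedrhlp <- selectRHLP(X = univtoydataset$x, Y = univtoydataset$y,
                           Kmin = 2, Kmax = 6, pmin = 0, pmax = 3)
selectedrhlp$summary()
```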

```r
# Application to a real data set
data("univrealdataset")
x <- univrealdataset$x
y <- univrealdataset$y2

K <- 5 # Number of regimes (mixture components)
p <- 3 # Dimension of beta (order of the polynomial regressors)
q <- 1 # Dimension of w (order of the logistic regression: to be set to 1 for segmentation)
variance_type <- "heteroskedastic" # "heteroskedastic" or "homoskedastic" model

n_tries <- 1
max_iter <- 1500
threshold <- 1e-6
verbose <- TRUE
verbose_IRLS <- FALSE

rhlp <- emRHLP(X = x, Y = y, K, p, q, variance_type, n_tries,
               max_iter, threshold, verbose, verbose_IRLS)
#> EM: Iteration : 1 || log-likelihood : -3321.6485760125
#> EM: Iteration : 2 || log-likelihood : -2286.48632282875
#> EM: Iteration : 3 || log-likelihood : -2257.60498391374
#> EM: Iteration : 4 || log-likelihood : -2243.74506764308
#> EM: Iteration : 5 || log-likelihood : -2233.3426635247
#> EM: Iteration : 6 || log-likelihood : -2226.89953345319
#> EM: Iteration : 7 || log-likelihood : -2221.77999023589
#> EM: Iteration : 8 || log-likelihood : -2215.81305295291
#> EM: Iteration : 9 || log-likelihood : -2208.25998029539
#> EM: Iteration : 10 || log-likelihood : -2196.27872403055
#> EM: Iteration : 11 || log-likelihood : -2185.40049009242
#> EM: Iteration : 12 || log-likelihood : -2180.13934245387
#> EM: Iteration : 13 || log-likelihood : -2175.4276274402
#> EM: Iteration : 14 || log-likelihood : -2170.86113669353
#> EM: Iteration : 15 || log-likelihood : -2165.34927170608
#> EM: Iteration : 16 || log-likelihood : -2161.12419211511
#> EM: Iteration : 17 || log-likelihood : -2158.63709280617
#> EM: Iteration : 18 || log-likelihood : -2156.19846850913
#> EM: Iteration : 19 || log-likelihood : -2154.04107470071
#> EM: Iteration : 20 || log-likelihood : -2153.24544245686
#> EM: Iteration : 21 || log-likelihood : -2151.74944795242
#> EM: Iteration : 22 || log-likelihood : -2149.90781423151
#> EM: Iteration : 23 || log-likelihood : -2146.40042232588
#> EM: Iteration : 24 || log-likelihood : -2142.37530025533
#> EM: Iteration : 25 || log-likelihood : -2134.85493291884
#> EM: Iteration : 26 || log-likelihood : -2129.67399002071
#> EM: Iteration : 27 || log-likelihood : -2126.44739300481
#> EM: Iteration : 28 || log-likelihood : -2124.94603052064
#> EM: Iteration : 29 || log-likelihood : -2122.51637426267
#> EM: Iteration : 30 || log-likelihood : -2121.01493646146
#> EM: Iteration : 31 || log-likelihood : -2118.45402063643
#> EM: Iteration : 32 || log-likelihood : -2116.9336204919
#> EM: Iteration : 33 || log-likelihood : -2114.34424563452
#> EM: Iteration : 34 || log-likelihood : -2112.84844186712
#> EM: Iteration : 35 || log-likelihood : -2110.34494568025
#> EM: Iteration : 36 || log-likelihood : -2108.81734757025
#> EM: Iteration : 37 || log-likelihood : -2106.26527191053
#> EM: Iteration : 38 || log-likelihood : -2104.96591147986
#> EM: Iteration : 39 || log-likelihood : -2102.43927829964
#> EM: Iteration : 40 || log-likelihood : -2101.27820194404
#> EM: Iteration : 41 || log-likelihood : -2098.81151697567
#> EM: Iteration : 42 || log-likelihood : -2097.48008514591
#> EM: Iteration : 43 || log-likelihood : -2094.98259556552
#> EM: Iteration : 44 || log-likelihood : -2093.66517040802
#> EM: Iteration : 45 || log-likelihood : -2091.23625905564
#> EM: Iteration : 46 || log-likelihood : -2089.91118603989
#> EM: Iteration : 47 || log-likelihood : -2087.67388435026
#> EM: Iteration : 48 || log-likelihood : -2086.11373786756
#> EM: Iteration : 49 || log-likelihood : -2083.84931461869
#> EM: Iteration : 50 || log-likelihood : -2082.16175664198
#> EM: Iteration : 51 || log-likelihood : -2080.45137011098
#> EM: Iteration : 52 || log-likelihood : -2078.37066132008
#> EM: Iteration : 53 || log-likelihood : -2077.06827662071
#> EM: Iteration : 54 || log-likelihood : -2074.66718553694
#> EM: Iteration : 55 || log-likelihood : -2073.68137124781
#> EM: Iteration : 56 || log-likelihood : -2071.20390017789
#> EM: Iteration : 57 || log-likelihood : -2069.88260759288
#> EM: Iteration : 58 || log-likelihood : -2067.30246728287
#> EM: Iteration : 59 || log-likelihood : -2066.08897944236
#
```
