hermiter

What does hermiter do?

hermiter is an R package that facilitates the estimation of the probability density function and cumulative distribution function in univariate and bivariate settings using Hermite series based estimators. In addition, hermiter allows the estimation of the quantile function in the univariate case and nonparametric correlation coefficients in the bivariate case. The package is applicable to streaming, batch and grouped data. The core methods of the package are written in C++ for speed.

These estimators are particularly useful in the sequential setting, for both stationary and non-stationary data streams. They are also useful for efficient, one-pass batch estimation, which is particularly relevant for large data sets. Finally, the Hermite series based estimators are applicable in decentralized (distributed) settings, in that estimators formed on subsets of the data can be consistently merged. The Hermite series based estimators have the distinct advantage of being able to estimate the full density function, distribution function and quantile function (univariate setting), along with the Spearman Rho and Kendall Tau correlation coefficients (bivariate setting), in an online manner. The theoretical and empirical properties of most of these estimators have been studied in depth in the articles below. These investigations demonstrate that the Hermite series based estimators are particularly effective in distribution function, quantile function and Spearman correlation estimation.

A summary of the estimators and algorithms in hermiter can be found in the article below.

Stephanou, Michael and Varughese, Melvin. "hermiter: R package for Sequential Nonparametric Estimation." Computational Statistics (2023).

Features

Univariate

fast batch estimation of pdf, cdf and quantile function
consistent merging of estimates
fast sequential estimation of pdf, cdf and quantile function on streaming data
adaptive sequential estimation on non-stationary streams via exponential weighting
provides online, O(1) time complexity estimates of arbitrary quantiles (e.g. the median) at any point in time, along with probability densities and cumulative probabilities at arbitrary x
uses small and constant memory for the estimator
provides a very compact, simultaneous representation of the pdf, cdf and quantile function that can be efficiently stored and communicated, e.g. using the saveRDS and readRDS functions

Bivariate

fast batch estimation of bivariate pdf, cdf and nonparametric correlation coefficients (Spearman Rho and Kendall Tau)
consistent merging of estimates
fast sequential estimation of bivariate pdf, cdf and nonparametric correlation coefficients on streaming data
adaptive sequential estimation on non-stationary bivariate streams via exponential weighting
provides online, O(1) time complexity estimates of bivariate probability densities and cumulative probabilities at arbitrary points, x
provides online, O(1) time complexity estimates of the Spearman and Kendall rank correlation coefficients
uses small and constant memory for the estimator

Installation

The release version of hermiter can be installed from CRAN with:

install.packages("hermiter")

The development version of hermiter can be installed using devtools with:

devtools::install_github("MikeJaredS/hermiter")

Load Package

To use hermiter, load the package with the following command:

library(hermiter)

Construct Estimator

A hermite_estimator S3 object is constructed as below. The argument, N, adjusts the number of terms in the Hermite series based estimator and controls the trade-off between bias and variance. A lower N value implies a higher bias but lower variance and vice versa for higher values of N. The argument, standardize, controls whether or not to standardize observations before applying the estimator. Standardization usually yields better results and is recommended for most estimation settings.

A univariate estimator is constructed as follows (note that the default estimator type is univariate, so this argument does not need to be explicitly set):

hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "univariate")

Similarly for constructing a bivariate estimator:

hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
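
To make the role of N more concrete, the following base R sketch builds a truncated Hermite series density estimate by hand. It is an illustration under stated assumptions, not hermiter's implementation: the helpers hermite_functions and hermite_density are hypothetical names, no standardization is applied, and the coefficients are taken to be sample means of the orthonormal Hermite functions.

```r
# Illustrative sketch only: hermite_functions() and hermite_density() are
# hypothetical helpers, not part of the hermiter API.

# Orthonormal Hermite functions h_0, ..., h_N evaluated at x.
# Row k + 1 of the result holds h_k(x).
hermite_functions <- function(N, x) {
  h <- matrix(0, nrow = N + 1, ncol = length(x))
  h[1, ] <- pi^(-0.25) * exp(-x^2 / 2)
  if (N >= 1) h[2, ] <- sqrt(2) * x * h[1, ]
  if (N >= 2) {
    # Three-term recurrence for h_{k+1} in terms of h_k and h_{k-1}
    for (k in 1:(N - 1)) {
      h[k + 2, ] <- sqrt(2 / (k + 1)) * x * h[k + 1, ] -
        sqrt(k / (k + 1)) * h[k, ]
    }
  }
  h
}

# Truncated series estimate: f(x) ~ sum_{k=0}^{N} a_k h_k(x),
# where a_k is the sample mean of h_k over the observations.
hermite_density <- function(N, observations, x) {
  coeffs <- rowMeans(hermite_functions(N, observations))
  as.vector(crossprod(hermite_functions(N, x), coeffs))
}

set.seed(1)
observations <- rnorm(1000)
x <- seq(-3, 3, by = 0.5)
dens_low  <- hermite_density(N = 2,  observations, x)  # few terms: smoother
dens_high <- hermite_density(N = 20, observations, x)  # more terms: more flexible
```

Comparing dens_low and dens_high against dnorm(x) gives a feel for the trade-off: fewer terms give a smoother but potentially more biased estimate, while more terms track the sample more closely at the cost of higher variance.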

Batch Estimator Updating

A hermite_estimator object can be initialized with a batch of observations as below.

For univariate observations:

observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N = 10, standardize = TRUE,
                                 observations = observations)

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N = 10, standardize = TRUE,
                                 est_type = "bivariate",
                                 observations = observations)

Sequential Estimator Updating

In the sequential setting, observations are revealed one at a time. A hermite_estimator object can be updated sequentially with a single new observation using the update_sequential method. Note that when the Hermite series based estimator is updated sequentially, observations are also standardized sequentially if the standardize argument is set to TRUE in the constructor.

Standard syntax

For univariate observations:

observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE)
for (idx in seq_along(observations)) {
  hermite_est <- update_sequential(hermite_est,observations[idx])
}

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
for (idx in seq_len(nrow(observations))) {
  hermite_est <- update_sequential(hermite_est,observations[idx,])
}

Piped syntax

For univariate observations:

# %>% is the magrittr pipe; attach magrittr if the pipe is not already available
library(magrittr)
observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE)
for (idx in seq_along(observations)) {
  hermite_est <- hermite_est %>% update_sequential(observations[idx])
}

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
for (idx in seq_len(nrow(observations))) {
  hermite_est <- hermite_est %>% update_sequential(observations[idx,])
}
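
Conceptually, a sequential update of this kind can be pictured as a running mean over Hermite function values. The base R sketch below shows that idea for unstandardized univariate data, together with an exponentially weighted variant of the sort used for non-stationary streams. It is a sketch under those assumptions rather than the package's code: hermite_functions_at, update_coeffs and update_coeffs_ew are hypothetical names, and lambda is an assumed weighting parameter.

```r
# Illustrative sketch only: none of these helpers are part of the hermiter API.

# Orthonormal Hermite functions h_0(x), ..., h_N(x) for a single value x.
hermite_functions_at <- function(N, x) {
  h <- numeric(N + 1)
  h[1] <- pi^(-0.25) * exp(-x^2 / 2)
  if (N >= 1) h[2] <- sqrt(2) * x * h[1]
  if (N >= 2) {
    for (k in 1:(N - 1)) {
      h[k + 2] <- sqrt(2 / (k + 1)) * x * h[k + 1] - sqrt(k / (k + 1)) * h[k]
    }
  }
  h
}

# Running-mean update after the n-th observation x_new:
# a_k^(n) = ((n - 1) * a_k^(n - 1) + h_k(x_new)) / n
update_coeffs <- function(coeffs, n, x_new) {
  ((n - 1) * coeffs + hermite_functions_at(length(coeffs) - 1, x_new)) / n
}

# Exponentially weighted update: recent observations receive more weight,
# so the coefficients can track a drifting distribution.
update_coeffs_ew <- function(coeffs, lambda, x_new) {
  (1 - lambda) * coeffs +
    lambda * hermite_functions_at(length(coeffs) - 1, x_new)
}

set.seed(2)
observations <- rlogis(n = 1000)
N <- 10
coeffs <- numeric(N + 1)
for (idx in seq_along(observations)) {
  coeffs <- update_coeffs(coeffs, idx, observations[idx])
}

# The one-pass result agrees with the batch mean of h_k over all observations.
batch_coeffs <- rowMeans(sapply(observations, hermite_functions_at, N = N))
```

Only the N + 1 coefficients and the observation count need to be kept between updates, which is why the memory footprint stays small and constant regardless of stream length.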

Merging Hermite Estimators

Hermite series based estimators can be consistently combined/merged in both the univariate and bivariate settings. In particular, when standardize = FALSE, the results obtained from combining/merging distinct hermite_estimators updated on subsets of a data set are exactly equal to those obtained by constructing a single hermite_estimator and updating on the full data set (corresponding to the concatenation of the aforementioned subsets). This holds true for the pdf, cdf and quantile results in the univariate case and the pdf, cdf and nonparametric correlation results in the bivariate case. When standardize = TRUE, the equivalence is no longer exact, but is accurate enough to be practically useful. Combining/merging hermite_estimators is illustrated below.

For the univariate case:

observations_1 <- rlogis(n=1000)
observations_2 <- rlogis(n=1000)
hermite_est_1 <- hermite_estimator(N=10, standardize=TRUE, 
                                   observations = observations_1)
hermite_est_2 <- hermite_estimator(N=10, standardize=TRUE, 
                                   observations = observations_2)
hermite_est_merged <- merge_hermite(list(hermite_est_1,hermite_est_2))

For the bivariate case:

observations_1 <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
observations_2 <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est_1 <- hermite_estimator(N = 10, standardize = TRUE,
                                   est_type = "bivariate",
                                   observations = observations_1)
hermite_est_2 <- hermite_estimator(N = 10, standardize = TRUE,
                                   est_type = "bivariate",
                                   observations = observations_2)
hermite_est_merged <- merge_hermite(list(hermite_est_1,hermite_est_2))

The ability to combine/merge estimators is particularly useful in applications involving grouped data (see package vignette).
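
The exactness of merging without standardization can be illustrated with a short base R sketch. It assumes that each univariate estimator boils down to a vector of coefficients (sample means of Hermite functions) plus an observation count, in which case a count-weighted average of the coefficients reproduces the coefficients of the full data set. The names hermite_coeffs and merge_coeffs are hypothetical, and this is not the implementation of merge_hermite.

```r
# Illustrative sketch only: hermite_coeffs() and merge_coeffs() are
# hypothetical helpers, not part of the hermiter API.

# Coefficients a_0, ..., a_N: sample means of the orthonormal Hermite
# functions over the (unstandardized) observations.
hermite_coeffs <- function(N, observations) {
  h <- matrix(0, nrow = N + 1, ncol = length(observations))
  h[1, ] <- pi^(-0.25) * exp(-observations^2 / 2)
  if (N >= 1) h[2, ] <- sqrt(2) * observations * h[1, ]
  if (N >= 2) {
    for (k in 1:(N - 1)) {
      h[k + 2, ] <- sqrt(2 / (k + 1)) * observations * h[k + 1, ] -
        sqrt(k / (k + 1)) * h[k, ]
    }
  }
  rowMeans(h)
}

# Merge by averaging coefficient vectors, weighted by observation counts.
merge_coeffs <- function(coeffs_list, counts) {
  weighted <- Map(function(a, n) a * n, coeffs_list, counts)
  Reduce(`+`, weighted) / sum(counts)
}

set.seed(3)
observations_1 <- rlogis(n = 1000)
observations_2 <- rlogis(n = 500)
N <- 10

merged <- merge_coeffs(
  list(hermite_coeffs(N, observations_1), hermite_coeffs(N, observations_2)),
  counts = c(length(observations_1), length(observations_2))
)
full <- hermite_coeffs(N, c(observations_1, observations_2))

# Difference is at the level of floating-point rounding.
max(abs(merged - full))
```

With standardization, each subset is rescaled using its own running mean and standard deviation, so the coefficient vectors are no longer expressed on a common scale and a simple weighted average of this kind is not available; this is consistent with the equivalence being only approximate in that case.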

Estimate univariate pdf, cdf and quantile function

The central advantage of Hermite series based estimators is that they can be updated in a sequential/one-pass manner a
