PINstimation
A comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features include posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and the decomposition of PIN into layers and into bad/good components. Versatile data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities for the sophisticated, error-prone, and time-consuming estimation of informed trading, using only raw trade-level data.
PINstimation: Estimating Models of Probability of Informed Trading
PINstimation provides utilities for the estimation of probability of informed trading models: original PIN (PIN) in Easley and O'Hara (1992) and Easley et al. (1996); multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012); and an improved VPIN (iVPIN), which follows Ke and Lin (2017). Various computation methods suggested in the literature are included. Data simulation tools and trade classification algorithms are among the supplementary utilities. The package enables fast and precise solutions for the sophisticated, error-prone and time-consuming estimation of probability of informed trading measures. It is also compact, in the sense that detailed estimation results can be obtained from raw trade-level data alone.
Recent changes (0.2.0)
- initials_adjpin(): Updated the generation of initial parameter sets for the adjusted PIN model to align with the procedure described in Ersan and Ghachem (2024).
- adjpin(): The reported runtime now includes the time spent generating initial parameter sets, giving a more complete view of total computation time.
- ivpin(): Added an improved VPIN estimator based on Ke and Lin (2017). Using maximum-likelihood estimation, ivpin() provides more stable VPIN estimates, especially with small volume buckets or infrequent informed trades. It does so by capturing information embedded in volume time, which yields more consistent measures of flow toxicity.
See NEWS.md for the full version history.
Table of contents
- Main functionalities
- Installation
- Examples
- Resources
- Note to frequent users
- Contributions
- Alternative packages
- Getting help
Main functionalities
The functionalities that the package offers are summarized below:
- PIN model
  - estimate the PIN model using the functions pin(), pin_yz(), pin_gwj(), and pin_ea().
  - compute initial parameter sets using the functions initials_pin_yz(), initials_pin_gwj(), and initials_pin_ea().
  - generate simulation data following the PIN model using generatedata_mpin(layers = 1).
  - evaluate factorizations of the PIN likelihood function using fact_pin_eho(), fact_pin_lk(), and fact_pin_e().
  - estimate the PIN model by the Bayesian approach (Gibbs sampler) using pin_bayes() (*).
- MPIN model
  - estimate the MPIN model using the functions mpin_ml() and mpin_ecm().
  - compute initial parameter sets using initials_mpin().
  - detect the number of layers in data using detectlayers_e(), detectlayers_eg(), and detectlayers_ecm().
  - generate simulation data following the MPIN model using generatedata_mpin().
  - evaluate the factorization of the MPIN likelihood function through fact_mpin().
- AdjPIN model
  - estimate the AdjPIN model using the function adjpin().
  - compute initial parameter sets using the functions initials_adjpin(), initials_adjpin_cl(), and initials_adjpin_rnd().
  - generate simulation data following the AdjPIN model using generatedata_adjpin().
  - evaluate the factorization of the AdjPIN likelihood function through fact_adjpin().
- VPIN
  - estimate the VPIN model using the function vpin().
  - estimate the iVPIN model using the function ivpin().
- Data classification
  - classify high-frequency data through the tick, quote, LR, and EMO algorithms using the function aggregate_trades().
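Of the classification algorithms listed above, the tick rule is the simplest. The following base-R sketch is only an illustration and not the code behind aggregate_trades(). The helper name tick_rule() is hypothetical. A trade is labeled a buy (+1) when its price is above the previous trade price and a sell (-1) when it is below. On a zero tick, it keeps the label of the previous trade.

```r
# Hypothetical helper illustrating the tick rule; not part of PINstimation.
# Returns 1 for buyer-initiated trades, -1 for seller-initiated trades,
# and NA while no price change has been observed yet.
tick_rule <- function(price) {
  n <- length(price)
  side <- rep(NA_integer_, n)
  if (n < 2) return(side)
  for (i in 2:n) {
    if (price[i] > price[i - 1]) {
      side[i] <- 1L            # uptick: classify as a buy
    } else if (price[i] < price[i - 1]) {
      side[i] <- -1L           # downtick: classify as a sell
    } else {
      side[i] <- side[i - 1]   # zero tick: carry the previous label forward
    }
  }
  side
}

prices <- c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02)
tick_rule(prices)  # NA 1 1 -1 -1 1
```

The quote rule compares the trade price with the prevailing bid-ask midpoint instead. The LR (Lee and Ready) and EMO (Ellis, Michaely and O'Hara) algorithms combine the quote and tick rules. All three need quote data, which this sketch leaves out.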
Installation
The easiest way to get PINstimation is to install the released version from CRAN:
install.packages("PINstimation")
To get a bug fix or to use a feature that is only in the development version, install the development version of PINstimation from GitHub.
# install.packages("devtools")
# library(devtools)
devtools::install_github("monty-se/PINstimation", build_vignettes = TRUE)
Loading the package
library(PINstimation)
Examples
Example 1: Estimate the PIN model
We estimate the PIN model on the preloaded dataset dailytrades using the initial parameter sets of Ersan and Alici (2016).
estimate <- pin_ea(dailytrades)
## [+] PIN Estimation started
## |[1] Likelihood function factorization: Ersan (2016)
## |[2] Loading initial parameter sets : 5 EA initial set(s) loaded
## |[3] Estimating PIN model (1996) : Using Maximum Likelihood Estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of PIN estimation completed
## [+] PIN Estimation completed
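In the PIN model of Easley et al. (1996), an information event occurs on a given day with probability α, and it is bad news with probability δ. Uninformed buys and sells arrive at rates εb and εs. Informed traders add trades at rate μ on the side of the news. PIN is the expected share of informed arrivals, αμ / (αμ + εb + εs).

The sketch below uses two hypothetical helpers, pin_value() and simulate_pin_days(). It evaluates this formula and simulates daily buys and sells under the model. It is not how pin_ea() or generatedata_mpin() are implemented. The two-column layout of buys and sells is only assumed to resemble dailytrades.

```r
# Hypothetical helpers sketching the PIN model of Easley et al. (1996);
# they are not PINstimation functions.
pin_value <- function(alpha, mu, eps.b, eps.s) {
  # Expected share of order arrivals that come from informed traders
  alpha * mu / (alpha * mu + eps.b + eps.s)
}

simulate_pin_days <- function(ndays, alpha, delta, mu, eps.b, eps.s) {
  news <- runif(ndays) < alpha            # information event on the day?
  bad  <- news & (runif(ndays) < delta)   # bad news, given an event
  good <- news & !bad                     # good news, given an event
  data.frame(
    b = rpois(ndays, eps.b + mu * good),  # informed traders buy on good news
    s = rpois(ndays, eps.s + mu * bad)    # informed traders sell on bad news
  )
}

set.seed(123)
sim <- simulate_pin_days(60, alpha = 0.4, delta = 0.5, mu = 600,
                         eps.b = 300, eps.s = 300)
head(sim)
pin_value(alpha = 0.4, mu = 600, eps.b = 300, eps.s = 300)  # 240 / 840, about 0.286
```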
Example 2: Estimate the Multilayer PIN model
We run the estimation of the MPIN model on the preloaded dataset dailytrades using:
- the maximum-likelihood method.
ml_estimate <- mpin_ml(dailytrades)
## [+] MPIN estimation started
## |[1] Detecting layers from data : using Ersan and Ghachem (2022a)
## |[=] Number of layers in the data : 3 information layer(s) detected
## |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
## |[3] Estimating the MPIN model : Maximum-likelihood standard estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of mpin estimation completed
## [+] MPIN estimation completed
- the ECM algorithm.
ecm_estimate <- mpin_ecm(dailytrades)
## [+] MPIN estimation started
## |[1] Computing the range of layers : information layers from 1 to 8
## |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
## |[=] Selecting initial parameter sets : max 100 initial sets per estimation
## |[3] Estimating the MPIN model : Expectation-Conditional Maximization algorithm
## |+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [8 layer(s)]
## |[3] Selecting the optimal model : using lowest Information Criterion (BIC)
## [+] MPIN estimation completed
Compare the aggregate parameters obtained from the ML and ECM estimations.
mpin_comparison <- rbind(ml_estimate@aggregates, ecm_estimate@aggregates)
rownames(mpin_comparison) <- c("ML", "ECM")
cat("Probabilities of the ML and ECM estimations of the MPIN model\n")
print(mpin_comparison)
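The MPIN of Ersan (2016) aggregates the information layers into one probability. Assume that layer j occurs with probability αj and brings informed trades at rate μj. The expected share of informed arrivals is then Σj αj μj / (Σj αj μj + εb + εs), which reduces to PIN with a single layer.

The helper mpin_value() below is a hypothetical illustration of that formula, not a PINstimation function, and the parameter values are arbitrary.

```r
# Hypothetical helper (not part of PINstimation): MPIN as the expected share
# of informed arrivals, where layer j has event probability alpha[j] and
# informed arrival rate mu[j].
mpin_value <- function(alpha, mu, eps.b, eps.s) {
  informed <- sum(alpha * mu)               # expected informed arrivals
  informed / (informed + eps.b + eps.s)
}

# Two layers: expected informed arrivals 0.2 * 400 + 0.1 * 1000 = 180
mpin_value(alpha = c(0.2, 0.1), mu = c(400, 1000), eps.b = 300, eps.s = 300)

# One layer gives the PIN formula back
mpin_value(alpha = 0.4, mu = 600, eps.b = 300, eps.s = 300)
```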
Display the summary of the model estimates for all numbers of layers.
ecm_summary <- getSummary(ecm_estimate)  # avoid masking base::summary()
show(ecm_summary)
## layers em.layers MPIN Likelihood AIC BIC AWE
## Model[1] 1 1 0.566 -3226.469 6462.9 6473.4 6508.9
## Model[2] 2 2 0.577 -800.379 1616.8 1633.5 1690.3
## Model[3] 3 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[4] 4 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[5] 5 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[6] 6 3 0.574 -643.458 1308.9 1332.0 1410.0
## Model[7] 7 4 0.575 -642.631 1313.3 1342.6 1441.9
## Model[8] 8 4 0.575 -642.631 1313.3 1342.6 1441.9
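The ECM run selects the model with the lowest BIC. The information criteria in the table can be recomputed from the log-likelihoods under two assumptions that the package output does not state:

- a model with J information layers has 3J + 2 free parameters (αj, δj and μj for each layer, plus εb and εs);
- dailytrades spans 60 trading days.

With these assumptions, the sketch below reproduces the AIC and BIC columns to the displayed precision.

```r
# Columns copied from the getSummary() output above (Model[1] to Model[8])
em.layers <- c(1, 2, 3, 3, 3, 3, 4, 4)
loglik    <- c(-3226.469, -800.379, -643.458, -643.458,
               -643.458, -643.458, -642.631, -642.631)

# Assumptions, not package facts: 3J + 2 free parameters for J layers,
# and 60 trading days in dailytrades.
ndays   <- 60
nparams <- 3 * em.layers + 2

aic <- -2 * loglik + 2 * nparams
bic <- -2 * loglik + nparams * log(ndays)
print(round(cbind(layers = 1:8, em.layers, AIC = aic, BIC = bic), 1))

# Lowest BIC wins; which.min() returns the first of tied minima, so the
# three-layer fit (Model[3]) is selected here.
which.min(bic)
```

Models 4 to 6 converge to the same three-layer solution, so their criteria tie exactly with Model 3.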
Example 3: Estimate the Adjusted PIN model
We estimate the adjusted PIN model on the preloaded dataset dailytrades using 20 initial parameter sets computed by the algorithm of Ersan and Ghachem (2022b).
estimate_adjpin <- adjpin(dailytrades, initialsets = "GE")
show(estimate_adjpin)
## [+] AdjPIN estimation started
## |[1] Computing initial parameter sets : 20 GE initial sets generated
## |[2] Estimating the AdjPIN model : Maximum-likelihood Standard Estimation
## |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed
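The AdjPIN model of Duarte and Young (2009) has two sources of extra order flow:

- information events, which occur with probability α, are bad news with probability δ, and bring informed buys at rate μb or informed sells at rate μs;
- symmetric order-flow shocks, which add Δb buys and Δs sells, with probability θ on days without news and θ' on days with news.

AdjPIN is the expected share of informed arrivals in total order flow, and PSOS is the expected share of shock-driven arrivals. The helper adjpin_psos() is a hypothetical sketch of these two ratios, not a PINstimation function. It assumes δ is the probability of bad news, so informed traders sell at rate μs with probability δ.

```r
# Hypothetical helper (not part of PINstimation) computing AdjPIN and PSOS
# as expected shares of total order flow.
adjpin_psos <- function(alpha, delta, theta, thetap, eps.b, eps.s,
                        mu.b, mu.s, d.b, d.s) {
  # Informed arrivals: buys at mu.b on good news, sells at mu.s on bad news
  informed <- alpha * ((1 - delta) * mu.b + delta * mu.s)
  # Shock arrivals: shock probability theta without news, thetap with news
  shock <- (d.b + d.s) * ((1 - alpha) * theta + alpha * thetap)
  total <- informed + shock + eps.b + eps.s
  c(adjpin = informed / total, psos = shock / total)
}

adjpin_psos(alpha = 0.4, delta = 0.5, theta = 0.2, thetap = 0.2,
            eps.b = 200, eps.s = 200, mu.b = 100, mu.s = 100,
            d.b = 50, d.s = 50)
```

With these arbitrary values, informed arrivals are 40 and shock arrivals are 20 out of 460 expected trades.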
Example 4: Estimate the volume-synchronized PIN (VPIN) model
We run a VPIN estimation on the preloaded dataset hfdata with a time bar size of 5 minutes (timebarsize = 300 seconds).
estimate.vpin <- vpin(hfdata, timebarsize = 300)
show(estimate.vpin)
## ----------------------------------
## VPIN estimation completed successfully.
## ----------------------------------
## Type object@vpin to access the VPIN vector.
## Type object@bucketdata to access data used to construct the VPIN vector.
## Type object@dailyvpin to access the daily VPIN vectors.
##
## [+] VPIN descriptive statistics
##
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## ------- --------- -------- ------- --------- ------- ------
## 0.129 0.208 0.252 0.269 0.319 0.643 49
##
##
## [+] VPIN parameters
##
## tbSize buckets samplength VBS ndays
## -------- --------- ------------ ---------- -------
## 300 50 50 4058.956 69
##
## -------
## Running time: 2.46 seconds
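VPIN, as defined by Easley et al. (2012), averages the absolute order imbalance |VS - VB| / V over the last n volume buckets of size V. Buy and sell volumes are typically obtained by bulk volume classification: the buy share of a time bar's volume is Φ(ΔP / σΔP), where Φ is the standard normal CDF.

The hypothetical helpers rolling_vpin() and bvc_split() below sketch these two steps in base R. They are not the internals of vpin(). Mapping vbs to the VBS column and n to samplength in the output above is an assumption.

```r
# Hypothetical sketch of VPIN (Easley et al., 2012); not PINstimation code.
# buy and sell hold the classified volumes of consecutive buckets of size vbs.
rolling_vpin <- function(buy, sell, vbs, n) {
  imbalance <- abs(sell - buy) / vbs       # order imbalance per bucket
  vpin <- rep(NA_real_, length(imbalance))
  if (length(imbalance) >= n) {
    for (t in n:length(imbalance)) {
      vpin[t] <- mean(imbalance[(t - n + 1):t])  # average over last n buckets
    }
  }
  vpin
}

# Bulk volume classification of one time bar: the buy share is the standard
# normal CDF of the standardized price change over the bar.
bvc_split <- function(volume, dprice, sigma) {
  buy <- volume * pnorm(dprice / sigma)
  c(buy = buy, sell = volume - buy)
}

buy  <- c(60, 30, 50, 80, 45)
sell <- c(40, 70, 50, 20, 55)
rolling_vpin(buy, sell, vbs = 100, n = 3)
bvc_split(volume = 1000, dprice = 0.02, sigma = 0.01)
```

In this sketch the first n - 1 buckets have no VPIN value. If vpin() behaves the same way, that would be consistent with the 49 NA's reported above for samplength = 50.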
