HDGCvar

Granger causality testing in High Dimensional Vector Autoregressive Models

HDGCvar allows for testing Granger causality in High Dimensional Vector Autoregressive Models (VARs). Granger causality can be tested between time series that are stationary (HDGC_VAR_I0), or non-stationary (unit root), cointegrated, or any mix of the above (HDGC_VAR). Bivariate as well as multivariate (i.e. block) causality can be tested by specifying the name(s) of the variable(s) of interest in GCpair (or GCpairs), and the causal structure among several variables can be visualized as a network using Plot_GC_all.

A specific part of HDGCvar is dedicated to Realized Volatilities (RV), which are modelled with the Heterogeneous VAR (HVAR). It makes it possible to build RV spillover networks (HDGC_HVAR_all) as well as to condition RV on Realized Correlations (HDGC_HVAR_RV_RCoV_all).

Installation

You can install the released version of HDGCvar from CRAN with:

install.packages("HDGCvar")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("Marga8/HDGCvar")

All the functions in HDGCvar are based on the following two papers:

A. Hecq, L. Margaritella, S. Smeekes, “Granger Causality Testing in High Dimensional VARs: a Post Double Selection Procedure” (2019, https://arxiv.org/abs/1902.10991).
A. Hecq, L. Margaritella, S. Smeekes, “Inference in Non Stationary High Dimensional VARs” (2020, check the latest version at https://sites.google.com/view/luca-margaritella).

Details

If the addition of variable X to the given information set Ω alters the conditional distribution of another variable Y, and both X and Ω are observed prior to Y, then X improves predictability of Y, and is said to Granger cause Y with respect to Ω.
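As a minimal low-dimensional illustration of this definition (a base R sketch, not HDGCvar's high-dimensional procedure), one can compare a regression of Y on its own past with one that also includes the past of X. The simulated series, the single lag, and the choice of Ω as just the first lag of Y are assumptions made for the example.

```r
# Simulate a pair of series in which lagged x helps predict y
set.seed(1)
n <- 200
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)

# Align current values of y with the first lags of y and x
y_now <- y[2:n]
y_lag <- y[1:(n - 1)]
x_lag <- x[1:(n - 1)]

restricted   <- lm(y_now ~ y_lag)          # information set without x
unrestricted <- lm(y_now ~ y_lag + x_lag)  # information set with x added

# F test of the nested models: a small p-value means x improves the prediction of y
gc_test <- anova(restricted, unrestricted)
p_value <- gc_test[["Pr(>F)"]][2]
print(p_value)
```

With only one conditioning variable this is the textbook bivariate test; the rest of this README concerns what changes when Ω contains many series.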

Statistically assessing the predictability between two time series (or two blocks of time series) is a fundamental concept in modern time series analysis. Its applications range from macroeconomics and finance to network theory and even the neurosciences.

With the increased availability of larger datasets, these causality concepts have been extended to the high-dimensional setting, where they can benefit from the inclusion of many more series within the available information set Ω. Note that including as much information as possible is particularly close to the original idea of Granger (1969), who envisioned Ω to be “all the information in the universe” in order to avoid spurious causality.

Conditioning on so many variables comes at a cost: one quickly runs into the problems of high dimensionality, namely sky-rocketing variance, overfitting and, in general, the invalidity of standard statistical techniques.

A. Hecq, L. Margaritella, S. Smeekes (2019) designed a Granger causality LM test for high-dimensional stationary vector autoregressive models (VARs). The test combines dimensionality reduction techniques based on penalized regressions, such as the lasso of Tibshirani (1996), with the post-double-selection procedure of Belloni et al. (2011) to select the set of relevant covariates in the system. The double selection step substantially reduces the omitted variable bias and thereby allows for valid post-selection inference on the parameters.
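The logic of double selection followed by an LM test can be sketched in base R as below. This is an illustration under loud assumptions, not the package's implementation: the selection step uses a crude correlation screen as a stand-in for the lasso, only one lag of the Granger causing variable is tested, and all names (Z, x_lag, screen, cutoff) are hypothetical.

```r
set.seed(42)
n <- 300
k <- 20
Z     <- matrix(rnorm(n * k), n, k)  # lagged control variables (simulated)
x_lag <- 0.5 * Z[, 1] + rnorm(n)     # lag of the candidate Granger causing variable
y     <- 0.7 * Z[, 1] + rnorm(n)     # x_lag has no effect on y in this simulation

# Stand-in selection step (NOT the lasso): keep controls correlated with the target
screen <- function(target, Z, cutoff = 0.2) which(abs(cor(Z, target)) > cutoff)

# Double selection: controls relevant for y, plus controls relevant for x_lag
S  <- union(screen(y, Z), screen(x_lag, Z))
Zs <- Z[, S, drop = FALSE]

# LM test: residuals of the restricted model regressed on all retained regressors
u  <- resid(lm(y ~ Zs))
r2 <- summary(lm(u ~ Zs + x_lag))$r.squared
LM <- n * r2
p_value <- pchisq(LM, df = 1, lower.tail = FALSE)  # df = number of tested lags
print(c(LM = LM, p_value = p_value))
```

Selecting controls for both y and x_lag is what guards against dropping a variable that matters for the tested coefficient, which is the omitted variable bias mentioned above.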

If you are sure that your time series are stationary, you can go ahead and use one of the following functions from HDGCvar:

HDGC_VAR_I0 which tests for Granger causality in High Dimensional Stationary VARs
HDGC_VAR_multiple_I0 or HDGC_VAR_multiple_pairs_I0 which test multiple Granger causality combinations in High Dimensional Stationary VARs
HDGC_VAR_all_I0 which allows you to test all the bivariate combinations in your dataset and hence to build a Granger causality network.

All these functions take the following inputs:

GCpair or GCpairs, a list object explicitly containing the Granger caused variables “GCto” and the Granger causing “GCfrom”.
data , the dataset containing as columns all the time series you are modeling.
p = 1, the lag-length of the VAR. Unless you want to impose it explicitly, you can use HDGCvar::lags_upbound_BIC(), which estimates an empirical upper bound for the lag-length using the Bayesian Information Criterion (for details see Hecq et al. 2019).
bound = 0.5 * nrow(data), which applies a lower bound on the penalty parameter of the lasso. In simple words, you are telling the lasso, in each equation it estimates, to stop once the number of selected variables (i.e. those with non-zero coefficients) reaches 50% of the sample size. This is innocuous in small systems but quite important for systems where the number of variables per equation (i.e. including lags) is larger than the sample size (for details see Hecq et al. 2019).
parallel = FALSE, set to TRUE for parallel computing.
n_cores = NULL, number of cores to use in parallel computing, default is all but one.
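A call using the inputs listed above might look like the following sketch. The argument names are the ones documented in this README; the simulated dataset, the column names and the chosen pair are assumptions for illustration, and the call itself only runs if HDGCvar is installed.

```r
# Simulated stationary data: 100 observations of 10 series (illustrative only)
set.seed(123)
n <- 100
k <- 10
dataset <- matrix(rnorm(n * k), n, k)
colnames(dataset) <- paste0("Var", 1:k)

# Test whether Var5 Granger causes Var1, conditional on all other series
interest_variables <- list("GCto" = "Var1", "GCfrom" = "Var5")

if (requireNamespace("HDGCvar", quietly = TRUE)) {
  result <- HDGCvar::HDGC_VAR_I0(
    GCpair   = interest_variables,
    data     = dataset,
    p        = 1,                     # lag-length, imposed here for simplicity
    bound    = 0.5 * nrow(dataset),   # cap on selected variables per equation
    parallel = FALSE,
    n_cores  = NULL
  )
  print(result)
}
```

Here p is imposed by hand; in practice it could instead come from HDGCvar::lags_upbound_BIC(), as described above.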

A. Hecq, L. Margaritella, S. Smeekes (2020) extended the post-double-selection Granger causality LM test of Hecq et al. (2019) to the case in which your system might contain time series integrated of arbitrary orders and possibly even cointegrated. To accomplish this, it is only necessary to augment the lags of the variables of interest by the maximum order of integration you suspect the series to have. This completely avoids any biased pre-test of unit roots or cointegration and lets you directly perform high-dimensional inference on the desired parameters.

Therefore: if you are NOT sure whether your time series are stationary, non-stationary, cointegrated, or an appropriate mix of the above, or you do not trust the biased unit root and cointegration tests out there, we got you covered! You can go ahead and use one of the following functions from HDGCvar:

HDGC_VAR which tests for Granger causality in High Dimensional VARs that are stationary, non-stationary, cointegrated, or a mix of the above
HDGC_VAR_multiple or HDGC_VAR_multiple_pairs which test multiple Granger causality combinations in High Dimensional VARs that are stationary, non-stationary, cointegrated, or a mix of the above
HDGC_VAR_all which allows you to test all the bivariate combinations in your dataset and hence to build a Granger causality network.

All these functions take the same inputs as reported above for HDGC_VAR_I0, with the addition of the following:

d=2, the augmentation needed to handle potentially non-stationary time series. It should correspond to the maximum suspected order of integration of your time series. This might sound vague but think about it: in economics you will hardly see a variable integrated of order two or three. Therefore, unless you are absolutely sure that your time series are at most integrated of order one and that those not integrated have eigenvalues far from unity, or for whatever unlikely reason you have at least one variable integrated of order three, I would recommend setting d=2. Be careful though: if the lag-length p that you are using is smaller than or equal to d, HDGC_VAR will give you a warning, because to avoid spurious regression issues in the post-double-selection step you need to ensure p>=d+1. This ultimately means that if you want to be safeguarded against possible I(2) variables you will have to set d=2 and at least p=3. In other words, even though HDGCvar::lags_upbound_BIC() might have estimated p=2, you still want to make it larger to avoid spurious results in the lasso selection.
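The rule p>=d+1 can be applied mechanically before calling HDGC_VAR. The helper below is a hypothetical convenience function written for this illustration; it is not part of HDGCvar.

```r
# Hypothetical helper (not part of HDGCvar): raise p to d + 1 when it is too small
adjust_lag_for_augmentation <- function(p, d = 2) {
  stopifnot(p >= 1, d >= 0)
  max(p, d + 1)
}

p_estimated <- 2  # e.g. an estimate of the lag-length that is too small for d = 2
p_used <- adjust_lag_for_augmentation(p_estimated, d = 2)
print(p_used)
```

A lag-length that already satisfies the rule is returned unchanged, so the helper only ever increases p.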

Back to A. Hecq, L. Margaritella, S. Smeekes (2019): in their empirical application they consider a specific case of stationary time series, namely they investigate volatility transmission among stock returns using daily realized variances. These are very attractive measures among practitioners and academics for modelling time-varying volatilities and monitoring financial risk. Given the time series of realized volatilities, a multivariate version of the heterogeneous autoregressive model (HVAR) of Corsi (2009) is employed to model their joint behavior.

Therefore: if you are interested in testing Granger causality among realized volatilities and potentially building volatility spillover networks, you should use one of the following functions from HDGCvar:

HDGC_HVAR which tests for Granger causality in High Dimensional stationary HVARs.
HDGC_HVAR_multiple or HDGC_HVAR_multiple_pairs which test multiple Granger causality combinations in High Dimensional stationary HVARs.
HDGC_HVAR_all which allows you to test all the bivariate combinations in your dataset and hence to build a Granger causality network.

All these functions take the same inputs as reported above for HDGC_VAR_I0, with the inclusion of log=TRUE to log-transform the series (recommended) and the exclusion of the lag-length parameter p, since in an HVAR this is already pre-specified to be equal to 3, namely the daily, weekly and monthly aggregates of realized volatilities (for details see Corsi (2009) and Hecq et al. (2019)).
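To make the three "lags" concrete, the sketch below builds daily, weekly and monthly regressors for a single realized volatility series in base R. It assumes the conventional windows of 1, 5 and 22 trading days from the HAR literature; the function name har_regressors is hypothetical and the package's internal construction may differ.

```r
# Hypothetical helper: HAR-style regressors for one series of realized volatilities
har_regressors <- function(rv, log = TRUE) {
  if (log) rv <- log(rv)
  # Trailing mean over the last h observations, including the current one
  trailing_mean <- function(x, h) {
    as.numeric(stats::filter(x, rep(1 / h, h), sides = 1))
  }
  n <- length(rv)
  idx <- 22:(n - 1)  # regressors dated t - 1, for targets dated t = 23, ..., n
  data.frame(
    rv      = rv[idx + 1],
    daily   = rv[idx],
    weekly  = trailing_mean(rv, 5)[idx],
    monthly = trailing_mean(rv, 22)[idx]
  )
}

set.seed(7)
rv  <- exp(rnorm(100))  # simulated positive "realized volatilities"
har <- har_regressors(rv)
head(har)
```

Each row pairs the value at day t with averages of the previous 1, 5 and 22 days, which is why no separate lag-length needs to be chosen.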

Note: to account for potential heteroskedasticity in the data, all these functions return the asymptotic χ² version of the LM test (“Asymp”), the small sample correction F tests (“FS_cor”) and the
