Xtbreak
Testing and Estimation of structural breaks in Stata
xtbreak
estimating and testing for many known and unknown structural breaks in time series and panel data.
For an overview of xtbreak test see xtbreak test and for xtbreak estimate see xtbreak estimate.
Current Version:
Please cite as Ditzen, J., Karavias, Y. & Westerlund, J. (2025) Testing and Estimating Structural Breaks in Time Series and Panel Data in Stata. The Stata Journal. 25(3). download.
A paper describing the panel data theory of xtbreak is available as Ditzen, J., Karavias, Y. & Westerlund, J. (2024) Multiple Structural Breaks in Interactive Effects Panel Data Models. Journal of Applied Econometrics. 40(1). download.
Table of Contents
- Syntax
- Description
- Options
- Note on Panel Data
- Python
- Unbalanced Data
- Examples
- References
- How to install
- Questions?
- About
- Changes
1. Syntax
Automatic estimation of number and location of breaks (sequential F-Test):
xtbreak depvar [indepvars] [if],
options1 options2 options3 options5 options6
Testing for known structural breaks:
xtbreak test depvar [indepvars] [if],
breakpoints(numlist| datelist [,index| fmt(string)]) options1 options5
breakpoints() specifies the time period of the known structural break.
Testing for unknown structural breaks:
xtbreak test depvar [indepvars] [if],
hypothesis(1|2|3) breaks(#) options1 options2 options3 options4 options5
hypothesis(1|2|3) specifies which hypothesis to test, see hypotheses. breaks(#) sets the number of breaks.
Estimation of breakpoints
xtbreak estimate depvar [indepvars] [if], breaks(#) showindex options1 options2 options5
General Options
options1 | Description
--- | ---
breakconstant | break in constant
noconstant | suppresses constant
nobreakvariables(varlist1) | variables with no structural break(s)
vce(type) | covariance matrix estimator, allowed: ssr, hac, hc and np
inverter(type) | inverter, default is speed. See options.
python | use Python to calculate SSRs to improve speed. See details.
Options for unknown breakdates
options2 | Description
--- | ---
trimming(real) | minimal segment length
error(real) | error margin for partial break model
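To make the trimming option more concrete, here is a small Python sketch (not part of xtbreak) that turns a trimming fraction into a minimal segment length and an upper bound on the number of breaks. The function name `trimming_bounds` is hypothetical. Rounding the minimal segment length down with `floor(trimming * T)` is an assumption made for illustration; the bound `floor(1/trimming)` is the one quoted for the sequential option.

```python
from fractions import Fraction
from math import floor


def trimming_bounds(n_periods, trimming="0.15"):
    """Return (minimal segment length, upper bound on number of breaks).

    trimming is passed as a string so that Fraction keeps it exact
    (0.15 is not exactly representable as a binary float).
    """
    eps = Fraction(trimming)
    if not 0 < eps < 1:
        raise ValueError("trimming must lie strictly between 0 and 1")
    # ASSUMPTION: minimal segment length is the trimming share of T, rounded down.
    min_segment = floor(eps * n_periods)
    # Upper bound on breaks as quoted for the sequential option: floor(1/trimming).
    max_breaks = floor(1 / eps)
    return min_segment, max_breaks


print(trimming_bounds(100))         # default trimming of 15%
print(trimming_bounds(40, "0.25"))  # 25% trimming
```

Under these assumptions, 100 time periods with the default trimming give a minimal segment length of 15 periods.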
Options for testing with unknown breakdates and hypothesis(2)
options3 | Description
--- | ---
wdmax | Use weighted test statistic instead of unweighted
level(#) | set level for critical values
Options for testing with unknown breakdates and hypothesis(3)
options4 | Description
--- | ---
sequential | Sequential F-Test to obtain number of breaks
Options for panel data
options5 | Description
--- | ---
nofixedeffects | suppresses fixed effects (only for panel data sets)
breakfixedeffects | break in fixed effects
csd | add cross-section averages of variables with and without breaks.
csa(varlist) | Variables with breaks used to calculate cross-sectional averages
csanobreak(varlist) | Variables without breaks used to calculate cross-sectional averages
kfactors(varlist) | Known factors, which are constant across the cross-sectional dimension but are affected by structural breaks. Examples are seasonal dummies or other observed common factors such as asset returns and oil prices.
nbkfactors(varlist) | same as above but without breaks.
noreweigh | do not reweigh time-unit specific errors by the number of total observations over actual observations for a given time period in order to increase the SSR of segments of unbalanced panels with missing data.
Options for automatic estimation of number and location of breaks
options6 | Description
--- | ---
skiph2 | skips hypothesis B
clevel(#) | specifies level for critical values to detect breaks.
strict | strict behavior of sequential test. Improves speed.
maxbreaks(#) | sets maximum number of breaks for sequential test. Improves speed.
Data must be xtset before using xtbreak. depvar, indepvars, varlist1 and varlist2 may contain time-series operators.
2. Description
xtbreak test implements multiple tests for structural breaks in time series and panel data models. The number and period of occurrence of structural breaks can be known or unknown. In the case of a known breakpoint, xtbreak test can test if the break occurs at a specific point in time. For unknown breaks, xtbreak test implements three different hypotheses. The first is no break against the alternative of s breaks. The second is no breaks against a number of breaks between a lower and an upper limit. The third tests the null of s breaks against the alternative of one more break (s+1). For more details see xtbreak test.
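As a rough illustration of testing a known breakpoint, the following Python sketch computes a classical Chow-type F statistic for a pure mean-shift model: it compares the SSR of a model with one common mean to the SSR of a model whose mean changes after a given period. This is a simplification made for illustration, not the statistic xtbreak test reports (which depends on the model and the vce() choice). The function names are hypothetical.

```python
def ssr_about_mean(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def chow_f_mean_shift(y, break_at):
    """F statistic for H0: one common mean vs H1: the mean shifts after
    period break_at (1-based, so y[:break_at] is the first regime)."""
    if not 1 <= break_at < len(y):
        raise ValueError("break_at must leave observations in both regimes")
    ssr_restricted = ssr_about_mean(y)  # no break
    ssr_unrestricted = ssr_about_mean(y[:break_at]) + ssr_about_mean(y[break_at:])
    df_resid = len(y) - 2               # two estimated means under H1
    # One restriction is tested, so the numerator is divided by 1.
    return (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / df_resid)


# Toy series whose mean moves from 1.5 to 5.5 after period 6.
y = [1, 2, 1, 2, 1, 2, 5, 6, 5, 6, 5, 6]
print(chow_f_mean_shift(y, 6))  # break placed at the true date
print(chow_f_mean_shift(y, 3))  # break placed at a wrong date
```

In this toy series the statistic is much larger when the break is placed at the true date than at a wrong one, which is the intuition behind testing a specific breakpoint.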
xtbreak estimate estimates the break points, that is, it estimates T1, T2, ..., Ts. The underlying idea is that, for a given number of breaks, the model with the true breakdates has a smaller sum of squared residuals (SSR) than a model with incorrect breakdates. To find the breakdates, xtbreak estimate uses the algorithm (dynamic program) from Bai and Perron (2003). All necessary SSRs are calculated and the breakdates yielding the smallest total SSR are selected. For more details see xtbreak estimate.
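The idea of minimizing the total SSR with a dynamic program can be sketched in a few lines of Python. This is a minimal sketch in the spirit of Bai and Perron (2003), assuming a pure mean-shift time series model (constant only, and the constant breaks); it is not xtbreak's own code. The names `segment_ssr` and `estimate_breaks` and the `min_len` argument (standing in for the minimal segment length) are hypothetical.

```python
def segment_ssr(y):
    """Return a function ssr(i, j) giving the SSR of y[i:j] around its mean,
    computed in O(1) from prefix sums."""
    n = len(y)
    s1 = [0.0] * (n + 1)  # prefix sums of y
    s2 = [0.0] * (n + 1)  # prefix sums of y squared
    for t, v in enumerate(y):
        s1[t + 1] = s1[t] + v
        s2[t + 1] = s2[t] + v * v

    def ssr(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return ssr


def estimate_breaks(y, n_breaks, min_len):
    """Return (breaks, total_ssr). Each break is the last period (1-based)
    of a regime, i.e. the analogue of T1, ..., Ts."""
    n = len(y)
    if n_breaks < 1 or min_len < 1 or (n_breaks + 1) * min_len > n:
        raise ValueError("not enough observations for the requested breaks")
    ssr = segment_ssr(y)
    inf = float("inf")
    # cost[k][j]: smallest SSR when y[0:j] is split by k breaks.
    cost = [[inf] * (n + 1) for _ in range(n_breaks + 1)]
    last = [[-1] * (n + 1) for _ in range(n_breaks + 1)]
    for j in range(min_len, n + 1):
        cost[0][j] = ssr(0, j)
    for k in range(1, n_breaks + 1):
        for j in range((k + 1) * min_len, n + 1):
            # b is the position of the k-th (last) break within y[0:j].
            for b in range(k * min_len, j - min_len + 1):
                candidate = cost[k - 1][b] + ssr(b, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    last[k][j] = b
    # Walk back through the stored break positions.
    breaks, j = [], n
    for k in range(n_breaks, 0, -1):
        j = last[k][j]
        breaks.append(j)
    breaks.reverse()
    return breaks, cost[n_breaks][n]


# Toy series with mean 1 for t = 1..10, mean 5 for t = 11..20, mean 2 for t = 21..30.
y = [1.0] * 10 + [5.0] * 10 + [2.0] * 10
print(estimate_breaks(y, n_breaks=2, min_len=3))
```

For the toy series the search recovers breaks after periods 10 and 20 with a total SSR of zero. With regressors, the per-segment SSR would come from a regression on each segment instead of the segment mean, but the recursion over break positions stays the same.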
xtbreak implements the tests for and estimation of structural breaks discussed in Bai & Perron (1998, 2003), Karavias, Narayan, Westerlund (2021) and Ditzen, Karavias, Westerlund (2024).
For the remainder we assume the following model:
y(i,t) = sigma0(1) + sigma1(1) z(i,t) + beta0(1,i) + beta1 x(i,t) + e(it) for t = 1,...,T1
y(i,t) = sigma0(2) + sigma1(2) z(i,t) + beta0(1,i) + beta1 x(i,t) + e(it) for t = T1+1,...,T2
...
y(i,t) = sigma0(s) + sigma1(s) z(i,t) + beta0(1,i) + beta1 x(i,t) + e(it) for t = Ts,...,T
where s is the number of the segment/breaks, z(i,t) is a NT1xq matrix containing the variables whose relationship with y breaks. A break in the constant is possible. x(i,t) is a NTxp matrix with variables without a break. sigma0(s), sigma1(s) are the coefficients with structural breaks and T1,...,Ts are the periods of the breakpoints.
In a pure time series model, breaks in the constant (or deterministics) are possible. In this case sigma0(s) is a constant with a structural break. Fixed effects in panel data models cannot have a break.
xtbreak will automatically determine whether a time series or panel dataset is used.
3. Options
Options
Option | Description
--- | ---
breakpoints(numlist\datelist [,index\fmt(format)]) | specifies the known breakpoints. Known breakpoints can be set either by the observation number or by the value of the time identifier. If a numlist is used, option index is required. For example, breakpoints(10, index) specifies that one break occurs at the 10th observation in time. datelist takes a list of dates. For example, breakpoints(2010Q1, fmt(tq)) specifies a break in quarter 1 of 2010. The option fmt() specifies the format and is required if a datelist is used. The format set in breakpoints() and the format of the time identifier need to be the same.
breaks(#) | specifies the number of unknown breaks under the alternative. For hypothesis 2, breaks() can take two values, for example breaks(4 6) tests for no breaks against 4-6 breaks. If only one value is specified, then the lower limit is set to 1. If h(3) and sequential are used, then breaks() defines the maximum number of breaks tested for.
showindex | show confidence intervals as index.
hypothesis(1\2\3) | specifies which hypothesis to test. h(1) tests for no breaks vs. s breaks, h(2) for no break vs. s0 <= s <= s1 breaks and h(3) for s vs. s+1 breaks. Hypothesis 3 is the default.
sequential | sequential F-Test to determine number of breaks. The number of breaks is varied from s = 0 to breaks()-1 or floor(1/trimming).
breakconstant | break in constant. Default is no breaks in deterministics.
noconstant | suppresses constant.
nofixedeffects | suppresses individual fixed effects.
breakfixedeffects | break in fixed effects.
nobreakvariables(varlist1) | defines variables with no structural break(s). varlist1 can contain time series operators.
vce(type) | covariance matrix estimator, allowed: ssr, hac, hc and np.
trimming(real) | minimal segment length in percent. The minimal segment length is the minimal number of time periods between two breaks. The default is 15% (0.15). Critical values are available for 5%, 10%, 15%, 20% and 25%.
error(real) | define error margin for partial break model.
wdmax | Use weighted test statistic instead of unweighted for the double maximum test (hypothesis 2).
level(#) | set level for critical values for weighted double maximum test. If a value is chosen for which no critical values exist, xtbreak test will choose the closest level.
csd | adds cross-section averages of variables with and without breaks.
csa(varlist) | specify the variables with and without breaks which are added as cross-sectional averages. xtbreak calculates internally the cross-sectional average.
csanobreak() | same as **csa()
