Ritest
Stata command to perform randomization inference and permutation tests, allowing for arbitrary randomization procedures with (almost) any Stata command.
Install / Use
/learn @simonheb/RitestREADME
ritest
Stata package to perform randomization inference on any Stata command.
Install
To obtain the latest version through github, from the main window in Stata, run:
net describe ritest, from(https://raw.githubusercontent.com/simonheb/ritest/master/)
If the download from within Stata fails (e.g. because you are behind a firewall),you can always download the files directly:
- https://raw.githubusercontent.com/simonheb/ritest/master/ritest.ado
- https://raw.githubusercontent.com/simonheb/ritest/master/ritest.sthlp
Citation
ritest is not an offical State command. It is a piece of software I wrote and made freely available to the community. Please cite:
Heß, Simon, "Randomization inference with Stata: A guide and software" Stata Journal 17(3) pp. 630-651. [bibtex]
Bugs
There are no known bugs. Please report any unintended or surprising behavior.
Changelog
- 1.1.18 In the previous versions, the samplingsourcefile() got overwritten by a sorted version of itself. This is unnecessary and has been fixed now. (H/T Béatrice Leydier)
- 1.1.8 The savings()-option and the savererandomization()-/saveresampling()-option, now auto-appends the ".dta" and allows to specify ", replace".
- 1.1.4 Added a reject()-option, as permute has it. h/t George s. Ford
- 1.1.2 Fixed the issue that data sanity checks were applied to the full sample, even if an [if] or [in]-statement was used to restrict the analysis to a subsample. h/t Fred Finan
- 1.1.0 Added an option (fixlevels()) to constrain re-randomization to certain values of the treatment variable. This can be used for pairwise tests in multi-treatment experiments by restricting permutation to only some treatment arms.
- 1.0.9 Added the "strict" and "eps" options to the helpfile and added parameter checks so that "strict" enforces "eps(0)". h/t Katharina Nesselrode
- 1.0.8 Minor bugfix, and I got rid of the Google Analytics part
- 1.0.7 Kason Kerwin pointed out that when string variables were used as strata or cluster ids, all observations were treated as belonging to the same. This is fixed with this version. Also, I sped up execution time by dropping unneeded code.
- 1.0.6 Jason Kerwin pointed out an issue with the "saveresampling()"-option. This version fixes this.
- 1.0.4 David McKenzie pointed out that under some conditions, the random seed was ignored. This is fixed with this version.
- 1.0.3 Is the version that was published in the Stata Journal.
Media Coverage
- Finally, a way to do easy randomization inference in Stata! (blog post by David McKenzie)
- Simon Heß has a brand-new Stata package for randomization inference (blog post by Jason Kerwin)
Disclaimer of Warranties and Limitation of Liability
Use at own risk. You agree that use of this software is at your own risk. The author is optimistic but does not make any warranty as to the results that may be obtained from use of this software. The author would be very happy to hear about any issues you might find and will be transparent about changes made in response to user inquiries.
FAQ
- Exporting Results
- Call
marginsor other Pre/Post-Estimation Commands Beforeritest, Using Wrappers - Using
ritestwith a Difference-in-Differences Estimator - Multiple Treatment Arms
- Interpreting
ritestOutput - Confidence intervals
<a name="esttab"></a>How do I export ritest results to TeX/CSV/... with esttab/estout?
run ritest:
eststo regressionresult: reg y treatment controls
ritest treatment _b[treatment]: `e(cmdline)'
extract the RI p-values:
matrix pvalues=r(p) //save the p-values from ritest
mat colnames pvalues = treatment //name p-values so that esttab knows to which coefficient they belong
est restore regressionresult
display the p-values in the table footer:
estadd scalar pvalue_treat = pvalues[1,1]
esttab regressionresult, stats(pvalue_treat)
alternatively, display the p-values next to the coefficient
estadd matrix pvalues = pvalues
esttab regressionresult, cells(b p(par) pvalues(par([ ])))
<a name="wrapper"><a/>As a test statistic, I want to use something that requires several steps to be computed, instead of a simple coefficient estimate. E.g. a margins result.
Use a wrapper!
In most cases, randomization inference will be based on observing the same coefficient estimate across different realizations of a treatment assignment. Even in nonlinear models (such as probit), inference on the point estimate often suffices. If not, you can also let ritest call a wrapper function that executes additional commands, like margins:
program margin_post_wrapper
syntax , command(string)
`command'
margins, dydx(_all) post
//if the command you want to use does not "post" the results, you can make your wrapper programm "eclass" and use "estadd" to post something yourself.
end
ritest treatment (_b[treatment]/_se[treatment]): margin_post_wrapper, command(logit y x treatment)
Of course, you can still beef this up, by passing other arguments to the wrapper to make it more general. Whether or not this makes sense, depends entirely on your context. This may, for example make sense if you are not interested in plain coefficient estimate, but an interaction.
<a name="did"></a>Can you give a simple example using ritest with a difference-in-differences regression?
Setup: binary treatment and panel data.
This won't work:
gen treatpost = treatment*post
ritest treatment _b[treatpost]: reg y treatment post treatpost
for two reasons:
ritestwill keep permutingtreatment, but is not aware thattreatpostwill need to be updated as well.ritestdoes not know thattreatmenthas to be held constant within units.
instead run:
ritest treatment _b[c.treatment#c.post], cluster(unit_id): reg y c.treatment##c.post
<a name="3arms"></a>I have 3 treatment arms. How do I use ritest?
Setup: There are three treatment arms (0=Control, 1=Treatment A, 2=Treatment B).
What do yo want to test? David McKenzie has a short discussion of this here. In short, there are three main hypotheses one might want to test. (a) Treatment A is no different from Control, (b) Treatment B is no different from Controls and (c) the two treatments are indistinguishable.
Here give an example for (a). (b) and (c) are conducted analogously. First, make sure that you define a single treatment varaible that encodes all three cases as above. Then you can either run:
ritest treatment _b[1.treatment], .... : reg y i.treatment if treatment != 2
or
ritest treatment _b[1.treatment], fixlevels(2) .... : reg y i.treatment
(for this you'll need the latest version of ritest)
The two variants are slightly different. The first one drops observations of Treatment B from the estimation, assuming they are useless for identifying differences between Treatment A and control. The second one keeps these observations in the estimation sample, but excludes them from the re-randomization. Keeping them in the estimation and re-randomization would make no sense, as this would pool Treatment B and the control group and thus test a weird hypothesis.
The two variants could lead to different results if the observations of group B affect the estimation of _b[1.treatment]. Sometimes this can be good; for example, if your regression includes control variables and their coefficients become more precisely estimated when the full sample is used. In turn, the more precisely estimated control variable coefficient improves the estimate _b[1.treatment], which could be advantageous.
<a name="output"></a>How to I read the output?
I will use this example output to explain all elements:
0: ritest treatment _b[treatment]/_se[treatment], reps(500) strata(block): areg outcome treatment, r abs(block)
1: command: areg outcome treatment, r abs(block)
2: _pm_1: _b[treatment]/_se[treatment]
3: res. var(s): treatment
4: Resampling: Permuting treatment
5:Clust. var(s): __000000
6: Clusters: 99
7:Strata var(s): block
8: Strata: 4
9:
10:------------------------------------------------------------------------------
11:T | T(obs) c n p=c/n SE(p) [95% Conf. Interval]
12:-------------+----------------------------------------------------------------
13: _pm_1 | 2.362446 14 500 0.0280 0.0074 .0153906 .0465333
14:------------------------------------------------------------------------------
15:Note: Confidence interval is with respect to p=c/n.
16:Note: c = #{|T| >= |T(obs)|}
- The full command that is re-estimated at every iteration
- The statistic that is evaluated after each run of the command
- The variable that is be permuted/re-sampled
- A string indicating how the variable is permuted/re-sampled
- The variable that identifies treatment clusters. If none are given Stata will show the name of a tempvar here (
__000000) - The number of different clusters.
- The variable that identifies treatment strata.
- The number of strata
- The main table
T(obs)The realization of the test statistic in the datacthe count of under how many of the re-sampled assignments, the realization of the test-statistic was more extreme thanT(obs)nthe overall count of re-samplingsp=c/nthe
Related Skills
node-connect
340.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
340.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.1kCommit, push, and open a PR
