CfcausalPaper
Paper Repository
This repository contains the code to implement all examples in our paper: Conformal Inference of Counterfactuals and Individual Treatment Effects.
Introduction
All R scripts are included in the folder code/. The bash files to submit jobs to the cluster are included in the folder jobs/ (note that this depends on your cluster and the bash file might need to be changed accordingly). The outputs and the plots are included in the folder data/ and the folder figs/, respectively.
Installing the package
The cfcausal package needs to be installed.
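# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install the cfcausal package from GitHub
devtools::install_github("lihualei71/cfcausal")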
The following packages must also be installed: grf, randomForest, gbm, bartMachine, causalToolbox, tidyverse, ggplot2 and argparse.
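Before running the scripts, it can help to confirm that every package named above can be loaded. The following is a minimal sketch and not part of the repository: the helper names pkg_installed and missing_r_packages are hypothetical, and it assumes that Rscript is on the PATH. It prints each package that R cannot load. If Rscript itself is unavailable, every package is reported as missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# Return 0 if the R package named in $1 can be loaded, non-zero otherwise.
pkg_installed() {
  Rscript -e "quit(save = 'no', status = if (requireNamespace('$1', quietly = TRUE)) 0 else 1)" \
    >/dev/null 2>&1
}

# Print, one per line, every package in the argument list that cannot be loaded.
missing_r_packages() {
  local pkg
  for pkg in "$@"; do
    pkg_installed "$pkg" || echo "$pkg"
  done
}

# Packages named in this README, including cfcausal itself.
missing_r_packages cfcausal grf randomForest gbm bartMachine causalToolbox tidyverse ggplot2 argparse
```

An empty output means all packages were found. Note that not every listed package is necessarily available from the same source, so how to install a missing one depends on the package.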
R scripts
The folder code/ contains all R scripts:
- simul_synthetic.R is an executable R script that produces the result of a single run of the numerical study in Section 3.6 with four scenarios: homoscedastic/heteroscedastic errors crossed with independent/correlated covariates. It takes four inputs: --n for the sample size, --d for the dimension, --B for the number of bootstrap draws in X-learner and --seed for the random seed. The output res is a list of length four, with each element corresponding to a scenario. Each res[[i]] is a list of three elements: res[[i]]$tau holds the results for CATE, res[[i]]$Y the results for ITE and res[[i]]$cond the conditional coverage for ITE. The object res is stored in data/ with filename "synthetic_simul_n${n}_d${d}_seed${seed}.RData". The script can be run either interactively in an R console, by setting the parameters on lines 23-26, or noninteractively in a shell, by passing the aforementioned inputs. Below is a toy example that runs for a few minutes on a laptop.
Rscript simul_synthetic.R --n 500 --d 2 --B 0 --seed 199
- utils_synthetic_expr.R implements helpers for simul_synthetic.R.
- simul_NLSM.R is an executable R script that produces the result of a single run of the numerical study in Section 4.4. It takes five inputs: --B for the number of bootstrap draws in X-learner, --alpha for the level, --seed for the random seed, --ntrain for the size of the training set and --ntest for the size of the testing set. The output res is a list of length two: res$marginal holds the results for unconditional coverage and res$cond the results for conditional coverage. The object res is stored in data/ with filename "NLSM_simul_alpha${alpha}_seed${seed}_ntr${ntr}_nte${nte}_B${B}.RData". The script can be run either interactively in an R console, by setting the parameters on lines 25-29, or noninteractively in a shell, by passing the aforementioned inputs. Below is a toy example that runs for a few minutes on a laptop.
Rscript simul_NLSM.R --ntr 1000 --nte 5000 --B 0 --alpha 0.05 --seed 199
- simul_prep_NLSM.R generates the synthetic data to be used in simul_NLSM.R, based on the NLSM data from the ACIC 2018 workshop. The raw data is stored in data/NLSM_data.csv and the generated data is stored in data/NLSM_simul.RData.
- utils_real_expr.R implements helpers for simul_NLSM.R.
- analysis_NLSM.R is an executable R script that produces the result of a single run of the numerical study in Section 4.5. It takes one input, --seed, for the random seed. The output res is a data.frame. The object res is stored in data/ with filename "analysis_NLSM_seed${seed}.RData". The script can be run either interactively in an R console, by setting the parameter on line 18, or noninteractively in a shell, by passing the aforementioned input. Below is a toy example that runs for a few minutes on a laptop.
Rscript analysis_NLSM.R --seed 199
- postprocess_synthetic_simul.R, postprocess_NLSM_simul.R and postprocess_NLSM_analysis.R postprocess the results obtained from cluster jobs. See the last section for details.
- plot_synthetic_simul.R, plot_NLSM_simul.R and plot_NLSM_analysis.R generate the plots in the paper. See the last section for details.
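Because each simulation script produces a single run, a study repeats it over many seeds. The sketch below is illustrative only and is not one of the repository's job scripts. The helper names synthetic_result_name, nlsm_result_name and run_synthetic_seeds, and the RUNNER variable, are hypothetical. It uses only the flags documented above for simul_synthetic.R and the filename patterns quoted above. The helpers return bare file names, since the location of data/ relative to the working directory is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# File name that simul_synthetic.R is documented to write for a given run.
synthetic_result_name() {
  local n="$1" d="$2" seed="$3"
  printf 'synthetic_simul_n%s_d%s_seed%s.RData\n' "$n" "$d" "$seed"
}

# File name that simul_NLSM.R is documented to write for a given run.
nlsm_result_name() {
  local alpha="$1" seed="$2" ntr="$3" nte="$4" B="$5"
  printf 'NLSM_simul_alpha%s_seed%s_ntr%s_nte%s_B%s.RData\n' \
    "$alpha" "$seed" "$ntr" "$nte" "$B"
}

# Run simul_synthetic.R once per seed, stopping at the first failure.
# Arguments: n, d, B, then one or more seeds.
# Set RUNNER=echo for a dry run that only prints the arguments of each call.
run_synthetic_seeds() {
  local n="$1" d="$2" B="$3" seed
  shift 3
  for seed in "$@"; do
    ${RUNNER:-Rscript} simul_synthetic.R --n "$n" --d "$d" --B "$B" --seed "$seed" || return 1
  done
}

# Dry run over three seeds, followed by the file names those runs should produce.
RUNNER=echo run_synthetic_seeds 500 2 0 199 200 201
for seed in 199 200 201; do
  synthetic_result_name 500 2 "$seed"
done
```

Dropping RUNNER=echo executes the runs for real, which requires Rscript and the packages listed earlier. After a batch finishes, the file names can be compared against the contents of data/ to spot seeds whose runs did not complete.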
Submitting jobs, postprocessing results, and generating plots
The folder jobs/ contains all bash scripts to submit jobs to the cluster. The numerical studies take ~1200 CPU hours in total. To run each .sh file, create the following folders first.
mkdir log results raw_data_cluster
log/ stores the system reports, results/ stores the R stdout, and raw_data_cluster/ stores the output of each job.
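When jobs are resubmitted, a plain mkdir reports an error because the folders already exist. A minimal sketch of a rerunnable variant follows. The helper name prepare_job_dirs is hypothetical, and the demonstration uses a temporary directory so that it does not touch the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).

# Create log/, results/ and raw_data_cluster/ under the directory given in $1
# (default: the current directory). mkdir -p succeeds even if they already exist.
prepare_job_dirs() {
  local root="${1:-.}"
  mkdir -p "$root/log" "$root/results" "$root/raw_data_cluster"
}

# Demonstration in a throwaway directory: the second call is a harmless no-op.
demo_root="$(mktemp -d)"
prepare_job_dirs "$demo_root"
prepare_job_dirs "$demo_root"
ls "$demo_root"
rm -r "$demo_root"
```

For actual use, call prepare_job_dirs with the directory from which the .sh files are submitted.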
After all jobs have finished, run postprocess_synthetic_simul.R, postprocess_NLSM_simul.R and postprocess_NLSM_analysis.R. They merge the results from different cores and write simul_synthetic_results.RData, simul_NLSM_results.RData and analysis_NLSM_results.RData, respectively, to the folder data/.
Finally, run plot_synthetic_simul.R, plot_NLSM_simul.R and plot_NLSM_analysis.R to generate the figures in the paper.
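The ordering matters: the postprocessing scripts come first and the plotting scripts second. One way to express this is sketched below. The helper name run_post_and_plots and the RUNNER variable are hypothetical, and the sketch assumes that each script can be run with Rscript without arguments from the folder that contains it, which this README does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).

# Run the three postprocessing scripts, then the three plotting scripts,
# stopping at the first failure.
# Set RUNNER=echo for a dry run that only prints the script names in order.
run_post_and_plots() {
  local script
  for script in \
    postprocess_synthetic_simul.R postprocess_NLSM_simul.R postprocess_NLSM_analysis.R \
    plot_synthetic_simul.R plot_NLSM_simul.R plot_NLSM_analysis.R; do
    ${RUNNER:-Rscript} "$script" || return 1
  done
}

# Dry run: list the scripts in the order they would be executed.
RUNNER=echo run_post_and_plots
```

Stopping at the first failure avoids producing figures from partially merged results.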
