Workshop: Machine Learning Meets Econometrics (MLECON)

Inference of Heterogeneous Treatment Effects Using Observational Data with High-Dimensional Covariates

Jing Tao


This work studies heterogeneous treatment effects in observational data with high-dimensional covariates and endogeneity. Novel estimation and inference methods are developed for treatment-covariate interaction effects and covariate-specific treatment effects, using an instrumental variable to address the endogeneity. The covariate-specific treatment effects represent the expected difference between potential outcomes given a set of covariates. Given the covariates, the instrument induces exogeneity between the treatment and the potential outcomes within the complier subgroup of the population. Within the framework of generalized linear models (GLMs), this study proposes regularized estimation of the regression coefficients under a non-convex objective function. Based on the initial regularized estimator, a debiased estimator of the regression coefficients is proposed, which removes the regularization bias arising from both the first- and second-stage regressions. Asymptotic normality is established for both the debiased estimator and its functionals. Based on these results, confidence intervals can be constructed for the effects of the treatment and the covariates of interest, their interaction effects, and the covariate-specific treatment effects. The proposed method applies to both continuous and categorical responses, corresponding to linear and non-linear second-stage regression models, respectively. The main contributions of this work are as follows. (i) A regularized two-stage estimation procedure is proposed for models on the compliers in the presence of endogeneity. (ii) A novel approach is proposed to simultaneously correct the biases due to regularized estimation at both stages. (iii) A novel statistical inference procedure based on the debiased estimator is developed for covariate effects and (local) heterogeneous treatment effects with high-dimensional data.
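
As an illustration, the covariate-specific treatment effect on the compliers can be written in potential-outcomes notation. The notation is an assumption, since the abstract fixes no symbols: Y(1) and Y(0) denote the potential outcomes under treatment and control, X the covariates, and D(1) and D(0) the potential treatment status when an instrument, assumed binary here, is switched on and off. One common way to express the estimand is then

\[
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x,\; D(1) > D(0) \,\right],
\]

where the event D(1) > D(0) picks out the compliers, that is, the units whose treatment status follows the instrument.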