
Workshop: Causal Representation Learning

Causal Regressions For Unstructured Data

Amandeep Singh · Bolong Zheng

Keywords: [ Generative AI ] [ Adversarial Generalized Method of Moments ] [ Riesz Representor ] [ Causal Learning ] [ Instrumental Variables ]


Much recent research in economics and marketing has focused on (1) allowing for unstructured data in causal studies and (2) flexibly addressing endogeneity with observational data in order to perform valid causal inference. Directly using machine learning algorithms to predict the outcome variable can help deal with unstructured data; however, it is well known that such an approach does not perform well when the explanatory variables are endogenous. On the other hand, extant methods designed to address endogeneity make strong parametric assumptions and hence are incapable of "directly" incorporating high-dimensional unstructured data. In this paper, we propose an estimator, which we term "RieszIV", for carrying out estimation and inference with high-dimensional observational data without resorting to parametric approximations. We show that our estimator is consistent and asymptotically normal under a mild set of conditions. We carry out extensive Monte Carlo simulations with both low-dimensional and high-dimensional unstructured data to demonstrate the finite-sample performance of our estimator. Finally, using app download and review data for apps on Google Play, we demonstrate how our method can be used to conduct inference over counterfactual policies defined on rich text data. We show how large language models can be used as a viable counterfactual policy generation operator. This represents an important advance in expanding counterfactual inference to complex, real-world settings.
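To make the endogeneity concern above concrete, the following is a minimal Monte Carlo sketch. It is not the RieszIV estimator described in the abstract; it uses a textbook linear instrumental-variables (Wald) ratio on synthetic scalar data. The data-generating process, the function names `cov` and `simulate`, and all parameter values are assumptions chosen only for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def simulate(n=20000, beta=1.0, seed=0):
    """Hypothetical linear design with an unobserved confounder u.

    z is an instrument: it shifts x but affects y only through x.
    u drives both x and y, which makes x endogenous.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

    # Direct regression of y on x: absorbs the confounding through u.
    ols = cov(x, y) / cov(x, x)
    # Wald / IV ratio: uses only the variation in x induced by z.
    iv = cov(z, y) / cov(z, x)
    return ols, iv


ols_est, iv_est = simulate()
print(f"true beta = 1.0, direct regression = {ols_est:.3f}, IV = {iv_est:.3f}")
```

Under this assumed design the direct regression slope converges to roughly 1 + 2/3 rather than the true coefficient of 1, while the IV ratio converges to 1. The abstract's contribution concerns the much harder case where the covariates are high-dimensional and unstructured and no parametric form is imposed, which this scalar sketch does not attempt.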
