Sendhil Mullainathan: Misuses of Machine Learning in Health Policy
in Workshop: Machine Learning for Health
Abstract
We highlight some common (and costly) reasons for the misuse of machine learning in health, illustrated using the potential outcomes framework from econometric work on causal inference. First, failing to specify the decision the prediction will influence: the same prediction can support valid inferences for some decisions but highly suspect ones for others. Second, the selective labels problem: the data used to form the prediction are endogenously generated; for example, outcomes may be recorded only for the cases that earlier decisions selected. Third, conflating averages with margins: a decision typically turns on outcomes for the patients at its margin, which can differ from the population average. We illustrate these points with two commonly misused predictors: readmissions and mortality. We argue that, on the one hand, ignoring these problems can lead to highly misleading applications; on the other hand, a judicious choice of applications and methods can allow one to circumvent them.
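To make the selective labels problem concrete, here is a minimal toy simulation. It is a hypothetical sketch, not the readmission or mortality analysis from the talk. Every number and name in it (`simulate`, `risk_by_x`, the risk formula, the 0.35 testing threshold) is an illustrative assumption. A model sees only a feature `x`. A clinician also sees an unrecorded factor `u` and orders a test only when overall risk looks high, so the outcome label is recorded only for tested patients. In this toy, the decision controls only whether the label is observed, not the outcome itself.

```python
import random

def simulate(n=100_000, seed=0):
    """Toy data (hypothetical): x is seen by the model, u only by the clinician."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)              # feature available to the model
        u = rng.random()                   # factor only the clinician observes
        risk = 0.2 + 0.3 * x + 0.4 * u     # assumed true outcome probability
        y = rng.random() < risk            # outcome, whether or not it is recorded
        tested = 0.3 * x + 0.4 * u > 0.35  # clinician's decision; label recorded only if tested
        rows.append((x, y, tested))
    return rows

def risk_by_x(rows, only_tested):
    """Mean outcome per value of x, optionally using only the recorded (tested) labels."""
    out = {}
    for g in (0, 1):
        ys = [y for x, y, t in rows if x == g and (t or not only_tested)]
        out[g] = sum(ys) / len(ys)
    return out

rows = simulate()
true_risk = risk_by_x(rows, only_tested=False)  # uses every outcome, unavailable in practice
naive_risk = risk_by_x(rows, only_tested=True)  # what a model trained on recorded labels sees
print("true :", {k: round(v, 3) for k, v in true_risk.items()})
print("naive:", {k: round(v, 3) for k, v in naive_risk.items()})
```

Under these assumptions, the true risks are 0.4 for `x = 0` and 0.7 for `x = 1`. The labelled subset for `x = 0` contains only patients with unusually high `u`, however, so the naive estimate for that group is close to 0.575, and the apparent gap between the groups roughly halves. Because the recorded labels are selected by a decision that depends on information the model lacks, no amount of additional data of this kind removes the bias.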