
Workshop: Distribution Shifts: New Frontiers with Foundation Models

Domain constraints improve risk prediction when outcome data is missing

Sidhika Balachandar · Nikhil Garg · Emma Pierson

Keywords: [ biomedicine ] [ Bayesian model ] [ domain constraint ] [ health ] [ selective labels ] [ distribution shift ]

Fri 15 Dec 11:45 a.m. PST — 11:55 a.m. PST


Machine learning models often predict the outcome resulting from a human decision. For example, if a doctor tests a patient for disease, will the patient test positive? A challenge is that the human decision censors the outcome data: we only observe test outcomes for patients whom doctors historically chose to test. Untested patients, whose outcomes are unobserved, may differ from tested patients along both observed and unobserved dimensions. We propose a Bayesian model of this setting that estimates risk for both tested and untested patients. To aid model estimation, we propose two domain-specific constraints that are plausible in health settings: a prevalence constraint, under which the overall disease prevalence is known, and an expertise constraint, under which the human decision-maker deviates from purely risk-based decision-making only along a constrained feature set. We show theoretically and on synthetic data that these constraints can improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model can identify suboptimalities in test allocation and that the prevalence constraint increases the plausibility of inferences.
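The selective-labels problem and the prevalence constraint described above can be illustrated with a small simulation. The sketch below is a toy illustration, not the authors' Bayesian model: the risk function, testing rule, and all numbers are assumptions chosen for clarity. It shows that prevalence among tested patients is a biased estimate of overall prevalence, and that a known overall prevalence pins down the untested group's prevalence even though no untested outcomes are ever observed.

```python
import random

random.seed(0)

# Hypothetical simulation of the selective-labels setting (all numbers
# and functional forms here are illustrative assumptions).
def true_risk(x):
    """Assumed probability of disease given an observed feature x in [0, 1]."""
    return 0.05 + 0.3 * x

N = 200_000
patients = [random.random() for _ in range(N)]

# The doctor tests only patients with x > 0.5, so outcomes (labels) are
# observed only for that tested subgroup.
tested_outcomes = [random.random() < true_risk(x) for x in patients if x > 0.5]
frac_tested = len(tested_outcomes) / N

# Prevalence among tested patients: biased upward relative to the population,
# because testing targets higher-risk patients.
p_tested = sum(tested_outcomes) / len(tested_outcomes)

# Prevalence constraint: the *overall* prevalence is assumed known
# (e.g., from registry data). Decomposing it over the two subgroups,
#   p_overall = frac_tested * p_tested + (1 - frac_tested) * p_untested,
# lets us solve for the untested group's prevalence without ever
# observing an untested outcome.
p_overall = 0.20  # assumed known population prevalence (= E[true_risk(x)])
p_untested = (p_overall - frac_tested * p_tested) / (1 - frac_tested)

print(f"tested: {p_tested:.3f}, untested (inferred): {p_untested:.3f}")
```

In this toy setup the tested group's prevalence comes out near 0.275 while the inferred untested prevalence is near 0.125, matching the true conditional risks; the full paper embeds this kind of constraint in a Bayesian model rather than a closed-form decomposition.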
