
Workshop: Distribution Shifts: New Frontiers with Foundation Models

Domain constraints improve risk prediction when outcome data is missing

Sidhika Balachandar · Nikhil Garg · Emma Pierson

Keywords: [ biomedicine ] [ Bayesian model ] [ domain constraint ] [ health ] [ selective labels ] [ distribution shift ]

Fri 15 Dec 11:45 a.m. PST — 11:55 a.m. PST


Machine learning models often predict the outcome resulting from a human decision. For example, if a doctor tests a patient for disease, will the patient test positive? A challenge is that the human decision censors the outcome data: we only observe test outcomes for patients whom doctors historically chose to test. Untested patients, whose outcomes are unobserved, may differ from tested patients along both observed and unobserved dimensions. We propose a Bayesian model of this setting that estimates risk for both tested and untested patients. To aid model estimation, we propose two domain-specific constraints that are plausible in health settings: a prevalence constraint, under which the overall disease prevalence is known, and an expertise constraint, under which the human decision-maker deviates from purely risk-based decision-making only along a constrained feature set. We show theoretically and on synthetic data that these constraints can improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model can identify suboptimalities in test allocation and that the prevalence constraint increases the plausibility of inferences.
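The selective-labels problem and the prevalence constraint described above can be illustrated with a small simulation. The sketch below is a toy illustration, not the authors' Bayesian model: the risk function, testing rule, and all numbers are assumptions chosen for clarity. It shows that prevalence among tested patients is a biased estimate of overall prevalence, and that a known overall prevalence pins down the untested group's prevalence even though no untested outcomes are ever observed.

```python
import random

random.seed(0)

# Hypothetical simulation of the selective-labels setting (all numbers
# and functional forms here are illustrative assumptions).
def true_risk(x):
    """Assumed probability of disease given an observed feature x in [0, 1]."""
    return 0.05 + 0.3 * x

N = 200_000
patients = [random.random() for _ in range(N)]

# The doctor tests only patients with x > 0.5, so outcomes (labels) are
# observed only for that tested subgroup.
tested_outcomes = [random.random() < true_risk(x) for x in patients if x > 0.5]
frac_tested = len(tested_outcomes) / N

# Prevalence among tested patients: biased upward relative to the population,
# because testing targets higher-risk patients.
p_tested = sum(tested_outcomes) / len(tested_outcomes)

# Prevalence constraint: the *overall* prevalence is assumed known
# (e.g., from registry data). Decomposing it over the two subgroups,
#   p_overall = frac_tested * p_tested + (1 - frac_tested) * p_untested,
# lets us solve for the untested group's prevalence without ever
# observing an untested outcome.
p_overall = 0.20  # assumed known population prevalence (= E[true_risk(x)])
p_untested = (p_overall - frac_tested * p_tested) / (1 - frac_tested)

print(f"tested: {p_tested:.3f}, untested (inferred): {p_untested:.3f}")
```

In this toy setup the tested group's prevalence comes out near 0.275 while the inferred untested prevalence is near 0.125, matching the true conditional risks; the full paper embeds this kind of constraint in a Bayesian model rather than a closed-form decomposition.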
