
Workshop: Algorithmic Fairness through the Lens of Time

A Causal Perspective on Label Bias

Vishwali Mhasawade · Alexander D'Amour · Stephen Pfohl


A common setting for algorithmic decision making uses a prediction of a proxy label to choose a course of action or make a downstream decision, such as enrolling a patient in a care management program based on their predicted healthcare expenditure. Proxy labels are used because the true label of interest may be difficult or impossible to measure in practice. However, a proxy label may propagate equity-related harms when the relationship between the unmeasured true label and the proxy label differs across subgroups (e.g., by race or gender). In this work, we propose a causal approach for identifying label bias due to a biased proxy by illustrating the specific causal conditions under which the proxy label systematically differs from the true label across subgroups. In such scenarios, using a biased proxy for downstream decision making can harm certain subgroups and worsen inequity over time. We specifically study label bias in the evaluation of models with respect to sufficiency, a fairness metric, and outline the scenarios under which a Bayes-optimal model trained on a biased proxy can be unfair with respect to the true label. We demonstrate the implications of a biased proxy on a synthetic health insurance dataset used in the context of a care management system. Our results verify that evaluation with a biased proxy masks fairness violations with respect to the true label.
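The masking effect described above can be illustrated with a minimal sketch. This is not the authors' dataset or method: the data-generating process, the variable names (`true_need`, `proxy_cost`, `k`), and the helper `sufficiency_gap` are all assumptions made for illustration. Sufficiency is taken here in its usual sense, that the expected label given the score should not depend on group membership. The sketch assumes that the proxy (cost) understates true need by a group-specific factor, and it uses the closed-form conditional mean of the proxy as a stand-in for a Bayes-optimal model trained on the proxy.

```python
import numpy as np

def sufficiency_gap(score, label, group, bin_edges):
    """Largest between-group difference in mean label within any score bin.

    A value near zero suggests sufficiency holds for this label;
    a large value suggests a violation.
    """
    bin_idx = np.digitize(score, bin_edges) - 1
    gaps = []
    for b in range(len(bin_edges) - 1):
        in_bin = bin_idx == b
        labels_g0 = label[in_bin & (group == 0)]
        labels_g1 = label[in_bin & (group == 1)]
        # Only compare bins that contain members of both groups.
        if len(labels_g0) and len(labels_g1):
            gaps.append(abs(labels_g0.mean() - labels_g1.mean()))
    return max(gaps)

# Hypothetical synthetic data (an assumption, not the paper's dataset).
rng = np.random.default_rng(0)
n = 200_000
group = rng.integers(0, 2, size=n)
x = rng.uniform(0, 1, size=n)                    # observed feature
true_need = x + rng.normal(0, 0.1, size=n)       # unmeasured true label

# Assumed bias mechanism: group 1 incurs half the cost for the same need.
k = np.where(group == 0, 1.0, 0.5)
proxy_cost = k * true_need + rng.normal(0, 0.1, size=n)

# Conditional mean of the proxy given (x, group) under this process,
# standing in for a Bayes-optimal predictor of the proxy.
score = k * x

# Evaluate on the score range shared by both groups.
bin_edges = np.linspace(0.0, 0.5, 6)
gap_proxy = sufficiency_gap(score, proxy_cost, group, bin_edges)
gap_true = sufficiency_gap(score, true_need, group, bin_edges)

print(f"sufficiency gap w.r.t. proxy label: {gap_proxy:.3f}")
print(f"sufficiency gap w.r.t. true label:  {gap_true:.3f}")
```

Under these assumptions, the gap measured against the proxy stays close to zero, while the gap measured against the true label is large: at the same score, members of group 1 have roughly twice the true need of members of group 0. An evaluation that only has access to the proxy would therefore report the model as satisfying sufficiency.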
