Poster
in
Workshop: Algorithmic Fairness through the Lens of Time

On Mitigating Unconscious Bias through Bandits with Evolving Biased Feedback

Matthew Faw · Constantine Caramanis · Sanjay Shakkottai · Jessica Hoffmann

[ Poster]

Abstract

Media stereotypes, cultural stereotypes, and affinity bias are some of the driving factors shaping our unconscious biases. As the demographic landscape of the workforce evolves, this bias is subject to change, and in particular could be erased or inverted (e.g. computer programming was considered a ``woman's job'' in the US in the 1940s). To study this feedback loop between workforce demographics and bias, we introduce a multi-armed bandit model for which we only perceive a time-dependent biased reward, which is a function of the (evolving) fraction of times we picked each arm. We show that if we ignore the bias, UCB incurs linear regret in this setting. By contrast, when the bias model is exactly known, then an elimination-style algorithm achieves a regret at most $K^2$ times larger than in the standard, unbiased bandit setting. Moreover, we show that this regret scaling is (essentially) unimprovable by deriving a new instance-dependent regret lower bound which is roughly $K^2$ times larger than in the standard bandit setting, even in the setting where the policy knows the bias model exactly. To obtain this lower bound when the observed reward distributions are (i) time-varying and (ii) dependent on the policy's past actions, we develop new proof techniques beyond the standard bandit lower bound arguments, which may be of independent interest. In particular, we identify a ``bottleneck'' set of actions for which any policy must either (a) play many times, or (b) observe significantly biased samples. Then, using a stopped version of the divergence decomposition, we carefully construct a stopping time which allows us to translate cases (a) and (b) into an amplified lower bound.

Chat is not available.