Risk Aversion of Online Learning Algorithms
Andreas Haupt · Aroon Narayanan
Abstract
We study a novel bias in online decision-making: emergent risk aversion. When presented with actions of the same expectation, $\varepsilon$-Greedy chooses the lower-variance action with probability approaching one. Upper Confidence Bound (UCB) avoids this by debiasing its estimates of arm rewards. As we show in experiments, risk aversion also shapes arm choices in finite time.
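The following minimal Python sketch simulates the $\varepsilon$-Greedy claim. Its setup is an illustrative assumption, not the authors' experimental design. It uses a two-armed bandit in which arm 0 always pays 0 and arm 1 pays a standard normal draw, so both arms have expectation 0 but different variances. The helper names `run_bandit` and `safe_arm_share`, the horizon of 1000 rounds, $\varepsilon = 0.1$, and the textbook UCB1 index used for comparison are all hypothetical choices.

```python
import math
import random


def run_bandit(policy, horizon, eps, rng):
    """Simulate one run of a hypothetical two-armed bandit with equal means.

    Assumed setup (not from the paper): arm 0 always pays 0.0 (zero variance),
    arm 1 pays a Normal(0, 1) draw. Returns the share of rounds that pulled arm 0.
    """
    counts = [0, 0]
    sums = [0.0, 0.0]

    def pull(arm):
        reward = 0.0 if arm == 0 else rng.gauss(0.0, 1.0)
        counts[arm] += 1
        sums[arm] += reward

    # Pull each arm once so that both sample means are defined.
    pull(0)
    pull(1)
    for t in range(3, horizon + 1):
        means = [sums[a] / counts[a] for a in (0, 1)]
        if policy == "eps_greedy":
            if rng.random() < eps:
                arm = rng.randrange(2)  # explore: pick an arm uniformly at random
            else:
                arm = 0 if means[0] >= means[1] else 1  # exploit the sample means
        elif policy == "ucb1":
            # UCB1 index: sample mean plus the bonus sqrt(2 ln t / n_a).
            index = [means[a] + math.sqrt(2 * math.log(t) / counts[a]) for a in (0, 1)]
            arm = 0 if index[0] >= index[1] else 1
        else:
            raise ValueError(f"unknown policy: {policy}")
        pull(arm)
    return counts[0] / horizon


def safe_arm_share(policy, runs=200, horizon=1000, eps=0.1, seed=0):
    """Average share of pulls of the zero-variance arm over independent runs."""
    rng = random.Random(seed)
    return sum(run_bandit(policy, horizon, eps, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for policy in ("eps_greedy", "ucb1"):
        print(policy, round(safe_arm_share(policy), 3))
```

The sketch exposes a simple mechanism. Once the risky arm's sample mean falls below the safe arm's, $\varepsilon$-Greedy revisits it only on exploration rounds, so the underestimate tends to persist. UCB1's bonus for rarely pulled arms pushes against this. UCB1 formally assumes bounded rewards and appears here only as an illustration. The printed shares depend on the seed and parameters and are not results from the paper.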