Invited Talk #6: Features or Bugs: Synergistic Idiosyncrasies in Human Learning and Decision-Making
Workshop: Biological and Artificial Reinforcement Learning
Abstract
Combining a multi-armed bandit task with Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics and the adoption of a simplistic decision policy. In particular, underestimating the reward rate discourages the decision-maker from switching away from a “good” option, thus approximating the optimal policy, which never switches away after a win. Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular form of Q-learning often used in the brain sciences, thus providing statistical, normative grounding for phenomenological models of human and animal behavior.
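As a rough illustration of the win-stay intuition described above (not the talk's actual model), the sketch below runs a standard delta-rule Q-learner with a softmax choice rule on a two-armed Bernoulli bandit. Here the initial Q-value stands in for the agent's assumed reward rate, and all parameter values (`q_init`, `alpha`, `beta`, `true_p`) are purely illustrative assumptions.

```python
# Illustrative sketch: a delta-rule Q-learner on a two-armed Bernoulli bandit.
# Initializing Q-values below the true reward rate (a "pessimistic" estimate)
# lets a single win pull the chosen arm well above the alternative, so the
# softmax policy rarely switches after a win -- a win-stay pattern.
import numpy as np

rng = np.random.default_rng(0)

def run(q_init, true_p=(0.6, 0.4), alpha=0.3, beta=5.0, n_trials=500):
    """Run one agent; return the fraction of trials on which it stays after a win."""
    q = np.full(2, q_init, dtype=float)
    stays_after_win, wins = 0, 0
    prev_choice, prev_reward = None, None
    for _ in range(n_trials):
        # Softmax (logistic) choice rule over the two Q-values.
        p_arm0 = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_arm0 else 1
        reward = float(rng.random() < true_p[choice])
        # Count "stay" events conditioned on the previous trial being a win.
        if prev_reward == 1.0:
            wins += 1
            stays_after_win += int(choice == prev_choice)
        # Standard Q-learning / delta-rule update of the chosen arm only.
        q[choice] += alpha * (reward - q[choice])
        prev_choice, prev_reward = choice, reward
    return stays_after_win / max(wins, 1)

for q_init in (0.1, 0.5, 0.9):  # pessimistic, neutral, optimistic initial estimates
    print(f"initial value {q_init:.1f}: P(stay | win) = {run(q_init):.2f}")
```

In this toy setup, the lower the initial reward-rate estimate, the higher the probability of staying with an option after a win, which is the compensatory effect the abstract attributes to reward-rate underestimation.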