NeurIPS #07: Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity

Poster in Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics

Yiran Mao · Madeline G. Reinecke · Markus Kunesch · Edgar Duéñez-Guzmán · Ramona Comanescu · Julia Haas · Joel Leibo

Keywords: [ multi-agent reinforcement learning ] [ artificial intelligence ] [ moral cognition ]

Abstract:

Is it possible to evaluate the moral cognition of artificial agents? In this work, we take inspiration from developmental and comparative psychology and develop a behavior-based analysis to evaluate one aspect of moral cognition: when an agent 'does the right thing for the right reasons.' We argue that, regardless of the nature of the agent, morally motivated behavior should persist despite mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper insight into its underlying motivations. We apply this evaluation scheme to a particular set of deep reinforcement learning agents that can adapt to changes in cost. Our results show that agents trained with a reward function including other-regarding preferences perform helping behavior in a way that is less sensitive to increasing cost than agents trained with more self-interested preferences. This project showcases how psychology can benefit the creation and evaluation of artificial moral cognition.
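
One way to make "sensitivity to cost" concrete is as the slope of helping rate against cost: a slope near zero means helping persists as cost rises. The sketch below is only an illustration under that assumption. The abstract does not specify the metric the authors use, the function name `cost_sensitivity` is hypothetical, and the numbers are made up for demonstration, not results from the paper.

```python
from statistics import mean


def cost_sensitivity(costs, helping_rates):
    """Least-squares slope of helping rate against cost.

    Illustrative assumption: a slope closer to zero is read as
    lower sensitivity to cost. Not the authors' stated metric.
    """
    if len(costs) != len(helping_rates) or len(costs) < 2:
        raise ValueError("need two or more paired observations")
    c_bar = mean(costs)
    h_bar = mean(helping_rates)
    # Covariance-like numerator and variance-like denominator.
    num = sum((c - c_bar) * (h - h_bar) for c, h in zip(costs, helping_rates))
    den = sum((c - c_bar) ** 2 for c in costs)
    if den == 0:
        raise ValueError("costs must not all be equal")
    return num / den


# Made-up helping rates at four cost levels (not data from the paper).
costs = [0.0, 1.0, 2.0, 3.0]
other_regarding = [0.90, 0.85, 0.80, 0.75]
self_interested = [0.90, 0.60, 0.30, 0.00]

slope_other = cost_sensitivity(costs, other_regarding)
slope_self = cost_sensitivity(costs, self_interested)

# In this toy data the other-regarding agent's helping declines more slowly.
less_sensitive = abs(slope_other) < abs(slope_self)
```

Comparing slopes rather than helping rates at a single cost level reflects the abstract's point that the informative signal is how behavior changes as cost mounts.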
