2023 Poster in Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics

#17: Value as Semantic Embedding: Disentangling Moral and Hedonic Dimensions

Anna Leshinskaya · Alek Chakroff

Keywords: AI alignment; morality; value; large language models


Abstract

Aligning AI with human objectives can be facilitated by enabling it to learn and represent our values. In modern AI agents, value is construed as a scalar magnitude reflecting the desirability of a given state or action. We propose a framework, value-as-semantics, in which these magnitudes are represented within a large-scale, high-dimensional semantic embedding (here, OpenAI's GPT-3.5). This allows value to remain quantitative while being assignable to any natural-language expression, inheriting the expressivity and generalizability of a semantic representation. We evaluate two key assumptions: that value can be extracted distinctly and selectively from other semantic attributes, and that distinct kinds of value can be told apart. Building on prior work on moral value extraction, we test the extent to which LLM embeddings can distinctly encode both moral and selfish (hedonic) value. We confirmed that moral and hedonic value were each separable from a control semantic attribute. However, moral and hedonic value were themselves deeply entangled, yielding high moral values for selfish acts such as “winning the lottery” and low moral values for accidental self-harms such as “losing my wallet”. These findings suggest that an LLM can emulate a value function, but that distinguishing among kinds of value remains an important engineering need, one that must be resolved before LLMs can produce reasonable moral judgments. Nonetheless, we argue that a value-as-semantics architecture can be an important contribution towards a full computational model of human-like action planning and moral reasoning.
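To make the value-as-semantics idea concrete, here is a minimal sketch. It assumes that a scalar value is read out as a linear projection of a phrase embedding onto a learned "value axis." The axis here is the normalized difference between the mean embeddings of high-value and low-value phrases. This is a simple difference-of-means probe chosen for illustration, and the abstract does not state that the authors use it. The 3-dimensional vectors are synthetic placeholders standing in for real GPT-3.5 embeddings. All function names (`value_axis`, `value_score`, `cosine`, and the helpers) are hypothetical. One way to gauge entanglement between two kinds of value is the cosine similarity between their axes: a cosine near 1 would mean the moral and hedonic readouts largely coincide.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def unit(v: Sequence[float]) -> Vector:
    """Rescale v to unit Euclidean length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def value_axis(high: Sequence[Sequence[float]],
               low: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical difference-of-means probe: the unit direction pointing
    from the mean embedding of low-value phrases to that of high-value ones."""
    hi_mean, lo_mean = mean_vector(high), mean_vector(low)
    return unit([h - l for h, l in zip(hi_mean, lo_mean)])


def value_score(embedding: Sequence[float], axis: Sequence[float]) -> float:
    """Scalar value of a phrase: its embedding projected onto a value axis."""
    return dot(embedding, axis)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; values near 1 mean the two axes are entangled."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Toy 3-d "embeddings": synthetic placeholders, NOT real GPT-3.5 vectors.
moral_high = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]]
moral_low = [[-1.0, -0.1, 0.0], [-0.8, -0.2, 0.1]]
hedonic_high = [[0.1, 1.0, 0.0], [0.2, 0.9, -0.1]]
hedonic_low = [[-0.1, -1.0, 0.0], [0.0, -0.9, 0.1]]

moral_axis = value_axis(moral_high, moral_low)
hedonic_axis = value_axis(hedonic_high, hedonic_low)

# Moral value of one "high" phrase, and how aligned the two axes are.
print(value_score(moral_high[0], moral_axis))
print(cosine(moral_axis, hedonic_axis))
```

In this toy setup the two axes were constructed to be mostly distinct, so their cosine is small. With real LLM embeddings, the entanglement the abstract reports would plausibly appear as a high cosine between the axes, or as one axis scoring phrases that belong to the other kind of value. A regression or logistic-regression probe fit on rated phrases would be a natural alternative to the difference-of-means axis.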
