Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games
Abstract
The conversational style of Large Language Models (LLMs) systematically influences user judgment and decision-making. However, robustly defining and quantifying the impact of specific persona traits, such as empathy or helpfulness, remains an open challenge. We propose a control-theoretic framework, grounded in synthetic-game scenarios, that formalizes user-LLM interactions as sequential decision processes with explicit user objectives. Implementing scenarios such as bargaining and cooperative games with additional symmetry constraints enables precise measurement of cognitive bias (deviation from optimal behavior) and feedback helpfulness (bias reduction). Experiments reveal that feedback helpfulness depends significantly on empathetic style: moderate empathy improves user decisions, whereas excessive empathy devolves into counterproductive sycophancy. The optimal empathy level varies with the user's emotional and cognitive state. Our synthetic-game framework offers both conceptual clarity and practical tools for adaptively shaping LLM conversational strategies toward safer, more aligned interactions.
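As a minimal sketch of the two quantities named above, one possible formalization is given below; the symbols $u$, $a_t$, $a_t^{*}$, $\pi$, $B$, and $H$ are illustrative assumptions, not notation fixed by this abstract.

% A minimal sketch, assuming a user utility u over actions, the user's
% action a_t at step t, and the optimal action a_t^* under the stated
% objective; pi denotes the interaction policy (with or without LLM feedback).
\begin{align}
  B(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} \bigl(u(a_t^{*}) - u(a_t)\bigr)\right]
  && \text{cognitive bias: deviation from optimal behavior,} \\
  H &= B(\pi_{\text{no feedback}}) - B(\pi_{\text{LLM feedback}})
  && \text{feedback helpfulness: bias reduction.}
\end{align}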