Poster in Workshop: Workshop on Multi-Turn Interactions in Large Language Models

Do Large Language Models Defend Their Beliefs Consistently?

Arka Pal ⋅ Arthur Liang ⋅ Teo Kitanovski ⋅ Akilesh Potti ⋅ Micah Goldblum

2025 Poster


Abstract

When large language models (LLMs) are challenged on a response, they may either defer to the user or uphold that response. Some models are more deferential, while others defend their beliefs more stubbornly. The 'appropriate' level of belief defense may depend on the task and on user preferences, but it is nonetheless desirable that a model behave consistently in this respect. In particular, on average, a model should not defer more often when it has high confidence in its answer than when it has lower confidence, and this should hold regardless of the model's overall tendency towards deference. We call a model that acts in this manner belief-consistent, and we carry out the first detailed study of belief-consistency in modern LLMs.
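
The consistency condition described above can be illustrated with a minimal sketch. This is not the authors' evaluation code or metric. The `Trial` record, the two-bin split, and the `threshold` value of 0.5 are all assumptions made for illustration only. The sketch compares the average deference rate on high-confidence answers with the rate on low-confidence answers.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One challenged answer (hypothetical record format, not from the paper)."""
    confidence: float  # model's confidence in its initial answer, in [0, 1]
    deferred: bool     # True if the model gave up its answer when challenged


def deference_rate(trials):
    """Fraction of trials in which the model deferred; None if there are no trials."""
    if not trials:
        return None
    return sum(t.deferred for t in trials) / len(trials)


def is_belief_consistent(trials, threshold=0.5):
    """Check that high-confidence answers are not deferred on more often
    than low-confidence ones. The threshold is an assumed cut-off.
    Returns None when either confidence bin is empty."""
    high = [t for t in trials if t.confidence >= threshold]
    low = [t for t in trials if t.confidence < threshold]
    high_rate, low_rate = deference_rate(high), deference_rate(low)
    if high_rate is None or low_rate is None:
        return None
    return high_rate <= low_rate


# A stubborn model and a deferential model can both satisfy the condition,
# because only the ordering of the two rates matters, not their overall level.
stubborn = [Trial(0.9, False), Trial(0.8, False), Trial(0.2, False), Trial(0.3, True)]
deferential = [Trial(0.9, True), Trial(0.8, False), Trial(0.2, True), Trial(0.3, True)]
inconsistent = [Trial(0.9, True), Trial(0.8, True), Trial(0.2, False), Trial(0.3, True)]
```

Under these assumptions, `stubborn` and `deferential` both pass the check despite very different overall deference rates, while `inconsistent` fails because it defers more often when confident than when unsure.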
