Oral Poster
Questioning the Survey Responses of Large Language Models
Ricardo Dominguez-Olmedo · Moritz Hardt · Celestine Mendler-Dünner
East Exhibit Hall A-C #2708
Thu 12 Dec 10 a.m. PST — 11 a.m. PST
Surveys have recently gained popularity as a tool to study large language models. By comparing models’ survey responses to those of different human reference populations, researchers aim to infer the demographics, political opinions, or values best represented by current language models. In this work, we critically examine language models’ survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau. Evaluating 42 different language models using de facto standard prompting methodologies, we establish two dominant patterns. First, models’ responses are governed by ordering and labeling biases, for example, towards survey responses labeled with the letter “A”. Second, when adjusting for these systematic biases through randomized answer ordering, models across the board trend towards uniformly random survey responses, irrespective of model size or training data. As a result, models consistently appear to better represent subgroups whose aggregate statistics are closest to uniform for the survey under consideration, leading to potentially misguided conclusions about model alignment.
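The debiasing step described in the abstract, randomizing which answer option receives which letter label before prompting, can be illustrated with a short sketch. This is not the authors' implementation: the query_model callable, the ACS-style example question, and all function names below are hypothetical placeholders for whatever prompting setup is actually used.

import random
from collections import Counter
from typing import Callable, Sequence

LETTERS = "ABCDEFGH"

def build_prompt(question: str, options: Sequence[str]) -> str:
    # Format a multiple-choice survey question with lettered answer options.
    lines = [question] + [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)

def randomized_survey_responses(
    query_model: Callable[[str], str],  # returns a single letter, e.g. "A"
    question: str,
    options: Sequence[str],
    n_permutations: int = 20,
    seed: int = 0,
) -> Counter:
    # Query the model under randomized answer orderings and tally responses.
    # Randomizing the option-to-letter assignment averages out the ordering
    # and labeling biases (e.g. a preference for "A") reported in the paper.
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_permutations):
        order = list(options)
        rng.shuffle(order)  # random assignment of options to letter labels
        prompt = build_prompt(question, order)
        letter = query_model(prompt).strip()[:1].upper()
        if letter and letter in LETTERS[: len(order)]:
            counts[order[LETTERS.index(letter)]] += 1  # map letter back to its option
    return counts

# Example usage with a stand-in "model" that always answers "A",
# mimicking the labeling bias the paper reports.
if __name__ == "__main__":
    always_a = lambda prompt: "A"
    tally = randomized_survey_responses(
        always_a,
        question="What is this person's marital status?",
        options=["Now married", "Widowed", "Divorced", "Separated", "Never married"],
    )
    print(tally)  # under randomization, an A-biased model looks close to uniform

In the paper's setup, such debiased response tallies are compared against the aggregate ACS statistics of human subgroups; as the abstract notes, a model whose debiased responses are near uniform will then appear closest to whichever subgroup happens to have the most uniform statistics on the survey in question.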