Comparing Human \& Machine Communication Patterns through a Tangram Game
Abstract
When humans communicate about visual objects, they develop shared linguistic conventions that progressively reduce referential ambiguity through collaborative dialogue. To better understand the representational patterns underlying human communication, and to test whether vision-capable large language models (VLLMs) exhibit similar communicative behaviors, we compare human-human and agent-agent interactions in the tangram communication game, in which two players establish shared references for abstract shapes through dialogue over six repeated rounds. We analyzed existing human-human data and conducted agent-agent experiments with five VLLMs, measuring task performance and using representational probes to explore the structure underlying it. Humans demonstrate clear convention formation: their representations become increasingly distinguishable across rounds as task accuracy improves from 78\% to 96\%. In contrast, the VLLM agents fail to exhibit similar collaborative patterns, achieving consistently low accuracy (10--30\%) with minimal improvement and no evidence of convention development, despite access to interim accuracy reports, full conversation history, and (most curiously) what appear to be largely accurate initial descriptions by `director' agents. Taken together, these preliminary results suggest that VLLMs may still struggle with the kinds of grounded, evolving, coreferential structures that define human language in communicative contexts.