Signatures of human-like processing in Transformer forward passes
Abstract
A dominant way of using AI models to study human cognition is to evaluate whether human-derived measures are predicted by a model's output, that is, by the end product of a forward pass. However, mechanistic interpretability has begun to reveal models' internal processes, raising the question of whether models use human-like processing strategies. We investigate the relationship between real-time processing in humans and the layer-time dynamics of computation in Transformers, testing 20 open-source models across 6 domains. We find that, in cases where we would expect decision conflict in humans, models appear to initially favor a competing incorrect answer over the correct one. We also find that dynamic measures improve the prediction of human processing measures relative to static measures. Moreover, larger models do not always show more human-like processing patterns. Our work suggests a new way of using AI models as explicit models of human processing.
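To make the notion of layer-time dynamics concrete, the following is a minimal sketch of one way to read out a model's intermediate next-token preferences layer by layer, assuming a logit-lens-style projection of each layer's hidden state through the unembedding. The model (gpt2), the prompt, and this particular readout are illustrative assumptions, not necessarily the measures used in this work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in for any open-source causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Logit-lens-style readout: project each layer's hidden state at the final
# position through the final layer norm and the (tied) unembedding to get
# per-layer next-token distributions -- one view of "layer-time" dynamics.
final_ln = model.transformer.ln_f  # GPT-2-specific final layer norm
for layer_idx, h in enumerate(out.hidden_states):
    logits = model.lm_head(final_ln(h[:, -1]))
    probs = torch.softmax(logits, dim=-1)
    top_id = probs.argmax(-1).item()
    print(f"layer {layer_idx:2d}: top token = {tok.decode(top_id)!r} "
          f"(p = {probs[0, top_id]:.3f})")
```

Under this kind of readout, tracking the layer at which the top candidate flips from a competing answer to the eventual output is one conceivable way a "decision conflict" signature could be quantified.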