Conflict Adaptation in Vision-Language Models
Abstract
A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial when it follows another high-conflict trial. This phenomenon offers an account of how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation; the sole exception is the highest-performing model, possibly due to a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant “supernodes” in InternVL 3.5 4B. Early- and late-layer supernodes emerge for both text and color with partial overlap, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a supernode in layers 24-25 whose activation is conflict-dependent and causally necessary for conflict resolution, as evidenced by 3- to 8-fold increases in Stroop errors upon ablation.
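To make the behavioral measure concrete, the sketch below shows one way conflict adaptation (the congruency sequence effect) could be computed from trial-level Stroop responses. This is an illustrative assumption, not the paper's code: the trial format, field names, and function name are hypothetical, and it assumes only that each trial records the previous trial's congruency, the current trial's congruency, and response correctness.

```python
"""Minimal sketch (hypothetical, not the paper's implementation):
estimating conflict adaptation from trial-level Stroop responses."""

from statistics import mean

def conflict_adaptation(trials):
    """Congruency sequence effect on accuracy.

    Conflict adaptation predicts higher accuracy on incongruent trials
    that follow incongruent trials (iI) than on incongruent trials that
    follow congruent trials (cI). A positive return value is consistent
    with conflict adaptation.
    """
    iI = [t["correct"] for t in trials
          if t["prev"] == "incongruent" and t["curr"] == "incongruent"]
    cI = [t["correct"] for t in trials
          if t["prev"] == "congruent" and t["curr"] == "incongruent"]
    return mean(iI) - mean(cI)

# Hypothetical usage with a short trial log (1 = correct color naming):
log = [
    {"prev": "congruent",   "curr": "incongruent", "correct": 0},
    {"prev": "incongruent", "curr": "incongruent", "correct": 1},
    {"prev": "congruent",   "curr": "incongruent", "correct": 1},
    {"prev": "incongruent", "curr": "incongruent", "correct": 1},
]
print(conflict_adaptation(log))
```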