Disentangling Interpretable Cognitive Variables That Support Human Generalization
Abstract
Abstraction and generalization are central to human intelligence. While many algorithms explain how generalization occurs once abstract knowledge is acquired, the mechanisms by which abstract variables are learned remain unclear. One approach is to interrogate computational models that reproduce human behavior. Handcrafted cognitive models are interpretable but rely on strong assumptions about predefined variables. In contrast, recurrent neural networks (RNNs) make fewer assumptions and capture behavior more accurately, yet yield high-dimensional representations of limited interpretability. Here, we use a Disentangled RNN (DisRNN) that applies information bottlenecks to learn a compact set of independent, interpretable latents. Previously, the DisRNN recovered expected mechanisms from simple behaviors. We extend the model to uncover novel mechanisms in a complex task with hidden structure across multiple timescales. The DisRNN was first trained on synthetic data from a handcrafted successor representation (SR) model fit to human behavior, then fine-tuned on data from 41 participants performing the task during fMRI. The model reproduced human learning dynamics across levels of abstraction, including generalizing the task schema to new task instances. Interrogating the model’s latents revealed a small set of disentangled variables that aligned with the task’s abstract structure, providing trial-by-trial estimates of cognitive variables that can be tested against neural activity. This framework offers a mechanistic, interpretable account of how humans learn and generalize abstract structure, linking the behavioral algorithm to its potential neural implementation.
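
As an illustrative sketch only (not the authors' implementation), the core DisRNN idea of passing each latent update through a noisy, penalized information bottleneck can be written as below. The multiplicative gate, Gaussian noise, KL-style penalty, and all names and shapes here are assumptions about one common bottleneck parameterization.

# Minimal sketch of a DisRNN-style information bottleneck (illustrative only).
# Assumptions: each latent update reads its inputs through a multiplicative
# gate plus Gaussian noise, with a KL-like penalty that pushes unused
# bottlenecks to close; the exact penalty form is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def bottleneck(x, gate, log_sigma):
    # x         : value(s) entering the bottleneck
    # gate      : learned multiplier; gate -> 0 closes the bottleneck
    # log_sigma : learned log noise scale
    sigma = np.exp(log_sigma)
    out = gate * x + sigma * rng.standard_normal(np.shape(x))
    # Cost grows as the bottleneck transmits more information (larger gate,
    # smaller noise); this is the pressure toward few, disentangled latents.
    kl_cost = 0.5 * (gate**2 + sigma**2 - 2.0 * log_sigma - 1.0)
    return out, kl_cost

def update_latents(latents, obs, gates, log_sigmas, weights):
    # Toy recurrent step: each latent is updated by its own small function of
    # bottlenecked inputs (a linear map standing in for a per-latent MLP).
    inputs = np.concatenate([latents, obs])
    new_latents = np.zeros_like(latents)
    total_cost = 0.0
    for i in range(len(latents)):
        squeezed, cost = bottleneck(inputs, gates[i], log_sigmas[i])
        new_latents[i] = np.tanh(weights[i] @ squeezed)
        total_cost += np.sum(cost)
    return new_latents, total_cost

In training, the bottleneck cost would be added to the behavioral prediction loss, so that only latents that genuinely improve the fit to choices remain open; this is the property that yields the small, interpretable set of variables described above.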