Interpretable Hybrid Neural-Cognitive Models Discover Cognitive Strategies Underlying Flexible Reversal Learning
Abstract
Flexible learning in dynamic environments is classically studied with reversal learning tasks, but existing reinforcement learning models often fail to capture the full richness of behavior. Here we used HybridRNNs to model human and primate reversal learning. Among several variants, the Context-ANN, which replaces the linear value-updating rule with a neural network and adds contextual information as an input, achieved the highest predictive accuracy, closely matching trial-by-trial adaptation to reversals. Analyses of its internal dynamics revealed a distinctive, context-dependent value-updating strategy with nonlinear attractor structure, providing interpretable insight into how flexible learning is implemented. These results show that HybridRNNs offer a powerful framework for modeling behavior that is both predictive and interpretable, bridging the gap between cognitive models and neural network approaches.
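To make the contrast in the abstract concrete, the sketch below compares a classic linear Rescorla-Wagner value update with a hypothetical network-based update in the spirit of the Context-ANN. All names, weights, and the single-context encoding are illustrative assumptions, not the paper's actual architecture; the network here uses untrained placeholder weights, whereas the paper's model is fit to behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

def rescorla_wagner_update(v, reward, alpha=0.3):
    """Classic linear update: v <- v + alpha * (reward - v)."""
    return v + alpha * (reward - v)

# Hypothetical stand-in for a Context-ANN-style update: a tiny MLP mapping
# (value, reward, context) to a new value. Weights are random placeholders;
# in the paper's framework such parameters are learned from choice data.
W1 = rng.normal(size=(3, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

def context_ann_update(v, reward, context):
    """Nonlinear, context-conditioned value update (illustrative only)."""
    x = np.array([v, reward, context])
    h = np.tanh(x @ W1 + b1)
    return float(np.tanh(h @ W2 + b2))

# Simulate a reversal: the option is rewarded for 50 trials, then never.
v_lin = 0.0
for t in range(100):
    reward = 1.0 if t < 50 else 0.0
    v_lin = rescorla_wagner_update(v_lin, reward)
# v_lin rises toward 1.0 before the reversal and decays toward 0.0 after.
```

The linear rule adapts at a fixed rate regardless of context; the appeal of the network-based variant is that its update can depend nonlinearly on contextual input, which is what the paper's analyses of internal dynamics probe.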