The Cue or not the Cue? A Mechanistic Study of Memory Mechanisms in RNNs
Abstract
Neural networks can solve behavioral tasks requiring memory either by remembering the full content of past inputs or by actively manipulating it to retain a simplified version. Yet distinguishing between these two memory retention mechanisms in recurrent neural networks (RNNs) remains underexplored. To bridge this gap, we studied RNNs performing delayed cue discrimination (DCD) tasks and asked whether they retain the raw continuous-valued input cues or their task-relevant binary representations. Using linear probes trained on neural activities during the delay period, we tested whether RNNs eventually collapse the retained cue values into compact, binary representations. Even though the RNNs were trained only using binary cues, we consistently observed high reconstruction fidelity of continuous cue inputs across diverse experimental conditions and learned memory mechanisms. Overall, our results provide evidence that RNNs can find solutions that preserve the contents of past memories with high fidelity, favoring representational completeness over efficiency, even when the task does not demand it.
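To make the probing logic concrete, the following is a minimal sketch of a linear probe on synthetic data; it is not the study's actual pipeline. Everything in it is an assumption made for illustration: the delay-period activity is simulated rather than taken from a trained RNN, and the names `fit_linear_probe`, `predict`, `r_squared`, `hidden_cont`, and `hidden_bin` are hypothetical. It contrasts a population that keeps the continuous cue value with one that keeps only its sign, and compares how well a linear readout reconstructs the continuous cue on held-out trials.

```python
import numpy as np

def fit_linear_probe(H, y, ridge=1e-3):
    """Fit a ridge-regularized linear probe y ~ H @ w + b (bias as last weight)."""
    H_aug = np.hstack([H, np.ones((H.shape[0], 1))])
    A = H_aug.T @ H_aug + ridge * np.eye(H_aug.shape[1])
    return np.linalg.solve(A, H_aug.T @ y)

def predict(H, w):
    """Apply a fitted probe to activity matrix H (trials x units)."""
    H_aug = np.hstack([H, np.ones((H.shape[0], 1))])
    return H_aug @ w

def r_squared(y_true, y_pred):
    """Coefficient of determination of the reconstruction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for delay-period activity (assumption, not real RNN data).
rng = np.random.default_rng(0)
n_trials, n_units, n_train = 400, 64, 300
cues = rng.uniform(-1.0, 1.0, size=n_trials)   # continuous cue values
encoding = rng.normal(size=n_units)            # hypothetical encoding axis

def noise():
    return 0.1 * rng.normal(size=(n_trials, n_units))

# Case A: activity retains the continuous cue value.
hidden_cont = np.outer(cues, encoding) + noise()
# Case B: activity retains only the binary (sign) representation.
hidden_bin = np.outer(np.sign(cues), encoding) + noise()

scores = {}
for name, H in [("continuous", hidden_cont), ("binary", hidden_bin)]:
    w = fit_linear_probe(H[:n_train], cues[:n_train])
    scores[name] = r_squared(cues[n_train:], predict(H[n_train:], w))

r2_cont, r2_bin = scores["continuous"], scores["binary"]
print(f"held-out R^2, continuous code: {r2_cont:.3f}")
print(f"held-out R^2, binary code:     {r2_bin:.3f}")
```

In this toy setting the probe reconstructs the cue almost perfectly from the continuous code, while the binary code caps reconstruction well below that, since the sign alone cannot recover the cue magnitude. High held-out reconstruction fidelity is therefore the signature that separates the two retention mechanisms.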