Poor sample efficiency plagues the practical applicability of deep reinforcement learning (RL) algorithms, especially compared with biological intelligence. To close the gap, previous work has proposed augmenting the RL framework with an analogue of biological episodic memory, giving rise to the emerging field of "episodic control". Episodic memory refers to the ability to recollect individual events independently of the slower process of learning accumulated statistics, and evidence suggests that humans can use episodic memory for planning. Existing attempts to integrate episodic memory components into RL agents have mostly focused on the model-free domain, leaving their role in model-based settings largely unexplored. Here we propose the Episodic Memory Module (EMM), which aids the learning of world-model transitions rather than the value functions used in standard episodic RL. The EMM stores as memories those latent state transitions that have high prediction error under the world model, and returns linearly interpolated memories when the model shows high epistemic uncertainty. Memories are dynamically forgotten on a timescale that reflects their continuing surprise and uncertainty. Implemented in combination with existing world-model agents, the EMM yields a significant performance boost over baseline agents on complex Atari games such as Montezuma's Revenge. Our results indicate that the EMM can temporarily fill in gaps while a world model is being learned, conferring significant advantages in complex environments where such learning is slow.
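The abstract describes three mechanisms: surprise-gated writing, interpolated reading under uncertainty, and salience-based forgetting. The following is a minimal sketch of how such a module could look; the class name, thresholds, k-nearest-neighbour interpolation, and decay rule are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of an episodic memory module in the spirit of the EMM.
# Thresholds, the k-NN interpolation scheme, and the decay rule are assumptions.
import numpy as np

class EpisodicMemoryModule:
    def __init__(self, capacity=10_000, write_threshold=0.5, k=5):
        self.capacity = capacity
        self.write_threshold = write_threshold  # prediction-error gate for writing
        self.k = k                              # neighbours used for interpolation
        self.keys, self.values, self.salience = [], [], []

    def write(self, z, z_next, prediction_error):
        """Store a latent transition (z -> z_next) when the world model is surprised."""
        if prediction_error > self.write_threshold:
            if len(self.keys) >= self.capacity:
                # Forget the least salient memory (lowest surprise score).
                idx = int(np.argmin(self.salience))
                for buf in (self.keys, self.values, self.salience):
                    buf.pop(idx)
            self.keys.append(np.asarray(z))
            self.values.append(np.asarray(z_next))
            self.salience.append(float(prediction_error))

    def read(self, z):
        """Return a distance-weighted (linear) interpolation of the k nearest transitions."""
        if not self.keys:
            return None
        K = np.stack(self.keys)
        d = np.linalg.norm(K - np.asarray(z), axis=1)
        nn = np.argsort(d)[: self.k]
        w = 1.0 / (d[nn] + 1e-6)   # inverse-distance weights
        w /= w.sum()
        return (w[:, None] * np.stack([self.values[i] for i in nn])).sum(axis=0)

    def decay(self, rate=0.999):
        """Gradually forget: salience shrinks unless a memory is rewritten as surprising."""
        self.salience = [s * rate for s in self.salience]
```

In a model-based agent, the world model's own next-state prediction would be used when its epistemic uncertainty is low, with `read` substituting the interpolated memory when uncertainty is high and `decay` called once per step.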
Author Information
Julian Coda-Forno (Max Planck Institute for Biological Cybernetics)
Changmin Yu (UCL)
Qinghai Guo (Huawei)
Zafeirios Fountas (Huawei Technologies)
Neil Burgess (University College London)
More from the Same Authors
- 2022 Poster: Differentiable hierarchical and surrogate gradient search for spiking neural networks
  Kaiwei Che · Luziwei Leng · Kaixuan Zhang · Jianguo Zhang · Qinghu Meng · Jie Cheng · Qinghai Guo · Jianxing Liao
- 2022: Modelling non-reinforced preferences using selective attention
  Noor Sajid · Panagiotis Tigas · Zafeirios Fountas · Qinghai Guo · Alexey Zakharov · Lancelot Da Costa
- 2022: Constructing Memory: Consolidation as Teacher-Student Training of a Generative Model
  Eleanor Spens · Neil Burgess
- 2022 Spotlight: Lightning Talks 4A-3
  Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao
- 2022 Spotlight: Differentiable hierarchical and surrogate gradient search for spiking neural networks
  Luziwei Leng · Kaiwei Che · Kaixuan Zhang · Jianguo Zhang · Qinghu Meng · Jie Cheng · Qinghai Guo · Jianxing Liao
- 2022 Poster: DevFly: Bio-Inspired Development of Binary Connections for Locality Preserving Sparse Codes
  Tianqi Wei · Rana Alkhoury Maroun · Qinghai Guo · Barbara Webb
- 2022 Poster: Structured Recognition for Generative Models with Explaining Away
  Changmin Yu · Hugo Soulat · Neil Burgess · Maneesh Sahani
- 2022 Poster: Self-Supervised Learning Through Efference Copies
  Franz Scherr · Qinghai Guo · Timoleon Moraitis
- 2019 Poster: Coordinated hippocampal-entorhinal replay as structural inference
  Talfan Evans · Neil Burgess