Modern Hopfield Networks for Return Decomposition for Delayed Rewards
Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
Event URL: https://openreview.net/forum?id=t0PQSDcqAiy

Delayed rewards, which are separated from their causative actions by irrelevant actions, hamper learning in reinforcement learning (RL). Real-world problems in particular often contain such delayed and sparse rewards. Recently, return decomposition for delayed rewards (RUDDER) employed pattern recognition to remove or reduce delay in rewards, which dramatically simplifies the learning task of the underlying RL method. RUDDER was realized using a long short-term memory (LSTM). The LSTM was trained to identify important patterns of state-action pairs that are responsible for the return. Reward was then redistributed to these important state-action pairs. However, training the LSTM is often difficult and requires a large number of episodes. In this work, we replace the LSTM with the recently proposed continuous modern Hopfield networks (MHN) and introduce Hopfield-RUDDER. MHN are powerful trainable associative memories with large storage capacity. They require only a few training samples and excel at identifying and recognizing patterns. We use this property of MHN to identify important state-action pairs that are associated with low or high return episodes and directly redistribute reward to them. However, in partially observable environments, Hopfield-RUDDER requires additional information about the history of state-action pairs. Therefore, we evaluate several methods for compressing history and introduce reset-max history, a lightweight history compression that uses the max-operator in combination with a reset gate. We experimentally show that Hopfield-RUDDER is able to outperform LSTM-based RUDDER on various 1D environments with small numbers of episodes. Finally, we show in preliminary experiments that Hopfield-RUDDER scales to highly complex environments with the Minecraft ObtainDiamond task from the MineRL NeurIPS challenge.
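The idea of associating state-action pairs with returns and redistributing reward to them can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that stored patterns are feature vectors of state-action pairs, that each stored pattern carries the return of the episode it came from, that retrieval is the softmax-weighted lookup known from continuous modern Hopfield networks, and that reward is redistributed as the difference between consecutive return estimates, as in RUDDER. The function names, the parameter `beta`, and the treatment of the first time step are hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_return_estimate(query, stored_patterns, stored_returns, beta=1.0):
    """Estimate the return associated with a query state-action vector.

    Assumption: retrieval weights are softmax(beta * <pattern, query>),
    the attention-like lookup of continuous modern Hopfield networks.
    The estimate is the weighted average of the stored returns.
    """
    scores = [
        beta * sum(q * p for q, p in zip(query, pattern))
        for pattern in stored_patterns
    ]
    weights = softmax(scores)
    return sum(w * r for w, r in zip(weights, stored_returns))


def redistribute_reward(episode, stored_patterns, stored_returns, beta=1.0):
    """Redistribute reward along an episode of state-action vectors.

    Assumption: the redistributed reward at step t is the change in the
    return estimate from step t-1 to step t; the first step receives its
    own estimate. The rewards therefore sum to the final estimate.
    """
    estimates = [
        hopfield_return_estimate(x, stored_patterns, stored_returns, beta)
        for x in episode
    ]
    rewards = [estimates[0]]
    for t in range(1, len(estimates)):
        rewards.append(estimates[t] - estimates[t - 1])
    return rewards
```

In this sketch, a state-action pair that resembles patterns stored from high-return episodes raises the return estimate and so receives positive redistributed reward at the step where it occurs, while pairs that do not change the estimate receive approximately zero.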

Author Information

Michael Widrich (ELLIS Unit / University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Angela Bitto (JKU)
Sepp Hochreiter (LIT AI Lab / University Linz)

Head of the LIT AI Lab and Professor of bioinformatics at the University of Linz. First to identify and analyze the vanishing gradient problem, the fundamental deep learning problem, in 1991. First author of the main paper on the now widely used LSTM RNNs. He implemented 'learning how to learn' (meta-learning) networks via LSTM RNNs and applied Deep Learning and RNNs to self-driving cars, sentiment analysis, reinforcement learning, bioinformatics, and medicine.

More from the Same Authors