Modern Hopfield Networks for Return Decomposition for Delayed Rewards
Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter

Delayed rewards, which are separated from their causative actions by irrelevant actions, hamper learning in reinforcement learning (RL). Real-world problems in particular often contain such delayed and sparse rewards. Recently, return decomposition for delayed rewards (RUDDER) employed pattern recognition to remove or reduce delay in rewards, which dramatically simplifies the learning task of the underlying RL method. RUDDER was realized using a long short-term memory (LSTM) network. The LSTM was trained to identify important state-action pair patterns that are responsible for the return. Reward was then redistributed to these important state-action pairs. However, training the LSTM is often difficult and requires a large number of episodes. In this work, we replace the LSTM with the recently proposed continuous modern Hopfield networks (MHN) and introduce Hopfield-RUDDER. MHN are powerful trainable associative memories with large storage capacity. They require only a few training samples and excel at identifying and recognizing patterns. We use this property of MHN to identify important state-action pairs that are associated with low- or high-return episodes and directly redistribute reward to them. However, in partially observable environments, Hopfield-RUDDER requires additional information about the history of state-action pairs. Therefore, we evaluate several methods for compressing history and introduce reset-max history, a lightweight history compression using the max-operator in combination with a reset gate. We experimentally show that Hopfield-RUDDER is able to outperform LSTM-based RUDDER on various 1D environments with small numbers of episodes. Finally, we show in preliminary experiments that Hopfield-RUDDER scales to highly complex environments with the Minecraft ObtainDiamond task from the MineRL NeurIPS challenge.
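
To make the idea of associative retrieval followed by reward redistribution more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that stored patterns are feature vectors of state-action pairs, that each stored pattern is associated with a scalar return, and that redistributed reward is taken as the difference between consecutive return predictions, as in RUDDER. The function names (`hopfield_retrieve`, `redistribute_reward`), the inverse temperature `beta`, and the toy data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def hopfield_retrieve(query, stored_keys, stored_values, beta=1.0):
    """One-step continuous modern Hopfield retrieval (attention-style).

    query:         (d,)   feature vector of a state-action pair
    stored_keys:   (n, d) stored state-action patterns
    stored_values: (n,)   value associated with each stored pattern (assumed: return)
    """
    weights = softmax(beta * (stored_keys @ query))
    return weights @ stored_values


def redistribute_reward(episode_features, stored_keys, stored_returns, beta=1.0):
    """Assumed redistribution rule: difference of consecutive return predictions.

    The prediction before the first step is taken to be 0, so the redistributed
    rewards sum to the prediction at the last step.
    """
    preds = np.array([
        hopfield_retrieve(x, stored_keys, stored_returns, beta)
        for x in episode_features
    ])
    return np.diff(preds, prepend=0.0)


# Toy example: three stored one-hot patterns; only the second is tied to return 1.
stored_keys = np.eye(3)
stored_returns = np.array([0.0, 1.0, 0.0])
episode = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0]])
rewards = redistribute_reward(episode, stored_keys, stored_returns, beta=50.0)
print(np.round(rewards, 3))
```

In this toy episode almost all reward is assigned to the second step, where the pattern associated with high return first appears. Because each query here contains only the current state-action pair, the sketch ignores history, which is the gap that a history compression such as reset-max history is meant to fill in partially observable environments.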

Author Information

Michael Widrich (ELLIS Unit / University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Angela Bitto (JKU)
Sepp Hochreiter (LIT AI Lab / University Linz / IARAI)

More from the Same Authors

  • 2021 : Assigning Credit to Human Decisions using Modern Hopfield Networks »
    Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
  • 2021 : Modern Hopfield Networks for Return Decomposition for Delayed Rewards »
    Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
  • 2021 : Understanding the Effects of Dataset Composition on Offline Reinforcement Learning »
    Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
  • 2021 : Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning »
    Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
  • 2021 : Traffic4cast 2021 – Temporal and Spatial Few-Shot Transfer Learning in Traffic Map Movie Forecasting + Q&A »
    Moritz Neun · Christian Eichenberger · Henry Martin · Pedro Herruzo · David Jonietz · Fei Tang · Daniel Springer · Markus Spanring · Avi Avidan · Luis Ferro · Ali Soleymani · Rohit Gupta · Bo Xu · Kevin Malm · Aleksandra Gruca · Johannes Brandstetter · Michael Kopp · David Kreil · Sepp Hochreiter
  • 2020 : Traffic Map Movies - An Introduction to the Traffic4cast Challenge »
    Sepp Hochreiter
  • 2020 Poster: Modern Hopfield Networks and Attention for Immune Repertoire Classification »
    Michael Widrich · Bernhard Schäfl · Milena Pavlović · Hubert Ramsauer · Lukas Gruber · Markus Holzleitner · Johannes Brandstetter · Geir Kjetil Sandve · Victor Greiff · Sepp Hochreiter · Günter Klambauer
  • 2020 Spotlight: Modern Hopfield Networks and Attention for Immune Repertoire Classification »
    Michael Widrich · Bernhard Schäfl · Milena Pavlović · Hubert Ramsauer · Lukas Gruber · Markus Holzleitner · Johannes Brandstetter · Geir Kjetil Sandve · Victor Greiff · Sepp Hochreiter · Günter Klambauer
  • 2020 : Modern Hopfield Networks and Attention for Immune Repertoire Classification »
    Michael Widrich
  • 2019 : Poster and Coffee Break 2 »
    Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall
  • 2019 : Traffic4cast -- Traffic Map Movie Forecasting »
    Sepp Hochreiter · Leonid Sigal · Moritz Neun · David Jonietz · Sungbin Choi · Henry Martin · Wei Yu · Zhichen Liu · Tu Nguyen · Pedro Herruzo Sánchez · Xiaoxia Shi · Aleksandra Gruca · Alastair Sutherland · David Kreil · Michael Kopp
  • 2019 : Poster Session »
    Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Samuel Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie
  • 2019 Poster: RUDDER: Return Decomposition for Delayed Rewards »
    Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter