To better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that low discount factors perform poorly because of (too) small action gaps requires revision. We propose an alternative hypothesis that identifies the variation in action-gap size across the state space as the primary cause. We then introduce a new method that enables more homogeneous action gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
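The hypothesis about action gaps can be illustrated with a toy calculation. The sketch below is not the authors' method or experimental setup; it assumes a hypothetical deterministic chain in which the only reward is +1 on reaching the goal, with two actions per state: "forward" (one step closer) and "stay" (remain in place, then act optimally). The function names are made up for this illustration. Under these assumptions, the action gap in regular space shrinks geometrically with distance from the goal, while the gap between log-values is the same in every state.

```python
import math


def chain_q_values(distance, gamma):
    """Optimal Q-values in an assumed deterministic chain MDP.

    The only reward is +1, received on the transition into the goal.
    `distance` (>= 1) is the number of forward steps needed to reach the goal.
    """
    q_forward = gamma ** (distance - 1)  # move one step toward the goal
    q_stay = gamma * q_forward           # waste one step, then act optimally
    return q_forward, q_stay


def action_gaps(distance, gamma):
    """Return the action gap in regular space and in logarithmic space."""
    q_forward, q_stay = chain_q_values(distance, gamma)
    linear_gap = q_forward - q_stay                    # gamma**(d-1) * (1 - gamma)
    log_gap = math.log(q_forward) - math.log(q_stay)   # -log(gamma), state-independent
    return linear_gap, log_gap


if __name__ == "__main__":
    gamma = 0.5  # a deliberately low discount factor
    for d in (1, 5, 10, 20):
        linear_gap, log_gap = action_gaps(d, gamma)
        print(f"distance={d:2d}  linear gap={linear_gap:.3e}  log gap={log_gap:.3f}")
```

In this toy setting the regular-space gap differs by several orders of magnitude between states near the goal and states far from it, which is the kind of heterogeneity the abstract points to, whereas the log-space gap stays constant.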
Author Information
Harm Van Seijen (Microsoft Research)
Mehdi Fatemi (Microsoft Research)
Arash Tavakoli (Imperial College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
Wed. Dec 11th 01:30 -- 03:30 AM Room East Exhibition Hall B + C #215
More from the Same Authors
- 2022: Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
Riashat Islam · Manan Tomar · Alex Lamb · Hongyu Zang · Yonathan Efroni · Dipendra Misra · Aniket Didolkar · Xin Li · Harm Van Seijen · Remi Tachet des Combes · John Langford
- 2022: Replay Buffer With Local Forgetting for Adaptive Deep Model-Based Reinforcement Learning
Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Harm Van Seijen · Sarath Chandar
- 2021 Poster: Medical Dead-ends and Learning to Identify High-Risk States and Treatments
Mehdi Fatemi · Taylor Killian · Jayakumar Subramanian · Marzyeh Ghassemi
- 2020 Poster: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
Harm Van Seijen · Hadi Nekoei · Evan Racah · Sarath Chandar
- 2019: Morning Coffee Break & Poster Session
Eric Metodiev · Keming Zhang · Markus Stoye · Randy Churchill · Soumalya Sarkar · Miles Cranmer · Johann Brehmer · Danilo Jimenez Rezende · Peter Harrington · AkshatKumar Nigam · Nils Thuerey · Lukasz Maziarka · Alvaro Sanchez Gonzalez · Atakan Okan · James Ritchie · N. Benjamin Erichson · Harvey Cheng · Peihong Jiang · Seong Ho Pahng · Samson Koelle · Sami Khairy · Adrian Pol · Rushil Anirudh · Jannis Born · Benjamin Sanchez-Lengeling · Brian Timar · Rhys Goodall · Tamás Kriváchy · Lu Lu · Thomas Adler · Nathaniel Trask · Noëlie Cherrier · Tomohiko Konno · Muhammad Kasim · Tobias Golling · Zaccary Alperstein · Andrei Ustyuzhanin · James Stokes · Anna Golubeva · Ian Char · Ksenia Korovina · Youngwoo Cho · Chanchal Chatterjee · Tom Westerhout · Gorka Muñoz-Gil · Juan Zamudio-Fernandez · Jennifer Wei · Brian Lee · Johannes Kofler · Bruce Power · Nikita Kazeev · Andrey Ustyuzhanin · Artem Maevskiy · Pascal Friederich · Arash Tavakoli · Willie Neiswanger · Bohdan Kulchytskyy · sindhu hari · Paul Leu · Paul Atzberger
- 2018: Poster Session 1 + Coffee
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang
- 2017 Poster: Hybrid Reward Architecture for Reinforcement Learning
Harm Van Seijen · Mehdi Fatemi · Romain Laroche · Joshua Romoff · Tavian Barnes · Jeffrey Tsang