Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli

Tue Dec 10 04:50 PM -- 05:05 PM (PST) @ West Ballroom A + B

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
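The claim about action-gap size differences can be illustrated on a toy problem. The sketch below is not the authors' experimental setup or their algorithm: the deterministic chain environment, the function name `chain_q_values`, and all parameter values are assumptions chosen for illustration. It computes optimal Q-values by plain value iteration with a low discount factor, then compares the action-gap per state in regular space with the same gap measured on the logarithm of the values.

```python
import numpy as np


def chain_q_values(n_states=10, gamma=0.5, n_iters=200):
    """Value iteration on a hypothetical deterministic chain.

    States 0..n_states-1; the last state is terminal. Action 1 (right)
    moves one step toward the terminal state and yields reward 1 on
    arrival. Action 0 (left) moves one step away (or stays at state 0)
    and yields reward 0.
    """
    terminal = n_states - 1
    q = np.zeros((n_states, 2))  # column 0: left, column 1: right
    for _ in range(n_iters):
        v = q.max(axis=1)
        v[terminal] = 0.0  # no value after termination
        new_q = np.zeros_like(q)
        for s in range(terminal):
            left, right = max(s - 1, 0), s + 1
            new_q[s, 0] = gamma * v[left]
            reward = 1.0 if right == terminal else 0.0
            new_q[s, 1] = reward + gamma * v[right]
        q = new_q
    return q[:terminal]  # drop the terminal row


q = chain_q_values()

# Action-gap in regular space: shrinks geometrically with distance to the goal.
gaps = q[:, 1] - q[:, 0]

# The same gap measured on log-values: nearly constant across states.
log_gaps = np.log(q[:, 1]) - np.log(q[:, 0])

print("regular-space gap ratio (max/min):", gaps.max() / gaps.min())
print("log-space gap ratio (max/min):   ", log_gaps.max() / log_gaps.min())
```

In this toy chain the regular-space gaps differ by orders of magnitude between states near the goal and states far from it, while the log-space gaps stay within a small constant factor. This only illustrates the homogeneity argument; the method described in the abstract applies the logarithmic mapping to the value estimates during learning, which this sketch does not attempt.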

Author Information

Harm Van Seijen (Microsoft Research)
Mehdi Fatemi (Microsoft Research)
Arash Tavakoli (Imperial College London)

I am a Ph.D. candidate at Imperial College London. My research interests lie broadly in Artificial Intelligence, with particular focus on Machine Learning and Reinforcement Learning.
