Workshop: Offline Reinforcement Learning

Offline Reinforcement Learning with Munchausen Regularization

Hsin-Yu Liu · Bharathan Balaji · Dezhi Hong


Most temporal-difference (TD) based reinforcement learning (RL) methods bootstrap by replacing the true value of a transition's next state with their current estimate of that value. Munchausen-RL (M-RL) proposes additionally leveraging the current policy to bootstrap RL. The underlying idea of penalizing consecutive policies that are far from each other also applies to offline settings. In our work, we add the Munchausen term to the Q-update step to penalize policies that deviate too far from the previous policy. Our results indicate that this term can be added to various offline Q-learning methods to improve their performance. In addition, we evaluate how prioritized experience replay affects offline RL. Our results show that Munchausen Offline RL outperforms the original methods without the regularization term.
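To make the Munchausen term concrete, the following is a minimal NumPy sketch of a Munchausen-style Q-learning target for discrete actions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the term, so the sketch follows the commonly published Munchausen-DQN target, in which a scaled, clipped log-probability of the taken action is added to the reward and the bootstrap uses a soft (entropy-regularized) value of the next state. The function names and the values of gamma, alpha, tau, and clip_min are hypothetical choices for illustration.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def munchausen_target(q_s, q_next, action, reward, done,
                      gamma=0.99, alpha=0.9, tau=0.03, clip_min=-1.0):
    """Munchausen-style TD target (illustrative sketch, assumed form).

    q_s, q_next : (batch, n_actions) target-network Q-values for s and s'
    action      : (batch,) integer actions taken in s
    reward, done: (batch,) rewards and terminal flags (0.0 or 1.0)
    All hyperparameter defaults are assumptions, not taken from the source.
    """
    # The policy is the softmax of the Q-values at temperature tau.
    log_pi = log_softmax(q_s / tau)
    log_pi_next = log_softmax(q_next / tau)
    pi_next = np.exp(log_pi_next)

    # Munchausen term: scaled log-probability of the action actually taken,
    # clipped to [clip_min, 0] so it stays bounded for unlikely actions.
    rows = np.arange(len(action))
    bonus = alpha * np.clip(tau * log_pi[rows, action], clip_min, 0.0)

    # Soft value of the next state: expected Q-value plus entropy bonus.
    soft_v_next = (pi_next * (q_next - tau * log_pi_next)).sum(axis=1)

    return reward + bonus + gamma * (1.0 - done) * soft_v_next
```

Because the clipped log-probability is never positive, the term lowers the target for actions the current softmax policy considers unlikely. Setting alpha to 0 removes the term and leaves an ordinary soft Q-learning target.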
