OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning
Jinyi Liu · Zhi Wang · Yan Zheng · Jianye Hao · Junjie Ye · Chenjia Bai · Pengyi Li
Event URL: https://openreview.net/forum?id=hP7fXPNuVvv

Many exploration strategies in reinforcement learning are built on the principle of optimism in the face of uncertainty (OFU). However, by failing to account for aleatoric uncertainty, existing methods may over-explore state-action pairs with high inherent randomness and are therefore non-robust. In this paper, we explicitly capture aleatoric uncertainty from a distributional perspective and propose an information-theoretic exploration method named Optimistic Value Distribution Explorer (OVD-Explorer). OVD-Explorer follows the OFU principle but, more importantly, avoids exploring areas with high aleatoric uncertainty by maximizing the mutual information between the policy and the upper bounds of the policy's returns. Furthermore, to make OVD-Explorer tractable for continuous RL, we derive a closed-form solution and integrate it with SAC; to our knowledge, this is the first method to alleviate the negative impact of aleatoric uncertainty on exploration in continuous RL. Empirical evaluations on the widely used MuJoCo benchmark and a novel GridChaos task demonstrate that OVD-Explorer can alleviate over-exploration and outperform state-of-the-art methods.
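The core idea the abstract describes — being optimistic about epistemic uncertainty while discounting aleatoric uncertainty — can be illustrated with a small sketch. This is not the authors' OVD-Explorer objective (which maximizes a mutual-information quantity over return-distribution upper bounds); it is a hypothetical action-scoring rule that merely shows how the two kinds of uncertainty can be separated from an ensemble of distributional (quantile) value estimates. The function name, the shape convention, and the `beta`/`kappa` coefficients are all assumptions for illustration.

```python
import numpy as np

def exploration_scores(quantiles, beta=1.0, kappa=1.0):
    """Score actions optimistically in epistemic uncertainty while
    penalizing aleatoric uncertainty (illustrative only, not OVD-Explorer).

    quantiles: array of shape (n_ensemble, n_actions, n_quantiles),
        an ensemble of quantile estimates of each action's return.
    """
    # Expected return per action, averaged over ensemble and quantiles.
    mean_q = quantiles.mean(axis=(0, 2))
    # Epistemic uncertainty: disagreement between ensemble members'
    # mean return estimates (shrinks with more data).
    epistemic = quantiles.mean(axis=2).std(axis=0)
    # Aleatoric uncertainty: average spread of each member's return
    # distribution (irreducible environment noise).
    aleatoric = quantiles.std(axis=2).mean(axis=0)
    # OFU bonus on epistemic uncertainty, penalty on aleatoric noise.
    return mean_q + beta * epistemic - kappa * aleatoric

# Two actions with the same expected return: action 0 is deterministic,
# action 1 has a noisy (high-aleatoric) return distribution.
q = np.array([
    [[1.0, 1.0, 1.0, 1.0, 1.0], [-1.0, 0.0, 1.0, 2.0, 3.0]],
    [[1.0, 1.0, 1.0, 1.0, 1.0], [-1.0, 0.0, 1.0, 2.0, 3.0]],
])
scores = exploration_scores(q)
# The low-noise action is preferred despite equal expected return.
```

A naive OFU bonus would score both actions equally (or favor the noisy one), which is exactly the over-exploration failure mode the paper targets.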

Author Information

Jinyi Liu (Tianjin University)
Zhi Wang (Huawei Technologies Ltd.)
Yan Zheng (Tianjin University)
Jianye Hao (Tianjin University)
Junjie Ye (The Chinese University of Hong Kong)
Chenjia Bai (Harbin Institute of Technology)
Pengyi Li (Tianjin University)