Under- and overestimation of state/action values are harmful to reinforcement learning agents. In this paper, we show that a state/action value estimated with the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under- and overestimation. The degree of this imbalance can vary greatly across actions and policies within a single problem instance, making the agent prone to selecting actions or policies that have inferior expected return but a higher likelihood of overestimation. We present a comprehensive analysis of this skewness, examine its contributing factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects.
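The skewness claim above can be illustrated numerically. The sketch below is a minimal simulation under assumed parameters, not the paper's actual construction: the path weights and the log-normal parameters (mu, sigma) are hypothetical, chosen only to show that a value estimate formed as a weighted sum of log-normal path-wise values has a skewed distribution, so the chances of falling below versus above its expectation are imbalanced.

```python
# Illustrative sketch (hypothetical parameters, not the paper's method):
# model a value estimate as a weighted sum of path-wise values, each drawn
# from a log-normal distribution, and measure how often the estimate falls
# below vs. above its expectation.
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([0.5, 0.3, 0.2])   # hypothetical path weights
mu = np.array([0.0, 0.2, -0.1])       # hypothetical log-means of path-wise values
sigma = np.array([0.8, 0.6, 1.0])     # hypothetical log-stds of path-wise values

n_samples = 1_000_000
# Each column is one path-wise value; the estimate is their weighted sum.
paths = rng.lognormal(mean=mu, sigma=sigma, size=(n_samples, len(weights)))
estimates = paths @ weights

# Expectation of the weighted sum: sum_i w_i * exp(mu_i + sigma_i^2 / 2).
true_value = np.sum(weights * np.exp(mu + sigma**2 / 2))

p_under = np.mean(estimates < true_value)
p_over = np.mean(estimates > true_value)
skew = np.mean(((estimates - estimates.mean()) / estimates.std()) ** 3)

print(f"P(underestimate) = {p_under:.3f}, "
      f"P(overestimate) = {p_over:.3f}, skewness = {skew:.2f}")
# Because the log-normal components are right-skewed, most samples land below
# the mean, so under- and overestimation are not equally likely in this setup.
```

In this toy setup the estimate underestimates its expectation in most samples while occasionally overestimating by a large margin; how pronounced the imbalance is depends on the assumed weights and variances, which mirrors the paper's point that the degree of skewness can differ across actions and policies.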
Author Information
Liangpeng Zhang (University of Birmingham)
Ke Tang (Southern University of Science and Technology)
Xin Yao (Southern University of Science and Technology, China)
More from the Same Authors
- 2019 Poster: Optimal Stochastic and Online Learning with Individual Iterates
  Yunwen Lei · Peng Yang · Ke Tang · Ding-Xuan Zhou
- 2019 Spotlight: Optimal Stochastic and Online Learning with Individual Iterates
  Yunwen Lei · Peng Yang · Ke Tang · Ding-Xuan Zhou
- 2019 Poster: Explicit Planning for Efficient Exploration in Reinforcement Learning
  Liangpeng Zhang · Ke Tang · Xin Yao
- 2018 Poster: Stochastic Composite Mirror Descent: Optimal Bounds with High Probabilities
  Yunwen Lei · Ke Tang
- 2017 Poster: Subset Selection under Noise
  Chao Qian · Jing-Cheng Shi · Yang Yu · Ke Tang · Zhi-Hua Zhou