Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. In fact, it is often used in problems in which there is no intrinsic motivation for discounting. In this paper, we show that when used in approximate dynamic programming, an artificially low discount factor may significantly improve performance on some problems, such as Tetris. We propose two explanations for this phenomenon. Our first justification follows directly from the standard approximation error bounds: using a lower discount factor may decrease these bounds. However, we also show that these bounds are loose, and thus their decrease does not entirely justify the improved practical performance. We therefore propose another justification: when rewards are received only sporadically (as is the case in Tetris), we can derive tighter bounds, which support a significant performance increase with a decrease in the discount factor.
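As background for the first justification, a minimal sketch of the classical approximation error bound for approximate value iteration (in the style of Bertsekas and Tsitsiklis); the symbols V*, pi_k, gamma, and epsilon are generic notation assumed here, not necessarily the paper's own:

\[
  \limsup_{k \to \infty} \bigl\| V^* - V^{\pi_k} \bigr\|_{\infty}
    \;\le\; \frac{2\gamma}{(1-\gamma)^2}\,\epsilon ,
\]

where \(\epsilon\) bounds the sup-norm error of each approximate iterate and \(\pi_k\) is the greedy policy at iteration \(k\). The factor \(2\gamma/(1-\gamma)^2\) grows rapidly as \(\gamma \to 1\), so the same guarantee evaluated with a lower discount factor is tighter, at the cost of biasing the objective toward short-term rewards.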
Author Information
Marek Petrik (IBM)
Bruno Scherrer (INRIA)
More from the Same Authors
- 2013 Poster: Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
  Victor Gabillon · Mohammad Ghavamzadeh · Bruno Scherrer
- 2013 Poster: Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
  Bruno Scherrer
- 2012 Poster: On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  Bruno Scherrer · Boris Lesner
- 2012 Oral: On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  Bruno Scherrer · Boris Lesner
- 2009 Poster: Robust Value Function Approximation Using Bilinear Programming
  Marek Petrik · Shlomo Zilberstein
- 2009 Spotlight: Robust Value Function Approximation Using Bilinear Programming
  Marek Petrik · Shlomo Zilberstein