Many reinforcement learning practitioners have observed that the agent's performance often comes very close to optimal even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy \hat{\pi} with respect to an action-value function \hat{Q}, the performance loss E[V^*(X) - V^{\hat{\pi}}(X)] is upper bounded by O(\| \hat{Q} - Q^* \|_\infty^{1+\zeta}), in which \zeta \geq 0 is the parameter quantifying the action-gap regularity. For \zeta > 0, our results indicate a smaller performance loss than previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
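As a rough illustration of the phenomenon (not the paper's analysis), the sketch below builds a hypothetical two-state, two-action MDP, computes Q^* by value iteration, and perturbs it adversarially by eps in sup-norm. The MDP, the helper names (`q_star`, `policy_value`, `worst_case_perturbation`, `greedy_loss`), and the values of eps are all assumptions chosen for the example.

```python
import numpy as np

def q_star(P, R, gamma, iters=2000):
    """Value iteration on Q. P: (S, A, S) transitions, R: (S, A) rewards."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def policy_value(P, R, gamma, pi):
    """Exact V^pi for a deterministic policy pi (one action index per state)."""
    S = R.shape[0]
    idx = np.arange(S)
    P_pi, R_pi = P[idx, pi], R[idx, pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def worst_case_perturbation(Q, eps):
    """Q_hat with ||Q_hat - Q||_inf = eps: lower the best action, raise the rest."""
    Q_hat = Q + eps
    best = Q.argmax(axis=1)
    Q_hat[np.arange(Q.shape[0]), best] -= 2 * eps
    return Q_hat

def greedy_loss(P, R, gamma, Q, eps):
    """Per-state loss V^* - V^{pi_hat} for the greedy policy of the perturbed Q."""
    pi_hat = worst_case_perturbation(Q, eps).argmax(axis=1)
    return Q.max(axis=1) - policy_value(P, R, gamma, pi_hat)

# Hypothetical MDP (an assumption for illustration only).
gamma = 0.9
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 0] = 1.0  # state 1, action 0: move to state 0
P[1, 1, 1] = 1.0  # state 1, action 1: stay in state 1
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])

Q = q_star(P, R, gamma)
sorted_Q = np.sort(Q, axis=1)
min_gap = (sorted_Q[:, -1] - sorted_Q[:, -2]).min()  # smallest action gap

loss_small = greedy_loss(P, R, gamma, Q, eps=0.1)  # 2 * eps below the smallest gap
loss_large = greedy_loss(P, R, gamma, Q, eps=0.3)  # 2 * eps above the smallest gap
print("smallest action gap:", min_gap)
print("loss with eps=0.1:", loss_small)
print("loss with eps=0.3:", loss_large)
```

In this toy setting, a perturbation smaller than half the smallest action gap cannot change the greedy action in any state, so the performance loss is zero even though \hat{Q} differs from Q^*. A larger perturbation flips the greedy action in the state with the small gap, and only then does a loss appear.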
Author Information
Amir-massoud Farahmand (Vector Institute)
Related Events (a corresponding poster, oral, or spotlight)
- 2011 Poster: Action-Gap Phenomenon in Reinforcement Learning
  Tue. Dec 13th 04:45 -- 10:59 PM
More from the Same Authors
- 2021: Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations
  Erfan Pirmorad · Farnam Mansouri · Amir-massoud Farahmand
- 2017 Poster: Random Projection Filter Bank for Time Series Data
  Amir-massoud Farahmand · Sepideh Pourazarm · Daniel Nikovski
- 2013 Poster: Learning from Limited Demonstrations
  Beomjoon Kim · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2013 Poster: Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
  Mahdi Milani Fard · Yuri Grinberg · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2013 Spotlight: Learning from Limited Demonstrations
  Beomjoon Kim · Amir-massoud Farahmand · Joelle Pineau · Doina Precup
- 2012 Poster: Value Pursuit Iteration
  Amir-massoud Farahmand · Doina Precup
- 2010 Poster: Error Propagation for Approximate Policy and Value Iteration
  Amir-massoud Farahmand · Remi Munos · Csaba Szepesvari
- 2008 Poster: Regularized Policy Iteration
  Amir-massoud Farahmand · Mohammad Ghavamzadeh · Csaba Szepesvari · Shie Mannor