Action-Gap Phenomenon in Reinforcement Learning
Amir-massoud Farahmand

Tue Dec 13 01:40 AM -- 01:44 AM (PST) @ None

Many practitioners of reinforcement learning have observed that the performance of an agent often gets very close to optimal even though its estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy \hat{\pi} with respect to an action-value function \hat{Q}, the performance loss E[V^*(X) - V^{\hat{\pi}}(X)] is upper bounded by O(||\hat{Q} - Q^*||_\infty^{1+\zeta}), in which \zeta >= 0 is the parameter quantifying the action-gap regularity. For \zeta > 0, our results indicate a smaller performance loss than previous analyses suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
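The intuition behind the bound can be illustrated with a minimal sketch (my own toy example, not from the paper): in a single-state MDP with a large action gap, the greedy policy with respect to a noisy estimate \hat{Q} still selects the optimal action whenever the sup-norm error is below half the gap, so the performance loss is exactly zero even though ||\hat{Q} - Q^*||_\infty is not.

```python
import numpy as np

def greedy_loss(q_star, q_hat):
    """Performance loss of the greedy policy w.r.t. q_hat
    in a single-state MDP (i.e., a bandit): V* - V^{pi_hat}."""
    return q_star.max() - q_star[int(np.argmax(q_hat))]

# Optimal action-values; the action gap is 1.0 - 0.5 = 0.5.
q_star = np.array([1.0, 0.5])

# Any estimate within sup-norm error < gap/2 keeps the greedy
# action optimal, so the loss is 0 rather than O(error).
q_hat_small_err = q_star + np.array([-0.2, 0.2])   # sup-norm error 0.2
q_hat_large_err = q_star + np.array([-0.3, 0.3])   # sup-norm error 0.3

print(greedy_loss(q_star, q_hat_small_err))  # 0.0: greedy action unchanged
print(greedy_loss(q_star, q_hat_large_err))  # 0.5: greedy action flipped
```

The same effect drives the paper's result: how fast the loss shrinks relative to the estimation error depends on how probable near-tied action-values are, which is what the \zeta parameter quantifies.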

Author Information

Amir-massoud Farahmand (Vector Institute)