

Poster

Limiting Extrapolation in Linear Approximate Value Iteration

Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill

East Exhibition Hall B, C #193

Keywords: [ Reinforcement Learning and Planning ] [ Reinforcement Learning ] [ Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning ]


Abstract:

We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function using a few parameters, several empirical and theoretical studies show that the combination of least-squares projection with the Bellman operator may be expansive, thus leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of "anchor" states. Our algorithm tries to balance the generalization and compactness of linear methods with the small amplification of errors typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations on a number of simple problems where a traditional least-squares LAVI method diverges.
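To make the idea concrete, here is a minimal sketch (not the authors' code) of interpolation-based approximate value iteration with anchor states: Q-values are updated only at the anchors via samples from a generative model, and the value at any other state is obtained as a convex combination of anchor Q-values, with weights that express that state's features as a convex combination of the anchor features. All names (`sample_model`, `reward`, `features`, the discounted setting, and the crude weight solver) are illustrative assumptions, not the paper's algorithm or guarantees.

```python
# Illustrative sketch of anchor-based approximate value iteration.
# Assumptions: discounted setting, a callable generative model, and a
# least-squares-plus-projection stand-in for a proper convex-weight solver.
import numpy as np


def convex_weights(phi_s, anchor_phi):
    """Weights w >= 0, sum(w) = 1, such that anchor_phi.T @ w ~ phi_s.
    A crude stand-in for a constrained solver (e.g. a linear program);
    assumes such a convex combination exists, as in the paper's condition."""
    w, *_ = np.linalg.lstsq(anchor_phi.T, phi_s, rcond=None)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))


def anchor_avi(anchor_states, actions, features, sample_model, reward,
               gamma=0.95, n_iters=50, n_samples=20):
    """Value iteration that updates Q only at anchor states and evaluates
    all other states by interpolating the anchor Q-values."""
    K = len(anchor_states)
    anchor_phi = np.array([features(s) for s in anchor_states])  # K x d
    Q = np.zeros((K, len(actions)))  # Q-values at the anchors only

    def q_value(s, a_idx, Q_table):
        # Interpolated Q(s, a): convex combination of anchor Q-values.
        w = convex_weights(features(s), anchor_phi)
        return w @ Q_table[:, a_idx]

    for _ in range(n_iters):
        Q_new = np.zeros_like(Q)
        for i, s in enumerate(anchor_states):
            for j, a in enumerate(actions):
                # Sampled Bellman backup using the generative model.
                targets = []
                for _ in range(n_samples):
                    s_next = sample_model(s, a)
                    targets.append(reward(s, a) + gamma * max(
                        q_value(s_next, k, Q) for k in range(len(actions))))
                Q_new[i, j] = np.mean(targets)
        Q = Q_new
    return Q
```

Because the interpolation weights are convex, each backup is a non-expansion in the max norm at the anchors, which is the mechanism behind the linear (rather than exponential) error propagation described in the abstract.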
