Limiting Extrapolation in Linear Approximate Value Iteration
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #193

We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function with few parameters, several empirical and theoretical studies show that the combination of least-squares projection with the Bellman operator may be expansive, leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of "anchor" states. Our algorithm aims to balance the generalization and compactness of linear methods with the small error amplification typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of the features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially), and our method achieves a sample complexity bound that is polynomial in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations on a number of simple problems where a traditional least-squares LAVI method diverges.
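The interpolation idea in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical illustration (not the authors' implementation): it assumes the anchor features form a simplex containing the query state's features, so the convex weights are unique barycentric coordinates; the interpolated Q-values are then a convex combination of the anchor Q-values, which keeps the approximation a non-expansion in the max norm.

```python
import numpy as np

# Hypothetical sketch: interpolate Q-values at a query state as a
# convex combination of Q-values estimated at anchor states.
# Assumes 2-D features and 3 anchors whose features span a simplex
# containing the query state's features.

anchor_feats = np.array([[0.0, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])   # one row per anchor state
anchor_q = np.array([[1.0, 2.0],
                     [3.0, 0.5],
                     [0.0, 4.0]])       # Q(anchor, action), 2 actions

def convex_weights(phi, anchors):
    """Barycentric coordinates of phi w.r.t. the anchor simplex."""
    # Augment with a row of ones to enforce sum(w) == 1.
    A = np.vstack([anchors.T, np.ones(anchors.shape[0])])
    b = np.append(phi, 1.0)
    w = np.linalg.solve(A, b)           # unique when anchors span a simplex
    assert np.all(w >= -1e-9), "phi lies outside the anchor hull"
    return w

phi = np.array([0.25, 0.25])            # features of the query state
w = convex_weights(phi, anchor_feats)   # -> [0.5, 0.25, 0.25]
q = w @ anchor_q                        # interpolated Q-values
```

Because the weights are non-negative and sum to one, the interpolated Q-values never exceed the range of the anchor Q-values, which is the mechanism that limits extrapolation.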

Author Information

Andrea Zanette (Stanford University)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
Mykel J Kochenderfer (Stanford University)
Emma Brunskill (Stanford University)