Timezone: »

Value Pursuit Iteration
Amir-massoud Farahmand · Doina Precup

Thu Dec 06 02:00 PM -- 12:00 AM (PST) @ Harrah’s Special Events Center 2nd Floor

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close to optimal policy for reinforcement learning and planning problems with large state spaces. VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrelevant features. Second, after each iteration of VPI, the algorithm adds a set of functions based on the currently learned value function to the dictionary. This increases the representation power of the dictionary in a way that is directly relevant to the goal of having a good approximation of the optimal value function. We theoretically study VPI and provide a finite-sample error upper bound for it.

Author Information

Amir-massoud Farahmand (Vector Institute)
Doina Precup (McGill University / Mila / DeepMind Montreal)

More from the Same Authors