Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance of tug-of-war hashing.
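The tug-of-war (AMS) sketch mentioned above projects a feature vector into a small table by adding each component to a hashed bucket with a pseudo-random sign; the inner product of two such sketches is then an unbiased estimate of the true inner product, whereas plain (unsigned) hashing inflates it whenever colliding components share a sign. A minimal sketch of the idea follows; the helper names and the use of seeded random bucket/sign assignments (in place of proper universal hash functions) are illustrative assumptions, not the paper's implementation.

```python
import random

def make_hashes(dim, table_size, seed):
    # Assumption: seeded random assignments stand in for the 2-universal
    # bucket hash h(i) and 4-wise independent sign hash s(i) in {-1, +1}.
    rng = random.Random(seed)
    bucket = [rng.randrange(table_size) for _ in range(dim)]
    sign = [rng.choice((-1, 1)) for _ in range(dim)]
    return bucket, sign

def sketch(x, bucket, sign, table_size):
    # Tug-of-war sketch: component i is added to bucket h(i) with sign s(i).
    t = [0.0] * table_size
    for i, xi in enumerate(x):
        t[bucket[i]] += sign[i] * xi
    return t

def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

dim, table_size = 200, 16
x = [1.0] * dim                      # e.g. a binary feature vector
y = [0.01 * i for i in range(dim)]   # e.g. a weight vector
true_ip = inner(x, y)

# Averaging the sketched inner product over many independent hash
# functions shows the estimator is unbiased: the mean approaches true_ip
# even though each single 16-bucket sketch is very noisy.
estimates = []
for seed in range(500):
    bucket, sign = make_hashes(dim, table_size, seed)
    estimates.append(inner(sketch(x, bucket, sign, table_size),
                           sketch(y, bucket, sign, table_size)))
mean_est = sum(estimates) / len(estimates)
```

In the linear value-function-approximation setting, `y` plays the role of the weight table: a state's value estimate becomes a sum of signed weight entries over its active features, rather than the unsigned sum used by standard hashing.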
Author Information
Marc Bellemare (University of Alberta)
Joel Veness (DeepMind)
Michael Bowling (DeepMind / University of Alberta)
More from the Same Authors
- 2016: Computer Curling: AI in Sports Analytics
  Michael Bowling
- 2016 Poster: The Forget-me-not Process
  Kieran Milan · Joel Veness · James Kirkpatrick · Michael Bowling · Anna Koop · Demis Hassabis
- 2012 Poster: Tractable Objectives for Robust Policy Optimization
  Katherine Chen · Michael Bowling
- 2011 Poster: Variance Reduction in Monte-Carlo Tree Search
  Joel Veness · Marc Lanctot · Michael Bowling
- 2010 Workshop: Learning and Planning from Batch Time Series Data
  Daniel Lizotte · Michael Bowling · Susan Murphy · Joelle Pineau · Sandeep Vijan
- 2010 Poster: Monte-Carlo Planning in Large POMDPs
  David Silver · Joel Veness
- 2009 Poster: Bootstrapping from Game Tree Search
  Joel Veness · David Silver · William Uther · Alan Blair
- 2009 Poster: Strategy Grafting in Extensive Games
  Kevin G Waugh · Nolan Bard · Michael Bowling
- 2009 Oral: Bootstrapping from Game Tree Search
  Joel Veness · David Silver · William Uther · Alan Blair
- 2009 Poster: Monte Carlo Sampling for Regret Minimization in Extensive Games
  Marc Lanctot · Kevin G Waugh · Martin A Zinkevich · Michael Bowling
- 2008 Session: Oral session 3: Learning from Reinforcement: Modeling and Control
  Michael Bowling
- 2007 Spotlight: Stable Dual Dynamic Programming
  Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans
- 2007 Poster: Stable Dual Dynamic Programming
  Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans
- 2007 Spotlight: Regret Minimization in Games with Incomplete Information
  Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione
- 2007 Poster: Regret Minimization in Games with Incomplete Information
  Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione
- 2007 Poster: Computing Robust Counter-Strategies
  Michael Johanson · Martin A Zinkevich · Michael Bowling
- 2006 Poster: iLSTD: Convergence, Eligibility Traces, and Mountain Car
  Alborz Geramifard · Michael Bowling · Martin A Zinkevich · Richard Sutton