Timezone: »
Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sampled-based planners.
Author Information
Zaheen Ahmad (University of Alberta)
Levi Lelis (Universidade Federal de Viçosa)
Michael Bowling (University of Alberta / DeepMind)
More from the Same Authors
-
2022 : Adapting the Function Approximation Architecture in Online Reinforcement Learning »
John Martin · Joseph Modayil · Fatima Davelouis · Michael Bowling -
2022 : Learning to Prioritize Planning Updates in Model-based Reinforcement Learning »
Brad Burega · John Martin · Michael Bowling -
2022 : Adapting the Function Approximation Architecture in Online Reinforcement Learning »
Fatima Davelouis · John Martin · Joseph Modayil · Michael Bowling -
2021 Poster: Monte Carlo Tree Search With Iteratively Refining State Abstractions »
Samuel Sokota · Caleb Y Ho · Zaheen Ahmad · J. Zico Kolter -
2020 : Invited Talk 2: Michael Bowling (University of Alberta) - Hindsight Rationality: Alternatives to Nash »
Michael Bowling -
2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling »
Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling -
2019 Poster: Ease-of-Teaching and Language Structure from Emergent Communication »
Fushan Li · Michael Bowling -
2018 Poster: Single-Agent Policy Tree Search With Guarantees »
Laurent Orseau · Levi Lelis · Tor Lattimore · Theophane Weber -
2016 : Computer Curling: AI in Sports Analytics »
Michael Bowling -
2016 Poster: The Forget-me-not Process »
Kieran Milan · Joel Veness · James Kirkpatrick · Michael Bowling · Anna Koop · Demis Hassabis -
2012 Poster: Sketch-Based Linear Value Function Approximation »
Marc Bellemare · Joel Veness · Michael Bowling -
2012 Poster: Tractable Objectives for Robust Policy Optimization »
Katherine Chen · Michael Bowling -
2011 Poster: Variance Reduction in Monte-Carlo Tree Search »
Joel Veness · Marc Lanctot · Michael Bowling -
2010 Workshop: Learning and Planning from Batch Time Series Data »
Daniel Lizotte · Michael Bowling · Susan Murphy · Joelle Pineau · Sandeep Vijan -
2009 Poster: Strategy Grafting in Extensive Games »
Kevin G Waugh · Nolan Bard · Michael Bowling -
2009 Poster: Monte Carlo Sampling for Regret Minimization in Extensive Games »
Marc Lanctot · Kevin G Waugh · Martin A Zinkevich · Michael Bowling -
2008 Session: Oral session 3: Learning from Reinforcement: Modeling and Control »
Michael Bowling -
2007 Spotlight: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Poster: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Spotlight: Regret Minimization in Games with Incomplete Information »
Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione -
2007 Poster: Regret Minimization in Games with Incomplete Information »
Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione -
2007 Poster: Computing Robust Counter-Strategies »
Michael Johanson · Martin A Zinkevich · Michael Bowling -
2006 Poster: iLSTD: Convergence, Eligibility Traces, and Mountain Car »
Alborz Geramifard · Michael Bowling · Martin A Zinkevich · Richard Sutton