One of the main learning tasks in Reinforcement Learning (RL) is to approximate the value function: a mapping from the present observation to the expected sum of future rewards. Neural network architectures for value function approximation typically impose sparse connections based on prior knowledge of observational structure. When this structure is known, architectures such as convolutions, transformers, and graph neural networks can be given an inductive bias through fixed connections. However, observational structure is sometimes unavailable or too difficult to encode as an architectural bias, for instance when relating sensors that are randomly dispersed in space. Yet even in these situations it is still desirable to approximate value functions with a sparsely-connected architecture, for computational efficiency. An important open question is whether equally useful representations can be constructed when observational structure is unknown, particularly in the incremental, online setting without access to a replay buffer.

Our work is concerned with how an RL system could construct a value function approximation architecture in the absence of observational structure. We propose an online algorithm that adapts the connections of a neural network using information derived strictly from the learner's experience stream, via many parallel auxiliary predictions. The auxiliary predictions are specified as General Value Functions (GVFs) [11], and their weights are used to relate inputs and form subsets we call neighborhoods. These neighborhoods form the inputs of fully-connected random subnetworks that provide nonlinear features for a main value function. We validate our algorithm in a synthetic domain with high-dimensional stochastic observations. Results show that our method can adapt an approximation architecture without incurring substantial performance loss, while also discovering a local degree of spatial structure in the observations without prior knowledge.
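As a rough illustration of the pipeline the abstract describes (GVF weights relate inputs, related inputs form neighborhoods, and each neighborhood feeds a fixed random subnetwork whose outputs are value features), here is a minimal NumPy sketch. The dimensions, the neighborhood size `k`, and the randomly drawn `gvf_weights` are hypothetical stand-ins for learned quantities, not the paper's actual algorithm or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 16   # observation dimension (hypothetical)
n_gvfs = 8      # number of auxiliary GVF predictions (hypothetical)
k = 4           # neighborhood size (hypothetical)

# Stand-in for learned GVF weights: each linear GVF prediction
# assigns one coefficient per input.
gvf_weights = rng.normal(size=(n_gvfs, n_inputs))

# Relate inputs by the similarity of their GVF weight columns:
# inputs that contribute similarly across predictions are "close".
cols = gvf_weights / np.linalg.norm(gvf_weights, axis=0, keepdims=True)
similarity = cols.T @ cols  # (n_inputs, n_inputs) cosine similarities

# Neighborhood for each input: its k most similar inputs (including itself).
neighborhoods = np.argsort(-similarity, axis=1)[:, :k]

# Each neighborhood feeds a small fixed random subnetwork whose
# nonlinear outputs serve as features for the main value function.
n_hidden = 8
subnet_w = rng.normal(size=(n_inputs, n_hidden, k))

def features(obs):
    feats = []
    for i, nbr in enumerate(neighborhoods):
        feats.append(np.maximum(0.0, subnet_w[i] @ obs[nbr]))  # ReLU
    return np.concatenate(feats)

obs = rng.normal(size=n_inputs)
phi = features(obs)  # sparse-by-construction feature vector, shape (128,)
```

In the paper's online setting the GVF weights (and hence the neighborhoods) would be updated incrementally from the experience stream; this sketch only shows the static wiring step.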
Author Information
John Martin (University of Alberta)
Joseph Modayil (DeepMind)
Fatima Davelouis (University of Alberta)
I am a Master's student at the University of Alberta, working on reinforcement learning with Professor Michael Bowling. In my current research, I aim to build representations from an agent's predictions in POMDP settings.
Michael Bowling (DeepMind / University of Alberta)
More from the Same Authors
- 2022 : Learning to Prioritize Planning Updates in Model-based Reinforcement Learning
  Brad Burega · John Martin · Michael Bowling
- 2022 : Adapting the Function Approximation Architecture in Online Reinforcement Learning
  Fatima Davelouis · John Martin · Joseph Modayil · Michael Bowling
- 2022 : Oral Presentation 7: Adapting the Function Approximation Architecture in Online Reinforcement Learning
  Fatima Davelouis
- 2020 : Invited Talk 2: Michael Bowling (University of Alberta) - Hindsight Rationality: Alternatives to Nash
  Michael Bowling
- 2020 Poster: Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
  Zaheen Ahmad · Levi Lelis · Michael Bowling
- 2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling
  Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling
- 2019 Poster: Ease-of-Teaching and Language Structure from Emergent Communication
  Fushan Li · Michael Bowling
- 2016 : Computer Curling: AI in Sports Analytics
  Michael Bowling
- 2016 Poster: The Forget-me-not Process
  Kieran Milan · Joel Veness · James Kirkpatrick · Michael Bowling · Anna Koop · Demis Hassabis
- 2012 Poster: Sketch-Based Linear Value Function Approximation
  Marc Bellemare · Joel Veness · Michael Bowling
- 2012 Poster: Tractable Objectives for Robust Policy Optimization
  Katherine Chen · Michael Bowling
- 2011 Poster: Variance Reduction in Monte-Carlo Tree Search
  Joel Veness · Marc Lanctot · Michael Bowling
- 2010 Workshop: Learning and Planning from Batch Time Series Data
  Daniel Lizotte · Michael Bowling · Susan Murphy · Joelle Pineau · Sandeep Vijan
- 2009 Poster: Strategy Grafting in Extensive Games
  Kevin G Waugh · Nolan Bard · Michael Bowling
- 2009 Poster: Monte Carlo Sampling for Regret Minimization in Extensive Games
  Marc Lanctot · Kevin G Waugh · Martin A Zinkevich · Michael Bowling
- 2008 Session: Oral session 3: Learning from Reinforcement: Modeling and Control
  Michael Bowling
- 2007 Spotlight: Stable Dual Dynamic Programming
  Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans
- 2007 Poster: Stable Dual Dynamic Programming
  Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans
- 2007 Spotlight: Regret Minimization in Games with Incomplete Information
  Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione
- 2007 Poster: Regret Minimization in Games with Incomplete Information
  Martin A Zinkevich · Michael Johanson · Michael Bowling · Carmelo Piccione
- 2007 Poster: Computing Robust Counter-Strategies
  Michael Johanson · Martin A Zinkevich · Michael Bowling
- 2006 Poster: iLSTD: Convergence, Eligibility Traces, and Mountain Car
  Alborz Geramifard · Michael Bowling · Martin A Zinkevich · Richard Sutton