Poster
(More) Efficient Reinforcement Learning via Posterior Sampling
Ian Osband · Daniel Russo · Benjamin Van Roy
Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor
Most provably efficient learning algorithms introduce optimism about poorly understood states and actions to encourage exploration. We study an alternative approach to efficient exploration: posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates its posterior distribution over Markov decision processes, starting from a prior, and draws one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient, and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT})$ bound on the expected regret, where $T$ is the number of elapsed time steps, $\tau$ is the episode length, and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and it is close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
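The episode loop described in the abstract (sample one MDP from the posterior, act optimally for that sample for $\tau$ steps, then update) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a small tabular MDP with Bernoulli rewards, a Dirichlet(1, ..., 1) prior on each transition row, and a Beta(1, 1) prior on each mean reward. The environment, the priors, and the names `psrl`, `solve_finite_horizon`, and `sample_dirichlet` are hypothetical choices made for this example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def solve_finite_horizon(P, R, tau):
    """Backward induction over a horizon of tau steps.

    Returns policy[h][s], the action that is optimal at step h in state s
    for the MDP (P, R). P[s][a] is a next-state distribution, R[s][a] a mean reward.
    """
    S, A = len(R), len(R[0])
    V = [0.0] * S  # value with zero steps remaining
    policy = [[0] * S for _ in range(tau)]
    for h in reversed(range(tau)):
        V_new = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S)) for a in range(A)]
            best = max(range(A), key=lambda a: q[a])
            policy[h][s] = best
            V_new[s] = q[best]
        V = V_new
    return policy

def psrl(env_P, env_R, S, A, tau, episodes, seed=0):
    """Hypothetical PSRL loop on a tabular MDP whose true parameters are env_P, env_R."""
    rng = random.Random(seed)
    # Conjugate priors (assumed for illustration): Dirichlet(1,...,1) and Beta(1, 1).
    trans_counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    rew_alpha = [[1.0] * A for _ in range(S)]
    rew_beta = [[1.0] * A for _ in range(S)]
    total_reward = 0.0
    for _ in range(episodes):
        # 1. Sample a single MDP from the current posterior.
        P = [[sample_dirichlet(trans_counts[s][a], rng) for a in range(A)] for s in range(S)]
        R = [[rng.betavariate(rew_alpha[s][a], rew_beta[s][a]) for a in range(A)]
             for s in range(S)]
        # 2. Compute the policy that is optimal for that sample.
        policy = solve_finite_horizon(P, R, tau)
        # 3. Follow it for one episode of tau steps in the true environment.
        s = 0
        for h in range(tau):
            a = policy[h][s]
            r = 1.0 if rng.random() < env_R[s][a] else 0.0
            s_next = rng.choices(range(S), weights=env_P[s][a])[0]
            # 4. Conjugate updates; the new posterior is only sampled at the next episode.
            rew_alpha[s][a] += r
            rew_beta[s][a] += 1.0 - r
            trans_counts[s][a][s_next] += 1.0
            total_reward += r
            s = s_next
    return total_reward, trans_counts

# Usage: 2 states, 2 actions, uniform transitions; action 1 pays off far more often.
S, A, tau, episodes = 2, 2, 5, 200
env_P = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(S)]
env_R = [[0.1, 0.9] for _ in range(S)]
total, counts = psrl(env_P, env_R, S, A, tau, episodes)
print(total / (tau * episodes))  # average reward per step
```

Because the posterior is sampled only once per episode, the agent commits to a single plausible MDP for $\tau$ steps. Exploration then comes from randomness in the posterior sample rather than from an explicit optimism bonus, which is the contrast the abstract draws with optimistic algorithms.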
Author Information
Ian Osband (DeepMind)
Daniel Russo (Columbia University)
Benjamin Van Roy (Stanford University)
More from the Same Authors
2019 Poster: Information-Theoretic Confidence Bounds for Reinforcement Learning
Xiuyuan Lu · Benjamin Van Roy
2019 Poster: Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
2018 Poster: An Information-Theoretic Analysis for Thompson Sampling with Many Actions
Shi Dong · Benjamin Van Roy
2018 Poster: Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Maria Dimakopoulou · Ian Osband · Benjamin Van Roy
2018 Poster: Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband · John Aslanides · Albin Cassirer
2018 Spotlight: Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband · John Aslanides · Albin Cassirer
2017 Poster: Ensemble Sampling
Xiuyuan Lu · Benjamin Van Roy
2017 Poster: Conservative Contextual Linear Bandits
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy
2017 Poster: Improving the Expected Improvement Algorithm
Chao Qin · Diego Klabjan · Daniel Russo
2016 Poster: Deep Exploration via Bootstrapped DQN
Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy
2014 Workshop: Large-scale reinforcement learning and Markov decision problems
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari
2014 Poster: Near-optimal Reinforcement Learning in Factored MDPs
Ian Osband · Benjamin Van Roy
2014 Poster: Learning to Optimize via Information-Directed Sampling
Daniel Russo · Benjamin Van Roy
2014 Spotlight: Near-optimal Reinforcement Learning in Factored MDPs
Ian Osband · Benjamin Van Roy
2014 Poster: Model-based Reinforcement Learning and the Eluder Dimension
Ian Osband · Benjamin Van Roy
2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration
Daniel Russo · Benjamin Van Roy
2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration
Daniel Russo · Benjamin Van Roy
2013 Poster: Efficient Exploration and Value Function Generalization in Deterministic Systems
Zheng Wen · Benjamin Van Roy
2012 Poster: Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy
2009 Poster: Directed Regression
Yi-Hao Kao · Benjamin Van Roy · Xiang Yan