Timezone: »
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on the seed sampling concept introduced in Dimakopoulou and Van Roy (2018) and on a randomized value function learning algorithm from Osband et al. (2016). We demonstrate that, for simple tabular contexts, the approach is competitive with those previously proposed in Dimakopoulou and Van Roy (2018) and with a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
Author Information
Maria Dimakopoulou (Stanford University)
Ian Osband (Google Deepmind)
Benjamin Van Roy (Stanford University)
More from the Same Authors
-
2022 : On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning »
Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy -
2022 Poster: An Information-Theoretic Framework for Deep Learning »
Hong Jun Jeon · Benjamin Van Roy -
2022 Poster: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning »
Dilip Arumugam · Benjamin Van Roy -
2021 Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice »
Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Xinkun Nie · Masatoshi Uehara · Kelly Zhang -
2021 : Environment Capacity »
Benjamin Van Roy -
2021 Poster: The Value of Information When Deciding What to Learn »
Dilip Arumugam · Benjamin Van Roy -
2021 Poster: Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning »
Aurelien Bibaut · Nathan Kallus · Maria Dimakopoulou · Antoine Chambaz · Mark van der Laan -
2021 Poster: Post-Contextual-Bandit Inference »
Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Antoine Chambaz · Mark van der Laan -
2021 Poster: Online Multi-Armed Bandits with Adaptive Inference »
Maria Dimakopoulou · Zhimei Ren · Zhengyuan Zhou -
2019 : Reinforcement Learning Beyond Optimization »
Benjamin Van Roy -
2019 Poster: Information-Theoretic Confidence Bounds for Reinforcement Learning »
Xiuyuan Lu · Benjamin Van Roy -
2018 Poster: An Information-Theoretic Analysis for Thompson Sampling with Many Actions »
Shi Dong · Benjamin Van Roy -
2018 Poster: Randomized Prior Functions for Deep Reinforcement Learning »
Ian Osband · John Aslanides · Albin Cassirer -
2018 Spotlight: Randomized Prior Functions for Deep Reinforcement Learning »
Ian Osband · John Aslanides · Albin Cassirer -
2017 : Poster session »
Abbas Zaidi · Christoph Kurz · David Heckerman · YiJyun Lin · Stefan Riezler · Ilya Shpitser · Songbai Yan · Olivier Goudet · Yash Deshpande · Judea Pearl · Jovana Mitrovic · Brian Vegetabile · Tae Hwy Lee · Karen Sachs · Karthika Mohan · Reagan Rose · Julius Ramakers · Negar Hassanpour · Pierre Baldi · Razieh Nabi · Noah Hammarlund · Eli Sherman · Carolin Lawrence · Fattaneh Jabbari · Vira Semenova · Maria Dimakopoulou · Pratik Gajane · Russell Greiner · Ilias Zadik · Alexander Blocker · Hao Xu · Tal EL HAY · Tony Jebara · Benoit Rostykus -
2017 Poster: Ensemble Sampling »
Xiuyuan Lu · Benjamin Van Roy -
2017 Poster: Conservative Contextual Linear Bandits »
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy -
2016 Poster: Deep Exploration via Bootstrapped DQN »
Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2014 Poster: Near-optimal Reinforcement Learning in Factored MDPs »
Ian Osband · Benjamin Van Roy -
2014 Poster: Learning to Optimize via Information-Directed Sampling »
Daniel Russo · Benjamin Van Roy -
2014 Spotlight: Near-optimal Reinforcement Learning in Factored MDPs »
Ian Osband · Benjamin Van Roy -
2014 Poster: Model-based Reinforcement Learning and the Eluder Dimension »
Ian Osband · Benjamin Van Roy -
2013 Poster: (More) Efficient Reinforcement Learning via Posterior Sampling »
Ian Osband · Daniel Russo · Benjamin Van Roy -
2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy -
2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy -
2013 Poster: Efficient Exploration and Value Function Generalization in Deterministic Systems »
Zheng Wen · Benjamin Van Roy -
2012 Poster: Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems »
Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy -
2009 Poster: Directed Regression »
Yi-Hao Kao · Benjamin Van Roy · Xiang Yan