Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
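The abstract frames randomised value functions as a way to scale PSRL. PSRL samples a whole MDP from the posterior; RVF methods instead sample a plausible value function and act greedily with respect to it. The sketch below is a minimal illustration of that general idea, assuming a Bayesian linear model Q(s, a) ≈ φ(s, a)ᵀw over fixed features. It is not the Successor Uncertainties algorithm or the authors' code. The feature matrices, the regression targets, the prior and noise variances, and the function names `posterior` and `sample_greedy_action` are all hypothetical.

```python
import numpy as np

def posterior(features, targets, prior_var=1.0, noise_var=1.0):
    """Gaussian posterior N(mean, cov) over weights w, assuming targets ≈ features @ w.

    Standard Bayesian linear regression with prior w ~ N(0, prior_var * I)
    and Gaussian observation noise of variance noise_var (both assumptions).
    """
    d = features.shape[1]
    precision = np.eye(d) / prior_var + features.T @ features / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # symmetrise to remove round-off asymmetry
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

def sample_greedy_action(state_action_features, mean, cov, rng):
    """Draw one weight sample (one 'randomised value function') and act greedily.

    state_action_features: hypothetical (num_actions, d) matrix, one row per action
    in the current state. In PSRL-style use, the sampled w would be held fixed
    for a whole episode rather than redrawn at every step.
    """
    w = rng.multivariate_normal(mean, cov)
    q_values = state_action_features @ w  # one sampled Q-value per action
    return int(np.argmax(q_values)), q_values
```

Drawing a single sample and acting greedily on it is what separates this from simply acting greedily on the posterior mean: actions whose values are still uncertain sometimes look best under a sample, so they get tried. The abstract's point is that this alone is not enough; which uncertainty is sampled, and how it relates across states, determines whether the PSRL properties survive.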
Author Information
David Janz (University of Cambridge)
Jiri Hron (University of Cambridge)
Przemysław Mazur (Wayve)
Katja Hofmann (Microsoft Research)
Dr. Katja Hofmann is a Principal Researcher at the [Game Intelligence](http://aka.ms/gameintelligence/) group at [Microsoft Research Cambridge, UK](https://www.microsoft.com/en-us/research/lab/microsoft-research-cambridge/). There, she leads a research team that focuses on reinforcement learning with applications in modern video games. She and her team strongly believe that modern video games will drive a transformation of how we interact with AI technology. One of the projects developed by her team is [Project Malmo](https://www.microsoft.com/en-us/research/project/project-malmo/), which uses the popular game Minecraft as an experimentation platform for developing intelligent technology. Katja's long-term goal is to develop AI systems that learn to collaborate with people, to empower their users and help solve complex real-world problems. Before joining Microsoft Research, Katja completed her PhD in Computer Science as part of the [ILPS](https://ilps.science.uva.nl/) group at the [University of Amsterdam](https://www.uva.nl/en). She worked with Maarten de Rijke and Shimon Whiteson on interactive machine learning algorithms for search engines.
José Miguel Hernández-Lobato (University of Cambridge)
Sebastian Tschiatschek (Microsoft Research)