We propose a novel Bayesian approach to solving stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure: first, inference over the function space, and second, a search for the extrema of the inferred functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior where the natural parameter corresponds to a given kernel function and the sufficient statistic is composed of the observed function values. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function.
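To make the idea of modelling the maximizer directly more concrete, the following is a minimal sketch and not the authors' actual prior. It assumes a Gaussian kernel, a kernel-weighted average of the observed values as a stand-in for the sufficient statistic, and a softmax with a hypothetical `precision` parameter to turn that estimate into a distribution over candidate locations. All function and parameter names are illustrative.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale=0.5):
    """Pairwise Gaussian kernel between 1-D arrays a and b (assumed kernel choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def maximizer_posterior(x_obs, y_obs, candidates, precision=5.0, length_scale=0.5):
    """Distribution over which candidate location is the maximizer.

    Illustrative assumption: score each candidate by a kernel-weighted
    average of the noisy observations, then apply a softmax scaled by
    `precision`. No explicit function model is fitted and then optimized.
    """
    k = gaussian_kernel(candidates, x_obs, length_scale)  # (n_candidates, n_obs)
    weights = k.sum(axis=1)
    mean_estimate = (k @ y_obs) / (weights + 1e-12)  # guard against zero weight
    logits = precision * mean_estimate
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Noisy observations of a toy function on [0, 3].
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 3.0, size=40)
y_obs = np.sin(2.0 * x_obs) + 0.1 * rng.normal(size=40)

candidates = np.linspace(0.0, 3.0, 61)
posterior = maximizer_posterior(x_obs, y_obs, candidates)
print("most probable maximizer:", candidates[np.argmax(posterior)])
```

A larger `precision` concentrates the distribution on the highest-scoring candidates, while a smaller value keeps it closer to uniform. How the paper actually sets the corresponding quantities is not stated in the abstract.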
Author Information
Pedro Ortega (DeepMind)
Tim Genewein (Max Planck Institute)
Jordi Grau-Moya (Max Planck Institute)
David Balduzzi (Victoria University Wellington)
Daniel A Braun (University of Cambridge)
More from the Same Authors
- 2020 Poster: Meta-trained agents implement Bayes-optimal agents
  Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
- 2020 Spotlight: Meta-trained agents implement Bayes-optimal agents
  Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
- 2016: Agency and Causality in Decision Making
  Pedro Ortega
- 2016 Poster: Human Decision-Making under Limited Time
  Pedro Ortega · Alan A Stocker
- 2014 Workshop: Novel Trends and Applications in Reinforcement Learning
  Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
- 2013 Workshop: Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games
  Hilbert J Kappen · Naftali Tishby · Jan Peters · Evangelos Theodorou · David H Wolpert · Pedro Ortega
- 2013 Poster: Correlated random features for fast semi-supervised learning
  Brian McWilliams · David Balduzzi · Joachim M Buhmann
- 2012 Poster: Towards a learning-theoretic analysis of spike-timing dependent plasticity
  David Balduzzi · Michel Besserve