Timezone: »
A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new understanding, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to carry out full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.
Author Information
Matthew Hoffman (DeepMind)
Arnaud Doucet (Oxford)
Nando de Freitas (University of Oxford)
Ajay Jasra (Imperial College London)
Related Events (a corresponding poster, oral, or spotlight)
-
2007 Poster: Bayesian Policy Learning with Trans-Dimensional MCMC »
Mon. Dec 3rd 06:30 -- 06:40 PM Room
More from the Same Authors
-
2022 : Spectral Diffusion Processes »
Angus Phillips · Thomas Seror · Michael Hutchinson · Valentin De Bortoli · Arnaud Doucet · Emile Mathieu -
2022 Spotlight: Lightning Talks 1A-4 »
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu -
2022 Spotlight: Conformal Off-Policy Prediction in Contextual Bandits »
Muhammad Faaiz Taufiq · Jean-Francois Ton · Rob Cornish · Yee Whye Teh · Arnaud Doucet -
2022 Poster: Conformal Off-Policy Prediction in Contextual Bandits »
Muhammad Faaiz Taufiq · Jean-Francois Ton · Rob Cornish · Yee Whye Teh · Arnaud Doucet -
2022 Poster: A Continuous Time Framework for Discrete Denoising Models »
Andrew Campbell · Joe Benton · Valentin De Bortoli · Thomas Rainforth · George Deligiannidis · Arnaud Doucet -
2022 Poster: Score-Based Diffusion meets Annealed Importance Sampling »
Arnaud Doucet · Will Grathwohl · Alexander Matthews · Heiko Strathmann -
2022 Poster: A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs »
Fabian Falck · Christopher Williams · Dominic Danks · George Deligiannidis · Christopher Yau · Chris C Holmes · Arnaud Doucet · Matthew Willetts -
2022 Poster: Riemannian Score-Based Generative Modelling »
Valentin De Bortoli · Emile Mathieu · Michael Hutchinson · James Thornton · Yee Whye Teh · Arnaud Doucet -
2022 Poster: Towards Learning Universal Hyperparameter Optimizers with Transformers »
Yutian Chen · Xingyou Song · Chansoo Lee · Zi Wang · Richard Zhang · David Dohan · Kazuya Kawakami · Greg Kochanski · Arnaud Doucet · Marc'Aurelio Ranzato · Sagi Perel · Nando de Freitas -
2021 Test Of Time: Online Learning for Latent Dirichlet Allocation »
Matthew Hoffman · Francis Bach · David Blei -
2020 Poster: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 Spotlight: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas -
2019 Poster: Augmented Neural ODEs »
Emilien Dupont · Arnaud Doucet · Yee Whye Teh -
2018 Poster: Hamiltonian Variational Auto-Encoder »
Anthony Caterini · Arnaud Doucet · Dino Sejdinovic -
2017 Poster: Filtering Variational Objectives »
Chris Maddison · John Lawson · George Tucker · Nicolas Heess · Mohammad Norouzi · Andriy Mnih · Arnaud Doucet · Yee Teh -
2017 Poster: Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling »
Andrei-Cristian Barbos · Francois Caron · Jean-François Giovannelli · Arnaud Doucet -
2016 Poster: Learning to learn by gradient descent by gradient descent »
Marcin Andrychowicz · Misha Denil · Sergio Gómez · Matthew Hoffman · David Pfau · Tom Schaul · Nando de Freitas -
2016 Poster: Learning to Communicate with Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Yannis Assael · Nando de Freitas · Shimon Whiteson -
2015 : Information based methods for Black-box Optimization »
Matthew Hoffman -
2015 Workshop: Scalable Monte Carlo Methods for Bayesian Analysis of Big Data »
Babak Shahbaba · Yee Whye Teh · Max Welling · Arnaud Doucet · Christophe Andrieu · Sebastian J. Vollmer · Pierre Jacob -
2015 Poster: Expectation Particle Belief Propagation »
Thibaut Lienart · Yee Whye Teh · Arnaud Doucet -
2014 Workshop: Bayesian Optimization in Academia and Industry »
Zoubin Ghahramani · Ryan Adams · Matthew Hoffman · Kevin Swersky · Jasper Snoek -
2014 Poster: Asynchronous Anytime Sequential Monte Carlo »
Brooks Paige · Frank Wood · Arnaud Doucet · Yee Whye Teh -
2014 Oral: Asynchronous Anytime Sequential Monte Carlo »
Brooks Paige · Frank Wood · Arnaud Doucet · Yee Whye Teh -
2014 Poster: Predictive Entropy Search for Efficient Global Optimization of Black-box Functions »
José Miguel Hernández-Lobato · Matthew Hoffman · Zoubin Ghahramani -
2014 Spotlight: Predictive Entropy Search for Efficient Global Optimization of Black-box Functions »
José Miguel Hernández-Lobato · Matthew Hoffman · Zoubin Ghahramani -
2014 Poster: Distributed Parameter Estimation in Probabilistic Graphical Models »
Yariv D Mizrahi · Misha Denil · Nando de Freitas -
2013 Workshop: Bayesian Optimization in Theory and Practice »
Matthew Hoffman · Jasper Snoek · Nando de Freitas · Michael A Osborne · Ryan Adams · Sebastien Bubeck · Philipp Hennig · Remi Munos · Andreas Krause -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2011 Workshop: Bayesian optimization, experimental design and bandits: Theory and applications »
Nando de Freitas · Roman Garnett · Frank R Hutter · Michael A Osborne -
2010 Session: Spotlights Session 10 »
Nando de Freitas -
2010 Session: Oral Session 12 »
Nando de Freitas -
2009 Workshop: Adaptive Sensing, Active Learning, and Experimental Design »
Rui M Castro · Nando de Freitas · Ruben Martinez-Cantin -
2009 Poster: Bayesian Nonparametric Models on Decomposable Graphs »
Francois Caron · Arnaud Doucet -
2009 Tutorial: Sequential Monte-Carlo Methods »
Arnaud Doucet · Nando de Freitas -
2008 Poster: An interior-point stochastic approximation method and an L1-regularized delta rule »
Peter Carbonetto · Mark Schmidt · Nando de Freitas -
2008 Oral: An interior-point stochastic approximation method and an L1-regularized delta rule »
Peter Carbonetto · Mark Schmidt · Nando de Freitas -
2008 Demonstration: Worio: A Web-Scale Machine Learning System »
Nando de Freitas · Ali Davar -
2007 Poster: Active Preference Learning with Discrete Choice Data »
Eric Brochu · Nando de Freitas · Abhijeet Ghosh -
2006 Poster: Conditional mean field »
Peter Carbonetto · Nando de Freitas