Spotlight
Bayesian Policy Learning with Trans-Dimensional MCMC
Matthew Hoffman · Arnaud Doucet · Nando de Freitas · Ajay Jasra

Mon Dec 3rd 08:10 -- 08:25 PM

A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new understanding, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to carry out full Bayesian policy search without the need for gradients and with a single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulation methods in situations where the reward is a rare event.
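As a rough illustration of the trans-dimensional view, the sketch below assumes the mixture-over-horizons formulation, in which the expected discounted reward J(theta) = sum_k gamma^k E[r(s_k) | theta] is treated as a model whose dimension is the horizon k. Everything concrete here is a hypothetical toy chosen for the example, not the paper's model or sampler: a scalar policy parameter `theta` with a standard normal prior, linear dynamics s_t = s_{t-1} + theta + SIGMA * eps_t with s_0 = 0, a Gaussian-bump reward, and birth/death moves that add or drop the last noise term. The chain targets a density proportional to p(theta) gamma^k prod_t N(eps_t; 0, 1) r(s_k), so its theta-marginal is proportional to p(theta) J(theta), and every acceptance ratio uses reward evaluations only (no gradients).

```python
import math
import random

GAMMA = 0.8    # discount factor (assumed for the toy)
TARGET = 3.0   # centre of the hypothetical reward bump
WIDTH = 1.0    # width of the reward bump
SIGMA = 1.0    # transition noise scale


def log_reward(s):
    # Hypothetical strictly positive reward r(s) = exp(-(s - TARGET)^2 / (2 WIDTH^2)).
    return -0.5 * ((s - TARGET) / WIDTH) ** 2


def final_state(theta, eps):
    # Toy dynamics: s_0 = 0, s_t = s_{t-1} + theta + SIGMA * eps_t, eps_t ~ N(0, 1),
    # so the state at horizon k = len(eps) is k * theta + SIGMA * sum(eps).
    return len(eps) * theta + SIGMA * sum(eps)


def accept(rng, log_ratio):
    # Metropolis-Hastings acceptance test on the log scale.
    return rng.random() < math.exp(min(0.0, log_ratio))


def rj_policy_search(n_iter=60000, burn_in=5000, step=0.3, seed=0):
    """Sample (k, eps_{1:k}, theta) from a density proportional to
    N(theta; 0, 1) * GAMMA**k * prod_t N(eps_t; 0, 1) * r(s_k).
    Summing over k and integrating out eps leaves a theta-marginal
    proportional to p(theta) * J(theta)."""
    rng = random.Random(seed)
    theta, eps = 0.0, []  # start at horizon k = 0
    thetas, horizons = [], []
    for it in range(n_iter):
        lr = log_reward(final_state(theta, eps))
        # 1) Trans-dimensional move: propose birth (k -> k+1) or death (k -> k-1), each w.p. 1/2.
        if rng.random() < 0.5:
            # Birth draws the new noise term from its prior, so that density cancels
            # and only GAMMA times the reward ratio remains.
            prop = eps + [rng.gauss(0.0, 1.0)]
            if accept(rng, math.log(GAMMA) + log_reward(final_state(theta, prop)) - lr):
                eps = prop
        elif eps:  # a death proposal at k = 0 is simply rejected
            prop = eps[:-1]
            if accept(rng, log_reward(final_state(theta, prop)) - math.log(GAMMA) - lr):
                eps = prop
        # 2) Random-walk update of the policy parameter theta (no gradients needed).
        lr = log_reward(final_state(theta, eps))
        prop_theta = theta + step * rng.gauss(0.0, 1.0)
        log_ratio = (-0.5 * prop_theta ** 2 + log_reward(final_state(prop_theta, eps))
                     + 0.5 * theta ** 2 - lr)
        if accept(rng, log_ratio):
            theta = prop_theta
        # 3) Refresh one noise term from its prior (independence proposal).
        if eps:
            j = rng.randrange(len(eps))
            prop = eps[:j] + [rng.gauss(0.0, 1.0)] + eps[j + 1:]
            lr = log_reward(final_state(theta, eps))
            if accept(rng, log_reward(final_state(theta, prop)) - lr):
                eps = prop
        if it >= burn_in:
            thetas.append(theta)
            horizons.append(len(eps))
    return thetas, horizons
```

Calling `thetas, horizons = rj_policy_search()` and averaging `thetas` gives a posterior-mean policy estimate for this toy. Because the reward appears directly in the target density, the chain spends more time in high-reward regions, which is the property the abstract credits for better behaviour when reward is a rare event.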

Author Information

Matthew Hoffman (DeepMind)
Arnaud Doucet (Oxford)
Nando de Freitas (University of Oxford)
Ajay Jasra (Imperial College London)
