Oral
Annealing between distributions by averaging moments
Roger Grosse · Chris Maddison · Russ Salakhutdinov

Fri Dec 6th 04:40 -- 05:00 PM @ Harvey's Convention Center Floor, CC

Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families: averaging the moments of the initial and target distributions. We derive an asymptotically optimal piecewise linear schedule for the moments path and show that it performs at least as well as geometric averages with a linear schedule. Moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models, including Deep Belief Networks and Deep Boltzmann Machines.
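The contrast between the two paths can be illustrated with a minimal sketch, assuming univariate Gaussians as the exponential family (sufficient statistics x and x^2). The function names and the example endpoints are illustrative assumptions, not taken from the paper, and the sketch covers only the intermediate distributions, not AIS itself or the piecewise linear schedule.

```python
def geometric_path(mu0, var0, mu1, var1, beta):
    """Gaussian proportional to p0^(1-beta) * p1^beta.

    For an exponential family this averages the natural parameters,
    which for a Gaussian are (mean * precision, precision).
    """
    prec0, prec1 = 1.0 / var0, 1.0 / var1
    prec = (1.0 - beta) * prec0 + beta * prec1
    mean = ((1.0 - beta) * prec0 * mu0 + beta * prec1 * mu1) / prec
    return mean, 1.0 / prec


def moments_path(mu0, var0, mu1, var1, beta):
    """Gaussian whose moments E[x] and E[x^2] are the beta-weighted
    averages of the corresponding moments of the two endpoints."""
    mean = (1.0 - beta) * mu0 + beta * mu1
    second = (1.0 - beta) * (var0 + mu0 ** 2) + beta * (var1 + mu1 ** 2)
    return mean, second - mean ** 2


if __name__ == "__main__":
    # Hypothetical endpoints: N(-5, 1) as the initial distribution,
    # N(5, 1) as the target. Each result is a (mean, variance) pair.
    for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(beta,
              geometric_path(-5.0, 1.0, 5.0, 1.0, beta),
              moments_path(-5.0, 1.0, 5.0, 1.0, beta))
```

Both functions return the endpoint distributions at beta = 0 and beta = 1, but they differ in between: in this toy setting the geometric path keeps unit variance throughout, while the moments path widens at the midpoint so that it covers both endpoints.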

Author Information

Roger Grosse (University of Toronto)
Chris Maddison (University of Oxford / DeepMind)
Russ Salakhutdinov (Carnegie Mellon University)
