Tutorial
Scalable Bayesian Inference
David Dunson
Room 517 CD
This tutorial will provide a practical overview of state-of-the-art approaches for analyzing massive data sets using Bayesian statistical methods. The first focus area will be algorithms for data with very large sample sizes (large n), and the second will be approaches for very high-dimensional data (large p). A particular emphasis will be on maintaining a valid characterization of uncertainty, which rules out many popular methods, such as (most) variational approximations and approaches for maximum a posteriori estimation. I will briefly review classical large sample approximations to posterior distributions (e.g., Laplace’s method, Bayesian central limit theorem), and will then transition to discussing conceptually and practically simple approaches for scaling up commonly used Markov chain Monte Carlo (MCMC) algorithms. The focus is on making posterior computation much faster for huge datasets while maintaining accuracy guarantees. Some useful classes of algorithms with growing theoretical and practical support include embarrassingly parallel (EP) MCMC, approximate MCMC, stochastic approximation, hybrid optimization and sampling, and modularization. Applications to computational advertising, genomics, neuroscience and other areas will provide concrete motivation. Code and notes will be made available, and research problems of ongoing interest will be highlighted.
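As a rough illustration of the embarrassingly parallel idea mentioned above, the sketch below splits data into shards, draws from each shard's subposterior independently, and merges the draws by precision-weighted averaging (a consensus-style combination). It is not taken from the tutorial materials. It assumes a toy Gaussian mean model with known variance, and it uses exact conjugate draws as a stand-in for per-shard MCMC output; the function names and the values of `sigma`, `tau`, and the shard count are hypothetical choices made for the example.

```python
import numpy as np

def subposterior_draws(y_shard, n_shards, n_draws, rng, sigma=1.0, tau=10.0):
    """Draw from one shard's subposterior for the mean of N(theta, sigma^2) data.

    The N(0, tau^2) prior is raised to the power 1/n_shards, i.e. widened to
    N(0, n_shards * tau^2), so the product of subposteriors equals the full
    posterior. Exact conjugate draws stand in for MCMC output (assumption).
    """
    prec = len(y_shard) / sigma**2 + 1.0 / (n_shards * tau**2)
    mean = (y_shard.sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=n_draws), prec

def consensus_combine(draws, precisions):
    """Merge shard draws by precision-weighted averaging, draw by draw."""
    w = np.asarray(precisions)[:, None]          # shape (n_shards, 1)
    return (w * np.asarray(draws)).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=10_000)            # simulated data, true mean 2
shards = np.array_split(y, 8)                    # each shard could go to its own worker

results = [subposterior_draws(s, len(shards), 5_000, rng) for s in shards]
draws, precs = zip(*results)
combined = consensus_combine(draws, precs)

# Full-data posterior for comparison (same sigma=1, tau=10 as the defaults).
full_prec = len(y) / 1.0**2 + 1.0 / 10.0**2
full_mean = y.sum() / full_prec
print(combined.mean(), full_mean)
print(combined.std(), full_prec**-0.5)
```

In this Gaussian case the weighted average reproduces the full posterior exactly; for non-Gaussian posteriors the same combination is only an approximation, which is one reason accuracy guarantees matter for this class of methods.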