A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Peter Grünwald
2017 Talk
in
Workshop: (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights
Abstract
Over the last 15 years, machine learning theorists have bounded the performance of empirical risk minimization by (localized) Rademacher complexity; Bayesians with frequentist sympathies have studied Bayesian consistency and rate-of-convergence theorems, sometimes under misspecification; and PAC-Bayesians have studied convergence properties of generalized Bayesian and Gibbs posteriors. We show that, amazingly, most such bounds readily follow from essentially a single result that bounds excess risk in terms of a novel complexity COMP$(\eta,w)$, which depends on a learning rate $\eta$ and a luckiness function $w$, the latter generalizing the concept of a 'prior'. Depending on the choice of $w$, COMP$(\eta,w)$ specializes to PAC-Bayesian (KL(posterior||prior)) complexity, MDL (normalized maximum likelihood) complexity and Rademacher complexity, and the bounds obtained are optimized for generalized Bayes, ERM, penalized ERM (such as the Lasso) or other methods. Tuning $\eta$ leads to optimal excess risk convergence rates, even for very large (polynomial entropy) classes, which have always been problematic for the PAC-Bayesian approach; the optimal $\eta$ depends on 'fast rate' properties of the domain, such as central, Bernstein and Tsybakov conditions.
Joint work with Nishant Mehta, University of Victoria. See https://arxiv.org/abs/1710.07732
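For concreteness, here is a schematic of the PAC-Bayesian specialization alluded to above; this is a hedged sketch in standard notation, not the paper's exact statement. Writing $R$ for the population risk, $\hat R_n$ for the empirical risk on $n$ samples, $f^*$ for the risk minimizer in the class, $\pi$ for the prior, $\hat\pi$ for a generalized (Gibbs) posterior learned with rate $\eta$, and $C_\eta$ for a constant governed by the fast-rate (e.g. Bernstein) condition, bounds of this family typically read: with probability at least $1-\delta$,
$$\mathbb{E}_{f \sim \hat\pi}\bigl[R(f) - R(f^*)\bigr] \;\le\; C_\eta \left( \mathbb{E}_{f \sim \hat\pi}\bigl[\hat R_n(f) - \hat R_n(f^*)\bigr] + \frac{\mathrm{KL}(\hat\pi \,\|\, \pi) + \log(1/\delta)}{\eta\, n} \right).$$
The symbols $R$, $\hat R_n$, $f^*$, $C_\eta$, $\delta$ and $n$ are notation introduced here for illustration only; the precise definition of COMP$(\eta,w)$ and the exact excess risk bound are given in the linked arXiv paper.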