

Poster

Random Function Descent

Felix Benning · Leif Döring

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Classical worst-case optimization theory neither explains the success of optimization in machine learning, nor does it help with step size selection. We establish a connection between Bayesian Optimization (i.e. average case optimization theory) and classical optimization using a 'stochastic Taylor approximation' to rediscover gradient descent. This rediscovery yields a step size schedule we call Random Function Descent (RFD), which, in contrast to classical derivations, is scale invariant. Furthermore, our analysis of RFD step sizes yields a theoretical foundation for common step size heuristics such as gradient clipping and gradual learning rate warmup. We finally propose a statistical procedure for estimating the RFD step size schedule and validate this theory with a case study on the MNIST dataset.
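To make the setting concrete, the sketch below shows a plain gradient-descent loop whose step size is supplied by an externally provided schedule, in the spirit of the abstract's description of RFD as gradient descent with a derived step size schedule. The function `rfd_step_size` is a hypothetical placeholder; the actual scale-invariant schedule and the statistical procedure for estimating it are given in the paper, not here.

```python
# Illustrative sketch only: gradient descent driven by a step size schedule.
# `rfd_step_size` is a hypothetical stand-in for the estimated RFD schedule.

import numpy as np


def rfd_step_size(step: int) -> float:
    """Hypothetical placeholder for an estimated RFD step size schedule."""
    # Assumption: a small constant; the paper instead derives a
    # scale-invariant schedule from a stochastic Taylor approximation.
    return 0.1


def descend(grad_fn, x0: np.ndarray, num_steps: int = 100) -> np.ndarray:
    """Run gradient descent, taking each step size from the schedule."""
    x = x0.copy()
    for t in range(num_steps):
        g = grad_fn(x)
        x = x - rfd_step_size(t) * g  # step size supplied by the schedule
    return x


# Usage example: minimize f(x) = ||x||^2 / 2, whose gradient is x.
if __name__ == "__main__":
    x_final = descend(lambda x: x, x0=np.ones(5))
    print(x_final)
```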
