Timezone: »

Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
Aaron Defazio · Konstantin Mishchenko

Both gradient descent and dual averaging for convex Lipschitz functionshave convergence rates that are highly dependent on the choice of learningrate. Even when the Lipschitz constant is known, setting the learning rate to achieve the optimal convergence rate requires knowing a bound on the distance from the initial point to the solution set $D$. A numberof approaches are known that relax this requirement, but they eitherrequire line searches, restarting (hyper-parameter grid search), or do not derivefrom the gradient descent or dual averaging frameworks (coin-betting).In this work we describe a single pass method, with no back-tracking or line searches, derived from dual averaging,which does not require knowledge of $D$ yet asymptotically achievesthe optimal rate of convergence for the complexity class of ConvexLipschitz functions.