Implicit Bias of Polyak and Line-Search Step Sizes on Linear Classification with Separable Data
Chen Fan · Reza Babanezhad Harikandeh · Christos Thrampoulidis · Mark Schmidt · Sharan Vaswani
Abstract
Recent works have shown that Polyak and line-search step sizes perform well for training deep neural networks. However, a theoretical understanding of their generalization performance is lacking. For overparameterized models, multiple solutions can generalize differently to unseen data despite all obtaining zero training error. Given this, a natural question is whether an algorithm inherently prefers (without explicit regularization) certain simple solutions over others upon convergence, a phenomenon known as implicit bias/regularization. In this work, we characterize the implicit bias of gradient descent with Polyak and line-search step sizes in linear classification with the logistic or cross-entropy loss. Given that these step sizes are adaptive to the local smoothness of the loss, we prove that the margin of their iterates converges to the maximum $l_2$-norm margin at an $\tilde{O}(\frac{1}{T})$ rate. In contrast to other adaptive step sizes that achieve the same rate [7] (also known as normalized gradient descent, NGD), line-search and Polyak step sizes do not depend on problem-specific constants that may not be accessible. Another subtle issue is that NGD can diverge on common losses with non-separable data, whereas line-search converges since it guarantees descent on the function value at each iteration. Finally, our analysis extends the game framework of Wang et al. [26] to logistic/cross-entropy losses.
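To make the setting concrete, here is a minimal sketch (not the paper's code) of gradient descent with the Polyak step size on the logistic loss over synthetic, linearly separable data. The data generation and all variable names are assumptions for illustration; with separable data the optimal loss value is $f^* = 0$, so the Polyak step size reduces to $f(w_t)/\|\nabla f(w_t)\|^2$, and the normalized margin of the iterates can be monitored as it grows toward the maximum $l_2$-norm margin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic separable data: labels in {-1, +1} determined by a ground-truth direction.
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def loss_and_grad(w):
    margins = y * (X @ w)
    # Logistic loss: mean of log(1 + exp(-y_i x_i^T w)), computed stably.
    losses = np.logaddexp(0.0, -margins)
    probs = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margins)
    grad = -(X.T @ (y * probs)) / n
    return losses.mean(), grad

w = np.zeros(d)
for t in range(5000):
    f, g = loss_and_grad(w)
    # Polyak step size: (f(w_t) - f^*) / ||grad||^2, with f^* = 0 on separable data.
    eta = f / (np.dot(g, g) + 1e-12)
    w -= eta * g

# Normalized margin min_i y_i x_i^T w / ||w||_2 increases toward the max l2-margin.
print("normalized margin:", np.min(y * (X @ w)) / np.linalg.norm(w))
```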