Timezone: »

Fast Convergence of Random Reshuffling under Interpolation and the Polyak-Łojasiewicz Condition
Chen Fan · Christos Thrampoulidis · Mark Schmidt
Event URL: https://openreview.net/forum?id=CzLyJCo-a7 »

Modern machine learning models are often over-parameterized and as a result they can interpolate the training data. Under such a scenario, we study the convergence properties of a sampling- without-replacement variant of Stochastic Gradient Descent (SGD), known as Random Reshuffling (RR). Unlike SGD that samples data with replacement at every iteration, RR chooses a random permutation of data at the beginning of each epoch. For under-parameterized models, it has been recently shown that RR converges faster than SGD when the number of epochs is larger than the condition number (κ) of the problem under standard assumptions like strong convexity. However, previous works do not show that RR outperforms SGD under interpolation for strongly convex objectives. Here, we show that for the class of Polyak-Łojasiewicz (PL) functions that generalizes strong convexity, RR can outperform SGD as long as the number of samples (n) is less than the parameter (ρ) of a strong growth condition (SGC).

Author Information

Chen Fan (University of British Columbia)
Christos Thrampoulidis (University of British Columbia)
Mark Schmidt (University of British Columbia)

More from the Same Authors