
# Batch size selection by stochastic optimal control
Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting

SGD and its variants are widespread in the field of machine learning. Although there is extensive research on the choice of step-size schedules to guarantee convergence of these methods, there is substantially less work examining the influence of the batch size on optimization. The latter is typically kept constant and chosen via experimental validation.

In this work we take a stochastic optimal control perspective to understand the effect of the batch size when optimizing non-convex functions with SGD. Specifically, we define an optimal control problem, which considers the *entire* trajectory of SGD to choose the optimal batch size for a noisy quadratic model. We show that the batch size is inherently coupled with the step size and that for saddles there is a transition time $t^*$, after which it is beneficial to increase the batch size to reduce the covariance of the stochastic gradients. We verify our results empirically on various convex and non-convex problems.
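The intuition behind the transition time can be sketched numerically. The toy example below is not the paper's control formulation; it is a minimal illustration, assuming a 1-D quadratic $f(x) = \tfrac{a}{2}x^2$ with additive gradient noise whose variance shrinks as $\sigma^2/B$ for batch size $B$. It compares a constant small batch against a schedule that enlarges the batch after a hypothetical transition step `t_star`, and measures the late-phase variance of the iterates.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_noisy_quadratic(a=1.0, sigma=1.0, eta=0.1, x0=5.0,
                        steps=200, batch_schedule=lambda t: 1):
    """SGD on f(x) = a/2 * x^2 with per-sample gradient noise of
    variance sigma^2; a mini-batch of size B reduces it to sigma^2/B."""
    x = x0
    traj = []
    for t in range(steps):
        B = batch_schedule(t)
        noise = rng.normal(0.0, sigma / np.sqrt(B))  # averaged batch noise
        g = a * x + noise                            # stochastic gradient
        x = x - eta * g
        traj.append(x)
    return np.array(traj)

# Hypothetical transition step: keep B = 1 early, then grow the batch.
t_star = 100
const = sgd_noisy_quadratic(batch_schedule=lambda t: 1)
grown = sgd_noisy_quadratic(batch_schedule=lambda t: 1 if t < t_star else 64)

# Late-phase iterate variance shrinks once the batch is enlarged.
print(np.var(const[150:]), np.var(grown[150:]))
```

Early on, the deterministic drift toward the minimum dominates and small batches are cheap; near the end, gradient noise dominates the iterate fluctuations, so paying for a larger batch reduces the stationary variance, consistent with the transition-time picture described in the abstract.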

#### Author Information

##### Antonio Orvieto (ETH Zurich)

PhD student at ETH Zurich. I'm interested in the design and analysis of optimization algorithms for deep learning. Interned at DeepMind, MILA, and Meta. All publications at http://orvi.altervista.org/. Looking for postdoc positions! :) Contact: antonio.orvieto@inf.ethz.ch

##### Hans Kersting (INRIA)

I am a postdoctoral researcher in the SIERRA team at INRIA Paris, advised by Francis Bach. My research focuses on probabilistic methods for machine learning, especially in the context of dynamical systems and optimization.