Stochastic gradient descent (SGD) and variants such as Adagrad and Adam are extensively used today to train modern machine learning models. In this talk we will discuss ways to economically use second-order information to modify both the step size (learning rate) used in SGD and the direction taken by SGD. Our methods adaptively control the batch sizes used to compute gradient and Hessian approximations and ensure that the steps taken decrease the loss function with high probability, assuming that the loss is self-concordant, as is true for many problems in empirical risk minimization. For such cases we prove that our basic algorithm is globally linearly convergent. A slightly modified version of our method is presented for training deep learning models. Numerical results will be presented showing that it exhibits excellent performance without the need for learning rate tuning. If time permits, additional ways to efficiently make use of second-order information will be presented.
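The abstract describes the idea only at a high level. The toy Python sketch below illustrates one generic way second-order information can set an adaptive SGD step size; it is an illustrative assumption, not the speaker's algorithm. It estimates curvature along the mini-batch gradient with a finite-difference Hessian-vector product and grows the batch size over time; all names and constants (grad_batch, hvp, the 1.05 growth factor) are hypothetical.

```python
# Illustrative sketch only (not the method from the talk): SGD on a logistic
# loss where the step size comes from curvature along the gradient direction,
# estimated with a finite-difference Hessian-vector product, and the batch
# size is grown multiplicatively.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad_batch(w, idx):
    """Mini-batch gradient of the logistic loss with a small L2 term."""
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return Xb.T @ (p - yb) / len(idx) + 1e-4 * w

def hvp(w, v, idx, eps=1e-6):
    """Finite-difference Hessian-vector product on the same mini-batch."""
    return (grad_batch(w + eps * v, idx) - grad_batch(w, idx)) / eps

w = np.zeros(d)
batch = 32  # hypothetical initial batch size
for t in range(200):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    g = grad_batch(w, idx)
    Hg = hvp(w, g, idx)
    curv = g @ Hg
    # Newton-like step length along the gradient direction; fall back to a
    # small fixed step if the curvature estimate is not positive.
    alpha = (g @ g) / curv if curv > 0 else 1e-2
    w -= alpha * g
    # Slowly grow the batch size (hypothetical schedule).
    batch = min(n, int(np.ceil(batch * 1.05)))

p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
print("final training loss:", -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

The step length alpha = (g'g)/(g'Hg) is the exact minimizer along the gradient direction for a quadratic model, so no learning-rate tuning appears in this toy loop; the actual methods in the talk control batch sizes and probabilistic decrease guarantees far more carefully.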
Author Information
Donald Goldfarb (Columbia University)
More from the Same Authors
- 2022: Efficient Second-Order Stochastic Methods for Machine Learning (Donald Goldfarb)
- 2021 Spotlight: Tensor Normal Training for Deep Learning Models (Yi Ren · Donald Goldfarb)
- 2021 Poster: Tensor Normal Training for Deep Learning Models (Yi Ren · Donald Goldfarb)
- 2020 Invited speaker: Practical Kronecker-factored BFGS and L-BFGS methods for training deep neural networks (Donald Goldfarb)
- 2020 Spotlight: Practical Quasi-Newton Methods for Training Deep Neural Networks (Donald Goldfarb · Yi Ren · Achraf Bahamou)
- 2020 Poster: Practical Quasi-Newton Methods for Training Deep Neural Networks (Donald Goldfarb · Yi Ren · Achraf Bahamou)
- 2019 Poster: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models (Yunfei Teng · Wenbo Gao · François Chalus · Anna Choromanska · Donald Goldfarb · Adrian Weller)
- 2010 Poster: Sparse Inverse Covariance Selection via Alternating Linearization Methods (Katya Scheinberg · Shiqian Ma · Donald Goldfarb)