Our talk focuses on training Deep Neural Networks (DNNs). Because current DNNs have an enormous number of parameters, using the Hessian matrix, or a full approximation to it, in a second-order method is prohibitive, both in terms of memory requirements and computational cost per iteration. Hence, to be practical, layer-wise block-diagonal approximations to these matrices are usually used. Here we describe second-order quasi-Newton (QN), natural gradient (NG), and generalized Gauss-Newton (GGN) methods of this type that are competitive with, and often outperform, first-order methods. These methods include those that use layer-wise (i) Kronecker-factored BFGS and L-BFGS QN approximations, (ii) tensor normal covariance approximations, (iii) mini-block Fisher matrix approximations, and (iv) Sherman-Morrison-Woodbury based variants of NG and GGN methods.
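To make item (iv) concrete, below is a minimal NumPy sketch (illustrative only, not the speaker's implementation) of the Sherman-Morrison-Woodbury idea behind such NG/GGN variants: with m per-example gradients for one layer stacked in a matrix J, the damped Fisher system can be solved by factoring an m x m matrix instead of an n x n one. The variable names, sizes, and damping value are assumptions chosen for illustration.

    # Minimal sketch (illustrative only) of the Sherman-Morrison-Woodbury (SMW)
    # trick: for a layer with n parameters and a mini-batch of m per-example
    # gradients stacked in J (m x n), solving (J^T J / m + lam*I) d = g needs
    # only an m x m linear solve rather than an n x n one.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 32, 1000                    # assumed mini-batch size and layer parameter count
    lam = 1e-3                         # assumed damping parameter
    J = rng.standard_normal((m, n))    # stand-in for per-example gradients
    g = rng.standard_normal(n)         # stand-in for the mini-batch gradient

    # Direct n x n solve (too expensive for realistically sized layers).
    d_direct = np.linalg.solve(J.T @ J / m + lam * np.eye(n), g)

    # SMW: (J^T J/m + lam*I)^{-1} g = (g - J^T (m*lam*I + J J^T)^{-1} (J g)) / lam
    d_smw = (g - J.T @ np.linalg.solve(m * lam * np.eye(m) + J @ J.T, J @ g)) / lam

    print(np.allclose(d_direct, d_smw))   # True: both give the same search direction

A similar saving in memory and computation motivates the Kronecker-factored and mini-block approaches mentioned above: each layer's curvature block is replaced by a structured matrix (for example, a Kronecker product of two much smaller factors) whose inverse can be applied using only those small factors.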
Author Information
Donald Goldfarb (Columbia University)
More from the Same Authors
- 2021 Spotlight: Tensor Normal Training for Deep Learning Models »
  Yi Ren · Donald Goldfarb
- 2021 Poster: Tensor Normal Training for Deep Learning Models »
  Yi Ren · Donald Goldfarb
- 2020: Invited speaker: Practical Kronecker-factored BFGS and L-BFGS methods for training deep neural networks, Donald Goldfarb »
  Donald Goldfarb
- 2020 Poster: Practical Quasi-Newton Methods for Training Deep Neural Networks »
  Donald Goldfarb · Yi Ren · Achraf Bahamou
- 2020 Spotlight: Practical Quasi-Newton Methods for Training Deep Neural Networks »
  Donald Goldfarb · Yi Ren · Achraf Bahamou
- 2019: Economical use of second-order information in training machine learning models »
  Donald Goldfarb
- 2019 Poster: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models »
  Yunfei Teng · Wenbo Gao · François Chalus · Anna Choromanska · Donald Goldfarb · Adrian Weller
- 2010 Poster: Sparse Inverse Covariance Selection via Alternating Linearization Methods »
  Katya Scheinberg · Shiqian Ma · Donald Goldfarb