Invited Talk: Roger Grosse - Why Isn’t Everyone Using Second-Order Optimization?
Roger Grosse
2020 Talk
in
Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning
Abstract
In the pre-AlexNet days of deep learning, second-order optimization gave dramatic speedups and enabled training of deep architectures that seemed to be inaccessible to first-order optimization. But today, despite algorithmic advances such as K-FAC, nearly all modern neural net architectures are trained with variants of SGD and Adam. What’s holding us back from using second-order optimization? I’ll discuss three challenges to applying second-order optimization to modern neural nets: difficulty of implementation, implicit regularization effects of gradient descent, and the effect of gradient noise. All of these factors are significant, though not in the ways commonly believed.
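As a rough, hypothetical illustration of the speedup the abstract alludes to (not material from the talk itself), the sketch below compares plain gradient descent with a damped second-order update, theta <- theta - (H + lambda*I)^{-1} grad L, on an ill-conditioned quadratic. All names and numbers here are illustrative; methods like K-FAC avoid forming the full curvature matrix by approximating each layer's curvature block as a Kronecker product.

```python
# Minimal illustrative sketch: first-order vs. damped second-order steps
# on a toy ill-conditioned quadratic. Not from the talk; purely didactic.
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic: L(theta) = 0.5 * theta^T H theta - b^T theta,
# with condition number ~1e4 so gradient descent struggles.
dim = 20
eigvals = np.logspace(0, 4, dim)
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
H = Q @ np.diag(eigvals) @ Q.T
b = rng.normal(size=dim)
theta_star = np.linalg.solve(H, b)            # exact minimizer

def loss(theta):
    return 0.5 * theta @ H @ theta - b @ theta

def grad(theta):
    return H @ theta - b

# First-order: gradient descent, step size limited by the largest curvature.
theta_gd = np.zeros(dim)
for _ in range(100):
    theta_gd -= (1.0 / eigvals.max()) * grad(theta_gd)

# Second-order: damped Newton step theta <- theta - (H + damping*I)^{-1} grad.
damping = 1e-3
theta_newton = np.zeros(dim)
for _ in range(5):
    theta_newton -= np.linalg.solve(H + damping * np.eye(dim), grad(theta_newton))

print("gradient descent gap:", loss(theta_gd) - loss(theta_star))
print("damped Newton gap:   ", loss(theta_newton) - loss(theta_star))
```

In this toy setting the damped Newton iterate is essentially at the optimum after a handful of steps, while gradient descent makes little progress along the low-curvature directions; the practical question raised in the talk is why this kind of advantage is so hard to realize for modern neural networks.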