
Learning deep neural networks by iterative linearisation
Adrian Goldwaser · Hong Ge
Event URL: https://openreview.net/forum?id=wpxn_MdPg-M

The excellent real-world performance of deep neural networks has received increasing attention. Despite having the capacity to overfit significantly, such large models work better than smaller ones. This phenomenon is often referred to as the scaling law by practitioners. It is of fundamental interest to study why the scaling law exists and how it avoids/controls overfitting. One approach has been to look at infinite width limits of neural networks (e.g., Neural Tangent Kernels, Gaussian Processes); however, in practice these do not fully explain finite networks, as their infinite counterparts do not learn features. Furthermore, the empirical kernel for finite networks (i.e., the inner product of feature vectors) changes significantly during training, in contrast to infinite width networks. In this work we derive an iterative linearised training method. We justify iterative linearisation as an interpolation between finite analogs of the infinite width regime, which do not learn features, and standard gradient descent training, which does. We show some preliminary results where iterative linearised training works well, noting in particular how much feature learning is required to achieve comparable performance. We also provide novel insights into the training behaviour of neural networks.
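The idea of iterative linearisation can be sketched concretely: train the first-order Taylor expansion of the network around reference parameters, and periodically reset the linearisation point to the current parameters. Re-linearising every step recovers standard gradient descent, while never re-linearising corresponds to fully linearised (NTK-style) training with frozen features. The sketch below is an illustrative implementation under these assumptions, not the authors' code; the toy one-hidden-layer network, hyperparameters, and function names are all hypothetical.

```python
import numpy as np

def forward(params, X):
    # Toy one-hidden-layer network with scalar output (illustrative only).
    W1, w2 = params
    return np.tanh(X @ W1) @ w2

def jacobian(params, X):
    # Analytic Jacobian of the output w.r.t. the flattened parameters (W1, w2).
    W1, w2 = params
    h = np.tanh(X @ W1)
    dh = 1.0 - h ** 2                                # tanh'
    JW1 = X[:, :, None] * (dh * w2)[:, None, :]      # (n, d, m): d out / d W1[i, j]
    Jw2 = h                                          # (n, m):    d out / d w2[j]
    return np.concatenate([JW1.reshape(len(X), -1), Jw2], axis=1)

def iterative_linearised_train(X, y, d, m, K=20, steps=300, lr=0.05, seed=0):
    # Train the linearised model f(theta0) + J(theta0) (theta - theta0),
    # re-linearising around the current parameters every K steps.
    # K = 1 recovers standard gradient descent; K = inf freezes the features.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(d, m)) / np.sqrt(d)
    w2 = rng.normal(size=m) / np.sqrt(m)
    theta = np.concatenate([W1.ravel(), w2])
    theta0, losses = theta.copy(), []
    for t in range(steps):
        if t % K == 0:                               # periodic re-linearisation
            theta0 = theta.copy()
        p0 = (theta0[:d * m].reshape(d, m), theta0[d * m:])
        J = jacobian(p0, X)
        f_lin = forward(p0, X) + J @ (theta - theta0)
        losses.append(np.mean((f_lin - y) ** 2))
        grad = J.T @ (f_lin - y) / len(X)            # MSE gradient of the linear model
        theta -= lr * grad
    return theta, losses
```

Between re-linearisations the model is linear in its parameters, so this interpolates between the feature-learning and fixed-kernel regimes as K varies.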

Author Information

Adrian Goldwaser (University of Cambridge)
Hong Ge (University of Cambridge)

More from the Same Authors

  • 2022 : Poster Session 1
    Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
  • 2019 Poster: Bayesian Learning of Sum-Product Networks
    Martin Trapp · Robert Peharz · Hong Ge · Franz Pernkopf · Zoubin Ghahramani
  • 2017 : Poster Session
    Shunsuke Horii · Heejin Jeong · Tobias Schwedes · Qing He · Ben Calderhead · Ertunc Erdil · Jaan Altosaar · Patrick Muchmore · Rajiv Khanna · Ian Gemp · Pengfei Zhang · Yuan Zhou · Chris Cremer · Maria DeYoreo · Alexander Terenin · Brendan McVeigh · Rachit Singh · Yaodong Yang · Erik Bodin · Trefor Evans · Henry Chai · Shandian Zhe · Jeffrey Ling · Vincent ADAM · Lars Maaløe · Andrew Miller · Ari Pakman · Josip Djolonga · Hong Ge
  • 2015 Poster: Particle Gibbs for Infinite Hidden Markov Models
    Nilesh Tripuraneni · Shixiang (Shane) Gu · Hong Ge · Zoubin Ghahramani