

Poster

A Closer Look at Deep Learning Phenomena Through A Telescoping Lens

Alan Jeffares · Alicia Curth · Mihaela van der Schaar


Abstract:

The remarkable recent progress in deep learning has been fueled by the development of a multifaceted understanding of neural networks through several complementary viewpoints. In this work, we further contribute to this effort by examining a tractable and accurate model of a neural network consisting of a sequence of first-order approximations telescoping out into a single, empirically operational tool for practical analysis. We demonstrate that this model presents a pedagogical formalism that allows us to isolate components of the training process even in complex contemporary settings, provides a sharp lens to reason about the effects of design choices such as architecture and optimization strategy, and reveals surprising parallels between neural network learning and gradient boosting. We then illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena in the literature -- including double descent, grokking, linear mode connectivity and the challenges of applying deep learning to tabular data.
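
One reading of "a sequence of first-order approximations telescoping out" is sketched below. It is not the authors' code. The change in a prediction over training is written as a sum of per-step changes, and each per-step change is replaced by its first-order Taylor term, the gradient of the output with respect to the parameters times the parameter update. The two-parameter tanh model, the three data points, the test input, the learning rate, and all function names are illustrative assumptions, not details from the abstract.

```python
import math

def f(w, x):
    """Hypothetical toy model: w[1] * tanh(w[0] * x)."""
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    """Gradient of f with respect to the parameters w, evaluated at input x."""
    t = math.tanh(w[0] * x)
    return [w[1] * x * (1.0 - t * t), t]

def train_and_telescope(xs, ys, x_test, w0, lr=0.01, steps=500):
    """Run full-batch gradient descent and accumulate the telescoped approximation.

    Returns (telescoped prediction, actual final prediction, initial prediction),
    all evaluated at x_test.
    """
    w = list(w0)
    telescoped = f(w, x_test)  # start from the prediction at initialization
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean of 0.5 * (f(w, x) - y)^2 over the training set.
        g = [0.0, 0.0]
        for x, y in zip(xs, ys):
            r = f(w, x) - y
            gf = grad_f(w, x)
            g[0] += r * gf[0] / n
            g[1] += r * gf[1] / n
        w_new = [w[0] - lr * g[0], w[1] - lr * g[1]]
        # First-order estimate of this step's change in the test prediction,
        # linearized around the parameters *before* the update.
        gt = grad_f(w, x_test)
        telescoped += gt[0] * (w_new[0] - w[0]) + gt[1] * (w_new[1] - w[1])
        w = w_new
    return telescoped, f(w, x_test), f(w0, x_test)

# Illustrative data, chosen only for this sketch.
XS = [-1.0, 0.5, 2.0]
YS = [-0.5, 0.3, 0.8]

telescoped, actual, initial = train_and_telescope(XS, YS, x_test=1.5, w0=[0.5, 0.5])
print(f"initial prediction:    {initial:.4f}")
print(f"actual final:          {actual:.4f}")
print(f"telescoped estimate:   {telescoped:.4f}")
```

Because each term is linearized around the current parameters rather than around the initialization, the accumulated estimate can stay close to the trained model's prediction when the individual steps are small, even if the parameters move far from where they started.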
