Poster
Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
Lechao Xiao · Hong Hu · Theodor Misiakiewicz · Yue Lu · Jeffrey Pennington
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either large-sample asymptotics ($m\to\infty$) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension ($m\propto d$). There is a wide gulf between these two regimes, including all higher-order scaling relations $m\propto d^r$, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance, for data drawn uniformly from the sphere with isotropic random labels in the $r$th-order asymptotic scaling regime $m\to\infty$ with $m/d^r$ held constant. We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales. We include a Colab notebook that reproduces the essential results of the paper.
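The abstract pins down the experimental setup precisely enough to sketch numerically: kernel ridge regression with a dot-product kernel on data drawn uniformly from the sphere, with the sample size $m$ swept through the scaling regimes $m \propto d^r$. Below is a minimal sketch of that setup, not the paper's Colab notebook; the kernel coefficients, the random linear target, the noise level, and the ridge value are all illustrative assumptions.

```python
import numpy as np

def sphere_samples(m, d, rng):
    """Draw m points uniformly from the unit sphere S^{d-1}."""
    X = rng.standard_normal((m, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def dot_product_kernel(A, B, coeffs=(1.0, 1.0, 0.5)):
    """Dot-product kernel K(x, x') = sum_k c_k <x, x'>^k.
    The coefficients are illustrative, not taken from the paper."""
    G = A @ B.T
    return sum(c * G**k for k, c in enumerate(coeffs))

def krr_test_error(m, d, m_test=500, ridge=1e-3, noise=0.1, seed=0):
    """Monte Carlo estimate of the kernel ridge regression test error."""
    rng = np.random.default_rng(seed)
    X, X_test = sphere_samples(m, d, rng), sphere_samples(m_test, d, rng)
    # Illustrative target: a random linear function on the sphere plus noise.
    beta = rng.standard_normal(d) / np.sqrt(d)
    y = X @ beta + noise * rng.standard_normal(m)
    # Solve (K + ridge * I) alpha = y and predict on held-out sphere points.
    K = dot_product_kernel(X, X)
    alpha = np.linalg.solve(K + ridge * np.eye(m), y)
    y_pred = dot_product_kernel(X_test, X) @ alpha
    return float(np.mean((y_pred - X_test @ beta) ** 2))

# Sweep the sample size around m = d^2/2! to probe the r = 2 regime.
d = 20
for m in (50, 100, 150, 200, 250, 300):
    print(f"m = {m:4d}  test error = {krr_test_error(m, d):.4f}")
```

The sweep is centered near $m = d^2/2! = 200$ because the paper locates a learning-curve peak at $m \approx d^r/r!$, which (up to lower-order terms) matches the number of degree-$r$ spherical harmonics in $d$ dimensions.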
Author Information
Lechao Xiao (Google Brain)
Lechao is a research scientist on the Brain team at Google Research, where he works on machine learning and deep learning. Prior to joining Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he worked on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at Urbana-Champaign and his BA in pure and applied mathematics from Zhejiang University, Hangzhou, China. Lechao's research interests include the theory of machine learning and deep learning, optimization, Gaussian processes, and generalization.
Hong Hu (Harvard University)
Theodor Misiakiewicz (Stanford University)
Yue Lu (Harvard University)
Jeffrey Pennington (Google Brain)
More from the Same Authors
- 2022: A Second-order Regression Model Shows Edge of Stability Behavior
  Fabian Pedregosa · Atish Agarwala · Jeffrey Pennington
- 2022 Poster: Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
  Courtney Paquette · Elliot Paquette · Ben Adlam · Jeffrey Pennington
- 2022 Poster: Learning with convolution and pooling operations in kernel methods
  Theodor Misiakiewicz · Song Mei
- 2022 Poster: Fast Neural Kernel Embeddings for General Activations
  Insu Han · Amir Zandieh · Jaehoon Lee · Roman Novak · Lechao Xiao · Amin Karbasi
- 2021 Poster: Overparameterization Improves Robustness to Covariate Shift in High Dimensions
  Nilesh Tripuraneni · Ben Adlam · Jeffrey Pennington
- 2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study
  Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein
- 2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study
  Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein
- 2020 Poster: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
  Benjamin Aubin · Florent Krzakala · Yue Lu · Lenka Zdeborová
- 2020 Poster: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
  Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
- 2020 Spotlight: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
  Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
- 2020 Poster: When Do Neural Networks Outperform Kernel Methods?
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2020 Poster: Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition
  Ben Adlam · Jeffrey Pennington
- 2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
  Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington
- 2019 Poster: A Solvable High-Dimensional Model of GAN
  Chuang Wang · Hong Hu · Yue Lu
- 2019 Poster: Limitations of Lazy Training of Two-layers Neural Network
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2019 Spotlight: Limitations of Lazy Training of Two-layers Neural Network
  Behrooz Ghorbani · Song Mei · Theodor Misiakiewicz · Andrea Montanari
- 2018 Poster: The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network
  Jeffrey Pennington · Pratik Worah
- 2017 Spotlight: Nonlinear random matrix theory for deep learning
  Jeffrey Pennington · Pratik Worah
- 2017 Poster: Nonlinear random matrix theory for deep learning
  Jeffrey Pennington · Pratik Worah
- 2017 Poster: The Scaling Limit of High-Dimensional Online Independent Component Analysis
  Chuang Wang · Yue Lu
- 2017 Spotlight: The Scaling Limit of High-Dimensional Online Independent Component Analysis
  Chuang Wang · Yue Lu
- 2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
  Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli
- 2015 Poster: Spherical Random Features for Polynomial Kernels
  Jeffrey Pennington · Felix Yu · Sanjiv Kumar
- 2015 Spotlight: Spherical Random Features for Polynomial Kernels
  Jeffrey Pennington · Felix Yu · Sanjiv Kumar