Poster
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
Great Hall & Hall B1+B2 (level 1) #817
Abstract:
We consider the learning of a single-index target function under spiked covariance data: f_*(\boldsymbol{x}) = \textstyle\sigma_*(\frac{1}{\sqrt{1+\theta}}\langle\boldsymbol{x},\boldsymbol{\mu}\rangle), ~~ \boldsymbol{x}\overset{\small\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\boldsymbol{I_d} + \theta\boldsymbol{\mu}\boldsymbol{\mu}^\top), ~~ \theta\asymp d^{\beta} \text{ for } \beta\in[0,1), where the link function \sigma_* is a degree-p polynomial with information exponent k (defined as the lowest degree in the Hermite expansion of \sigma_*), and the target depends on the projection of the input onto the spike (signal) direction \boldsymbol{\mu}\in\mathbb{R}^d. In the proportional asymptotic limit where the number of training examples n and the dimensionality d jointly diverge: n,d\to\infty,~ n/d\to\psi\in(0,\infty), we ask the following question: how large should the spike magnitude \theta (i.e., the strength of the low-dimensional component) be, in order for (i) kernel methods, (ii) neural networks optimized by gradient descent, to learn f_*? We show that for kernel ridge regression, \beta\ge 1-1/p is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, \beta > 1-1/k suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since k\le p by definition, neural networks can adapt to such structures more effectively.
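The data model above can be sampled directly: a rank-one spike of magnitude \theta added to isotropic Gaussian noise, with the label given by the link function applied to the standardized projection onto the spike. The sketch below is illustrative only; the choices of d, n, \beta, the random spike direction, and the cubic Hermite link \sigma_*(t) = t^3 - 3t are assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of the spiked-covariance single-index model.
# d, n, beta, mu, and the link sigma_* are assumed example choices.
rng = np.random.default_rng(0)

d, n = 256, 512
beta = 0.5
theta = d ** beta                    # spike magnitude theta ~ d^beta

mu = rng.standard_normal(d)
mu /= np.linalg.norm(mu)             # unit-norm spike (signal) direction

# x ~ N(0, I_d + theta * mu mu^T): isotropic noise plus a rank-one spike.
z = rng.standard_normal((n, d))
s = rng.standard_normal((n, 1))
X = z + np.sqrt(theta) * s * mu      # rows have covariance I_d + theta mu mu^T

# Single-index target f_*(x) = sigma_*( <x, mu> / sqrt(1 + theta) ).
# Example link: the Hermite polynomial He_3(t) = t^3 - 3t, so k = p = 3.
proj = X @ mu / np.sqrt(1 + theta)   # standardized projection onto the spike
y = proj ** 3 - 3 * proj
```

The 1/\sqrt{1+\theta} rescaling keeps the projection at unit variance regardless of the spike strength, so the target's scale does not change as \theta grows with d.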