Poster
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
We consider learning a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = \textstyle\sigma_*\big(\frac{1}{\sqrt{1+\theta}}\langle\boldsymbol{x},\boldsymbol{\mu}\rangle\big)$, $\boldsymbol{x}\overset{\small\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\boldsymbol{I}_d + \theta\boldsymbol{\mu}\boldsymbol{\mu}^\top),$ where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (the lowest nonzero degree in the Hermite expansion of $\sigma_*$), and the target depends on the input $\boldsymbol{x}$ only through its projection onto the spike (signal) direction $\boldsymbol{\mu}\in\mathbb{R}^d$. In the proportional asymptotic limit, where the number of training examples $n$ and the dimensionality $d$ jointly diverge with $n,d\to\infty$, $d/n\to\gamma\in(0,\infty)$, we ask the following question: how large must the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for $(i)$ kernel methods and $(ii)$ a neural network trained with gradient descent to learn $f_*$? We show that for kernel ridge regression, $\theta = \Omega\big(d^{1-\frac{1}{p}}\big)$ is both sufficient and necessary, whereas for a GD-trained two-layer neural network, $\theta=\Omega\big(d^{1-\frac{1}{k}}\big)$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k\le p$ by definition, neural networks adapt to such structure more effectively.
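The data model in the abstract can be made concrete with a short numerical sketch. The snippet below (our own illustration, not the paper's code; all function and variable names are hypothetical) samples $\boldsymbol{x}\sim\mathcal{N}(0,\boldsymbol{I}_d+\theta\boldsymbol{\mu}\boldsymbol{\mu}^\top)$ and evaluates the target with a link of information exponent $k=2$, the probabilists' Hermite polynomial $\mathrm{He}_2(t)=t^2-1$, whose Hermite expansion has no degree-0 or degree-1 term.

```python
import numpy as np

# Illustrative sketch of the spiked-covariance single-index model from the
# abstract; names and parameter choices here are our own assumptions.

def sample_spiked_data(n, d, theta, rng):
    """Draw n samples x ~ N(0, I_d + theta * mu mu^T) for a unit spike mu."""
    mu = np.zeros(d)
    mu[0] = 1.0  # unit spike direction (w.l.o.g. the first coordinate)
    z = rng.standard_normal((n, d))
    # Rescaling the mu-component by sqrt(1 + theta) inflates the variance
    # along mu from 1 to 1 + theta, leaving other directions isotropic.
    x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ mu, mu)
    return x, mu

def target(x, mu, theta, link):
    """f_*(x) = sigma_*(<x, mu> / sqrt(1 + theta)); the normalization keeps
    the argument of the link standard Gaussian for any spike magnitude."""
    return link((x @ mu) / np.sqrt(1.0 + theta))

rng = np.random.default_rng(0)
theta, n, d = 4.0, 20000, 50
x, mu = sample_spiked_data(n, d, theta, rng)
# Link with information exponent k = 2: He_2(t) = t^2 - 1 (no linear term).
y = target(x, mu, theta, link=lambda t: t**2 - 1.0)
print(np.var(x @ mu))  # ≈ 1 + theta = 5: variance is inflated along the spike
print(np.mean(y))      # ≈ 0, since E[He_2(g)] = 0 for g ~ N(0, 1)
```

The normalization by $\frac{1}{\sqrt{1+\theta}}$ inside the link is what makes the learning question well-posed across spike magnitudes: the one-dimensional signal $\langle\boldsymbol{x},\boldsymbol{\mu}\rangle/\sqrt{1+\theta}$ stays standard Gaussian while $\theta$ controls only how strongly the spike direction stands out in the input covariance.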
Author Information
Jimmy Ba (University of Toronto / Vector Institute)
Murat Erdogdu (University of Toronto)
Taiji Suzuki (The University of Tokyo/RIKEN-AIP)
Zhichao Wang (UC San Diego)
Denny Wu (New York University & Flatiron Institute)
More from the Same Authors
2021 Spotlight: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2021 Spotlight: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms »
Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu -
2021 : BLAST: Latent Dynamics Models from Bootstrapping »
Keiran Paster · Lev McKinney · Sheila McIlraith · Jimmy Ba -
2022 Poster: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning »
Tomoya Murata · Taiji Suzuki -
2022 : Neural Networks Efficiently Learn Low-Dimensional Representations with SGD »
Alireza Mousavi-Hosseini · Sejun Park · Manuela Girotti · Ioannis Mitliagkas · Murat Erdogdu -
2022 : Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method »
Kazusato Oko · Shunta Akiyama · Tomoya Murata · Taiji Suzuki -
2022 : Large Language Models Are Human-Level Prompt Engineers »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Temporary Goals for Exploration »
Haoyang Xu · Jimmy Ba · Silviu Pitis · Harris Chan -
2022 : Guiding Exploration Towards Impactful Actions »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2022 : Steering Large Language Models using APE »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba -
2022 : Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations »
Silviu Pitis · Duncan Bailey · Jimmy Ba -
2023 : OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text - Poster »
Keiran Paster · Marco Dos Santos · Zhangir Azerbayev · Jimmy Ba -
2023 : STEVE-1: A Generative Model for Text-to-Behavior in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 : Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective »
Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki -
2023 : Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems »
Juno Kim · Kakei Yamamoto · Kazusato Oko · Zhuoran Yang · Taiji Suzuki -
2023 : How Structured Data Guides Feature Learning: A Case Study of the Parity Problem »
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu -
2023 : Identifying the Risks of LM Agents with an LM-Emulated Sandbox »
Yangjun Ruan · Honghua Dong · Andrew Wang · Silviu Pitis · Yongchao Zhou · Jimmy Ba · Yann Dubois · Chris Maddison · Tatsunori Hashimoto -
2023 : Using Large Language Models for Hyperparameter Optimization »
Michael Zhang · Nishkrit Desai · Juhan Bae · Jonathan Lorraine · Jimmy Ba -
2023 Poster: Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond »
Taiji Suzuki · Denny Wu · Kazusato Oko · Atsushi Nitanda -
2023 Poster: Spectral Evolution and Invariance in Linear-width Neural Networks »
Zhichao Wang · Andrew Engel · Anand D Sarwate · Ioana Dumitriu · Tony Chiang -
2023 Poster: STEVE-1: A Generative Model for Text-to-Behavior in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 Poster: Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning »
Tyler Kastner · Murat Erdogdu · Amir-massoud Farahmand -
2023 Poster: Gradient-Based Feature Learning under Structured Data »
Alireza Mousavi-Hosseini · Denny Wu · Taiji Suzuki · Murat Erdogdu -
2023 Poster: Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression »
Ayoub El Hanchi · Murat Erdogdu -
2023 Poster: Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction »
Taiji Suzuki · Denny Wu · Atsushi Nitanda -
2023 Poster: AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback »
Yann Dubois · Chen Xuechen Li · Rohan Taori · Tianyi Zhang · Ishaan Gulrajani · Jimmy Ba · Carlos Guestrin · Percy Liang · Tatsunori Hashimoto -
2022 Spotlight: Lightning Talks 4A-2 »
Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan 'Raymond' Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng -
2022 Spotlight: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning »
Tomoya Murata · Taiji Suzuki -
2022 : Invited Talk by Jimmy Ba »
Jimmy Ba -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2022 Poster: Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime »
Naoki Nishikawa · Taiji Suzuki · Atsushi Nitanda · Denny Wu -
2022 Poster: Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers »
Sejun Park · Umut Simsekli · Murat Erdogdu -
2022 Poster: Dataset Distillation using Neural Feature Regression »
Yongchao Zhou · Ehsan Nezhadarya · Jimmy Ba -
2022 Poster: Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization »
Yuri Kinoshita · Taiji Suzuki -
2021 Poster: Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks »
Melih Barsbey · Milad Sefidgaran · Murat Erdogdu · Gaël Richard · Umut Simsekli -
2021 Poster: Clockwork Variational Autoencoders »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2021 Poster: Learning Domain Invariant Representations in Goal-conditioned Block MDPs »
Beining Han · Chongyi Zheng · Harris Chan · Keiran Paster · Michael Zhang · Jimmy Ba -
2021 Poster: Differentiable Multiple Shooting Layers »
Stefano Massaroli · Michael Poli · Sho Sonoda · Taiji Suzuki · Jinkyoo Park · Atsushi Yamashita · Hajime Asama -
2021 Poster: An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias »
Lu Yu · Krishnakumar Balasubramanian · Stanislav Volgushev · Murat Erdogdu -
2021 Poster: Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis »
Atsushi Nitanda · Denny Wu · Taiji Suzuki -
2021 Poster: How does a Neural Network's Architecture Impact its Robustness to Noisy Labels? »
Jingling Li · Mozhi Zhang · Keyulu Xu · John Dickerson · Jimmy Ba -
2021 Poster: Manipulating SGD with Data Ordering Attacks »
I Shumailov · Zakhar Shumaylov · Dmitry Kazhdan · Yiren Zhao · Nicolas Papernot · Murat Erdogdu · Ross J Anderson -
2021 Poster: On Empirical Risk Minimization with Dependent and Heavy-Tailed Data »
Abhishek Roy · Krishnakumar Balasubramanian · Murat Erdogdu -
2021 Poster: Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space »
Taiji Suzuki · Atsushi Nitanda -
2021 Poster: Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance »
Hongjian Wang · Mert Gurbuzbalaban · Lingjiong Zhu · Umut Simsekli · Murat Erdogdu -
2021 Poster: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms »
Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu -
2020 : Contributed Talk #2: Evaluating Agents Without Rewards »
Brendon Matusch · Danijar Hafner · Jimmy Ba -
2020 : Contributed Talk: Planning from Pixels using Inverse Dynamics Models »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2020 Poster: On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method »
Ye He · Krishnakumar Balasubramanian · Murat Erdogdu -
2020 Poster: Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks »
Zhou Fan · Zhichao Wang -
2020 Oral: Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks »
Zhou Fan · Zhichao Wang -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2020 Poster: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks »
Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu -
2020 Spotlight: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks »
Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Graph Normalizing Flows »
Jenny Liu · Aviral Kumar · Jimmy Ba · Jamie Kiros · Kevin Swersky -
2019 Poster: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond »
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu -
2019 Spotlight: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond »
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu -
2018 Poster: Global Non-convex Optimization with Discretized Diffusions »
Murat Erdogdu · Lester Mackey · Ohad Shamir -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse -
2017 Poster: Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization »
Tomoya Murata · Taiji Suzuki -
2017 Poster: Robust Estimation of Neural Signals in Calcium Imaging »
Hakan Inan · Murat Erdogdu · Mark Schnitzer -
2017 Poster: Inference in Graphical Models via Semidefinite Programming Hierarchies »
Murat Erdogdu · Yash Deshpande · Andrea Montanari -
2017 Poster: Trimmed Density Ratio Estimation »
Song Liu · Akiko Takeda · Taiji Suzuki · Kenji Fukumizu -
2016 Poster: Scaled Least Squares Estimator for GLMs in Large-Scale Problems »
Murat Erdogdu · Lee H Dicker · Mohsen Bayati -
2015 Poster: Convergence rates of sub-sampled Newton methods »
Murat Erdogdu · Andrea Montanari -
2015 Poster: Newton-Stein Method: A Second Order Method for GLMs via Stein's Lemma »
Murat Erdogdu -
2015 Spotlight: Newton-Stein Method: A Second Order Method for GLMs via Stein's Lemma »
Murat Erdogdu -
2013 Poster: Estimating LASSO Risk and Noise Level »
Mohsen Bayati · Murat Erdogdu · Andrea Montanari