Poster
Beyond Lazy Training for Over-parameterized Tensor Decomposition
Xiang Wang · Chenwei Wu · Jason Lee · Tengyu Ma · Rong Ge
Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optima. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(\mathbb{R}^d)^{\otimes l}$ of rank $r$ (where $r \ll d$), can variants of gradient descent find a rank-$m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$. Our results show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and exploit certain low-rank structure in the data.
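To make the problem setup in the abstract concrete, here is a minimal NumPy sketch of over-parametrized tensor decomposition. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: it uses plain gradient descent rather than the paper's variant, fixes the order at $l = 3$, assumes a symmetric target with orthonormal components, and fits a symmetric model $\sum_j u_j^{\otimes 3}$ with a squared-error loss. The function names (`rank_one_sum`, `loss_and_grad`, `run_gd`) and all default parameter values are hypothetical choices made for this example.

```python
import numpy as np


def rank_one_sum(U):
    """Return sum_j u_j (x) u_j (x) u_j for the columns u_j of U (shape d x m)."""
    return np.einsum("aj,bj,cj->abc", U, U, U)


def loss_and_grad(U, T):
    """Loss 0.5 * ||sum_j u_j^{(x)3} - T||_F^2 and its gradient with respect to U."""
    R = rank_one_sum(U) - T  # residual tensor
    loss = 0.5 * np.sum(R ** 2)
    # R is symmetric whenever T is, so the three tensor slots contribute equally.
    grad = 3.0 * np.einsum("abc,bj,cj->aj", R, U, U)
    return loss, grad


def run_gd(d=10, r=2, m=20, lr=0.02, steps=2000, init_scale=0.3, seed=0):
    """Plain gradient descent with m > r components (illustrative assumption only)."""
    rng = np.random.default_rng(seed)
    # Ground truth: r orthonormal components, so the target has rank r << d.
    W, _ = np.linalg.qr(rng.standard_normal((d, r)))
    T = rank_one_sum(W)
    # Over-parametrized model: m components with a small random initialization.
    U = init_scale * rng.standard_normal((d, m)) / np.sqrt(d)
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(U, T)
        losses.append(loss)
        U -= lr * grad
    return U, T, losses


if __name__ == "__main__":
    U, T, losses = run_gd()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Here `m` plays the role of the over-parametrized rank and `r` the true rank. Varying `m` and `init_scale` in this toy setting is one way to explore how initialization scale and the number of components affect training, though it says nothing about the bounds stated in the abstract.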
Author Information
Xiang Wang (Duke University)
Chenwei Wu (Duke University)
Jason Lee (Princeton University)
Tengyu Ma (Stanford University)
Rong Ge (Duke University)
More from the Same Authors
-
2021 : Sharp Bounds for FedAvg (Local SGD) »
Margalit Glasgow · Honglin Yuan · Tengyu Ma -
2022 : How Sharpness-Aware Minimization Minimizes Sharpness? »
Kaiyue Wen · Tengyu Ma · Zhiyuan Li -
2022 : First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains »
Kefan Dong · Tengyu Ma -
2023 Poster: Data Selection for Language Models via Importance Resampling »
Sang Michael Xie · Shibani Santurkar · Tengyu Ma · Percy Liang -
2023 Poster: DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining »
Sang Michael Xie · Hieu Pham · Xuanyi Dong · Nan Du · Hanxiao Liu · Yifeng Lu · Percy Liang · Quoc V Le · Tengyu Ma · Adams Wei Yu -
2023 Poster: Connecting Pre-trained Language Model and Downstream Task via Properties of Representation »
Chenwei Wu · Holden Lee · Rong Ge -
2023 Poster: Beyond NTK with Vanilla Gradient Descent: A Mean-field Analysis of Neural Networks with Polynomial Width, Samples, and Time »
Arvind Mahankali · Jeff Z. HaoChen · Kefan Dong · Margalit Glasgow · Tengyu Ma -
2023 Poster: Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization »
Kaiyue Wen · Tengyu Ma · Zhiyuan Li -
2023 Poster: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee -
2023 Poster: What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models »
Khashayar Gatmiry · Zhiyuan Li · Tengyu Ma · Sashank Reddi · Stefanie Jegelka · Ching-Yao Chuang -
2023 Poster: Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing »
Shuyao Li · Yu Cheng · Ilias Diakonikolas · Jelena Diakonikolas · Rong Ge · Stephen Wright -
2023 Oral: Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization »
Kaiyue Wen · Tengyu Ma · Zhiyuan Li -
2023 Oral: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee -
2023 Workshop: Mathematics of Modern Machine Learning (M3L) »
Aditi Raghunathan · Alex Damian · Bingbin Liu · Christina Baek · Kaifeng Lyu · Surbhi Goel · Tengyu Ma · Zhiyuan Li -
2023 : Provable Feature Learning in Gradient Descent, Jason Lee »
Jason Lee -
2022 Poster: Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers »
Colin Wei · Yining Chen · Tengyu Ma -
2022 Poster: Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments »
Yining Chen · Elan Rosenfeld · Mark Sellke · Tengyu Ma · Andrej Risteski -
2022 Poster: Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations »
Jeff Z. HaoChen · Colin Wei · Ananya Kumar · Tengyu Ma -
2022 Poster: Outlier-Robust Sparse Estimation via Non-Convex Optimization »
Yu Cheng · Ilias Diakonikolas · Rong Ge · Shivam Gupta · Daniel Kane · Mahdi Soltanolkotabi -
2021 : Invited talk 7 »
Jason Lee -
2021 : Invited talk 4 »
Tengyu Ma -
2021 : Contributed Talk 4: Sharp Bounds for FedAvg (Local SGD) »
Margalit Glasgow · Honglin Yuan · Tengyu Ma -
2021 Poster: Understanding Deflation Process in Over-parametrized Tensor Decomposition »
Rong Ge · Yunwei Ren · Xiang Wang · Mo Zhou -
2021 Poster: A Regression Approach to Learning-Augmented Online Algorithms »
Keerti Anand · Rong Ge · Amit Kumar · Debmalya Panigrahi -
2020 Poster: Generalized Leverage Score Sampling for Neural Networks »
Jason Lee · Ruoqi Shen · Zhao Song · Mengdi Wang · Zheng Yu -
2020 Poster: Federated Accelerated Stochastic Gradient Descent »
Honglin Yuan · Tengyu Ma -
2020 Poster: Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters »
Kaiyi Ji · Jason Lee · Yingbin Liang · H. Vincent Poor -
2020 Poster: Self-training Avoids Using Spurious Features Under Domain Shift »
Yining Chen · Colin Wei · Ananya Kumar · Tengyu Ma -
2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Model-based Adversarial Meta-Reinforcement Learning »
Zichuan Lin · Garrett Thomas · Guangwen Yang · Tengyu Ma -
2020 Poster: Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot »
Jingtong Su · Yihang Chen · Tianle Cai · Tianhao Wu · Ruiqi Gao · Liwei Wang · Jason Lee -
2020 Poster: Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity »
Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang -
2020 Poster: Towards Understanding Hierarchical Learning: Benefits of Neural Representations »
Minshuo Chen · Yu Bai · Jason Lee · Tuo Zhao · Huan Wang · Caiming Xiong · Richard Socher -
2020 Poster: MOPO: Model-based Offline Policy Optimization »
Tianhe Yu · Garrett Thomas · Lantao Yu · Stefano Ermon · James Zou · Sergey Levine · Chelsea Finn · Tengyu Ma -
2020 Poster: How to Characterize The Landscape of Overparameterized Convolutional Neural Networks »
Yihong Gu · Weizhong Zhang · Cong Fang · Jason Lee · Tong Zhang -
2019 Poster: Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss »
Kaidi Cao · Colin Wei · Adrien Gaidon · Nikos Arechiga · Tengyu Ma -
2019 Poster: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel »
Colin Wei · Jason Lee · Qiang Liu · Tengyu Ma -
2019 Spotlight: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel »
Colin Wei · Jason Lee · Qiang Liu · Tengyu Ma -
2019 Poster: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora -
2019 Poster: Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods »
Maher Nouiehed · Maziar Sanjabi · Tianjian Huang · Jason Lee · Meisam Razaviyayn -
2019 Poster: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares »
Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli -
2019 Poster: Convergence of Adversarial Training in Overparametrized Neural Networks »
Ruiqi Gao · Tianle Cai · Haochuan Li · Cho-Jui Hsieh · Liwei Wang · Jason Lee -
2019 Spotlight: Convergence of Adversarial Training in Overparametrized Neural Networks »
Ruiqi Gao · Tianle Cai · Haochuan Li · Cho-Jui Hsieh · Liwei Wang · Jason Lee -
2019 Poster: Neural Temporal-Difference Learning Converges to Global Optima »
Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang -
2019 Poster: Verified Uncertainty Calibration »
Ananya Kumar · Percy Liang · Tengyu Ma -
2019 Spotlight: Verified Uncertainty Calibration »
Ananya Kumar · Percy Liang · Tengyu Ma -
2019 Poster: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation »
Colin Wei · Tengyu Ma -
2019 Poster: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2019 Spotlight: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation »
Colin Wei · Tengyu Ma -
2019 Spotlight: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2018 Poster: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Spotlight: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Poster: Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo »
Holden Lee · Andrej Risteski · Rong Ge -
2017 Poster: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2017 Oral: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma