Timezone: »
Poster
NEON2: Finding Local Minima via First-Order Oracles
Zeyuan Allen-Zhu · Yuanzhi Li
We propose a reduction for non-convex optimization that can (1) turn an stationary-point finding algorithm into an local-minimum finding one, and (2) replace the Hessian-vector product computations with only gradient computations. It works both in the stochastic and the deterministic settings, without hurting the algorithm's performance.
As applications, our reduction turns Natasha2 into a first-order method without hurting its theoretical performance. It also converts SGD, GD, SCSG, and SVRG into algorithms finding approximate local minima, outperforming some best known results.
Author Information
Zeyuan Allen-Zhu (Microsoft Research)
Yuanzhi Li (Princeton)
More from the Same Authors
-
2019 Poster: On the Convergence Rate of Training Recurrent Neural Networks »
Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song -
2019 Poster: What Can ResNet Learn Efficiently, Going Beyond Kernels? »
Zeyuan Allen-Zhu · Yuanzhi Li -
2019 Poster: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers »
Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang -
2019 Poster: Complexity of Highly Parallel Non-Smooth Convex Optimization »
Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford -
2019 Spotlight: Complexity of Highly Parallel Non-Smooth Convex Optimization »
Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford -
2019 Poster: Can SGD Learn Recurrent Neural Networks with Provable Generalization? »
Zeyuan Allen-Zhu · Yuanzhi Li -
2019 Poster: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2019 Spotlight: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2018 Poster: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD »
Zeyuan Allen-Zhu -
2018 Poster: Online Improper Learning with an Approximation Oracle »
Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li -
2018 Poster: Byzantine Stochastic Gradient Descent »
Dan Alistarh · Zeyuan Allen-Zhu · Jerry Li -
2018 Poster: Natasha 2: Faster Non-Convex Optimization Than SGD »
Zeyuan Allen-Zhu -
2018 Spotlight: Natasha 2: Faster Non-Convex Optimization Than SGD »
Zeyuan Allen-Zhu -
2018 Poster: Is Q-Learning Provably Efficient? »
Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan -
2018 Poster: The Lingering of Gradients: How to Reuse Gradients Over Time »
Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang -
2018 Poster: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data »
Yuanzhi Li · Yingyu Liang -
2018 Spotlight: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data »
Yuanzhi Li · Yingyu Liang -
2017 Poster: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls »
Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li -
2017 Spotlight: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls »
Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li -
2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters »
Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan -
2016 Poster: Optimal Black-Box Reductions Between Optimization Objectives »
Zeyuan Allen-Zhu · Elad Hazan -
2016 Poster: Even Faster SVD Decomposition Yet Without Agonizing Pain »
Zeyuan Allen-Zhu · Yuanzhi Li