Poster
Momentum-Based Variance Reduction in Non-Convex SGD
Ashok Cutkosky · Francesco Orabona
Thu Dec 12 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #214
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $x$ with $\mathbb{E}[\|\nabla F(x)\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the best-known rate but without requiring knowledge of $\sigma$.
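The abstract does not spell out the update rule, so the following is only a minimal sketch of a momentum-based variance-reduced loop in the spirit of STORM, not the authors' reference implementation. It assumes the commonly stated form of the estimator, $d_t = \nabla f(x_t, \xi_t) + (1 - a_t)(d_{t-1} - \nabla f(x_{t-1}, \xi_t))$, with step size $\eta_t = k/(w + \sum_i G_i^2)^{1/3}$ and momentum weight $a_{t+1} = c\,\eta_t^2$. The function name `storm_sketch`, the constants `k`, `w`, `c`, the clipping of `a` to at most 1, and the noisy quadratic test problem are all illustrative assumptions.

```python
import numpy as np


def storm_sketch(grad, sample, x0, T, k=0.1, w=0.1, c=100.0):
    """STORM-style loop (illustrative sketch, constants are assumptions).

    grad(x, xi) returns a stochastic gradient at x for the sample xi.
    sample() draws a fresh sample xi.
    """
    x = np.asarray(x0, dtype=float)
    xi = sample()
    g = grad(x, xi)
    d = g.copy()               # first estimate is a plain stochastic gradient
    sum_g2 = float(g @ g)      # running sum of squared gradient norms
    for _ in range(T):
        # Adaptive step size: shrinks as observed gradient norms accumulate.
        eta = k / (w + sum_g2) ** (1.0 / 3.0)
        x_prev, x = x, x - eta * d
        # Momentum weight tied to the step size; the clipping is an
        # assumption added here to keep (1 - a) in [0, 1].
        a = min(1.0, c * eta ** 2)
        xi = sample()
        g = grad(x, xi)            # gradient at the new point
        g_prev = grad(x_prev, xi)  # same sample, previous point
        # Momentum plus a correction term; no large batch is needed.
        d = g + (1.0 - a) * (d - g_prev)
        sum_g2 += float(g @ g)
    return x


# Hypothetical test problem: F(x) = 0.5 * ||x||^2 with additive Gaussian noise.
rng = np.random.default_rng(0)


def sample():
    return rng.normal(scale=0.1, size=2)


def noisy_quadratic_grad(x, xi):
    return x + xi


x_start = np.array([3.0, -2.0])
x_final = storm_sketch(noisy_quadratic_grad, sample, x_start, T=2000)
print(np.linalg.norm(x_start), np.linalg.norm(x_final))
```

The point of evaluating both gradients on the same sample is that the difference `g - g_prev` is small whenever consecutive iterates are close, which is what lets a single-sample estimate stand in for the large batches used by earlier variance reduction methods. The quadratic here is only a convenient smoke test and does not exercise the non-convex setting the paper targets.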
Author Information
Ashok Cutkosky (Google Research)
Francesco Orabona (Boston University)
More from the Same Authors
- 2021 Spotlight: Online Selective Classification with Limited Feedback
  Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama
- 2022 Poster: Optimal Comparator Adaptive Online Learning with Switching Cost
  Zhiyu Zhang · Ashok Cutkosky · Yannis Paschalidis
- 2022 Poster: Better SGD using Second-order Momentum
  Hoang Tran · Ashok Cutkosky
- 2022 Poster: Momentum Aggregation for Private Non-convex ERM
  Hoang Tran · Ashok Cutkosky
- 2022 Poster: Robustness to Unbounded Smoothness of Generalized SignSGD
  Michael Crawshaw · Mingrui Liu · Francesco Orabona · Wei Zhang · Zhenxun Zhuang
- 2022 Poster: Parameter-free Regret in High Probability with Heavy Tails
  Jiujia Zhang · Ashok Cutkosky
- 2022 Poster: Differentially Private Online-to-batch for Smooth Losses
  Qinzi Zhang · Hoang Tran · Ashok Cutkosky
- 2021 Oral: High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
  Ashok Cutkosky · Harsh Mehta
- 2021 Poster: High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
  Ashok Cutkosky · Harsh Mehta
- 2021 Poster: Online Selective Classification with Limited Feedback
  Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama
- 2021 Poster: Logarithmic Regret from Sublinear Hints
  Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit
- 2021 Poster: Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers
  Jeffrey Negrea · Blair Bilodeau · Nicolò Campolongo · Francesco Orabona · Dan Roy
- 2020 Poster: Better Full-Matrix Regret via Parameter-Free Online Learning
  Ashok Cutkosky
- 2020 Poster: Online Linear Optimization with Many Hints
  Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit
- 2020 Poster: Temporal Variability in Implicit Online Learning
  Nicolò Campolongo · Francesco Orabona
- 2020 Poster: Comparator-Adaptive Convex Bandits
  Dirk van der Hoeven · Ashok Cutkosky · Haipeng Luo
- 2019 Poster: Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration
  Kwang-Sung Jun · Ashok Cutkosky · Francesco Orabona
- 2018 Poster: Distributed Stochastic Optimization via Adaptive SGD
  Ashok Cutkosky · Róbert Busa-Fekete
- 2017 Poster: Stochastic and Adversarial Online Learning without Hyperparameters
  Ashok Cutkosky · Kwabena A Boahen
- 2017 Poster: Training Deep Networks without Learning Rates Through Coin Betting
  Francesco Orabona · Tatiana Tommasi
- 2016 Poster: Online Convex Optimization with Unconstrained Domains and Losses
  Ashok Cutkosky · Kwabena A Boahen
- 2016 Poster: Coin Betting and Parameter-Free Online Learning
  Francesco Orabona · David Pal
- 2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice
  Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant
- 2014 Workshop: Modern Nonparametrics 3: Automating the Learning Pipeline
  Eric Xing · Mladen Kolar · Arthur Gretton · Samory Kpotufe · Han Liu · Zoltán Szabó · Alan Yuille · Andrew G Wilson · Ryan Tibshirani · Sasha Rakhlin · Damian Kozbur · Bharath Sriperumbudur · David Lopez-Paz · Kirthevasan Kandasamy · Francesco Orabona · Andreas Damianou · Wacha Bounliphone · Yanshuai Cao · Arijit Das · Yingzhen Yang · Giulia DeSalvo · Dmitry Storcheus · Roberto Valerio
- 2014 Poster: Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
  Francesco Orabona
- 2013 Workshop: New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks
  Urun Dogan · Marius Kloft · Tatiana Tommasi · Francesco Orabona · Massimiliano Pontil · Sinno Jialin Pan · Shai Ben-David · Arthur Gretton · Fei Sha · Marco Signoretto · Rajhans Samdani · Yun-Qian Miao · Mohammad Gheshlaghi azar · Ruth Urner · Christoph Lampert · Jonathan How
- 2013 Poster: Dimension-Free Exponentiated Gradient
  Francesco Orabona
- 2013 Spotlight: Dimension-Free Exponentiated Gradient
  Francesco Orabona
- 2013 Poster: Regression-tree Tuning in a Streaming Setting
  Samory Kpotufe · Francesco Orabona
- 2013 Spotlight: Regression-tree Tuning in a Streaming Setting
  Samory Kpotufe · Francesco Orabona
- 2012 Poster: On Multilabel Classification and Ranking with Partial Feedback
  Claudio Gentile · Francesco Orabona
- 2012 Spotlight: On Multilabel Classification and Ranking with Partial Feedback
  Claudio Gentile · Francesco Orabona
- 2010 Poster: New Adaptive Algorithms for Online Classification
  Francesco Orabona · Yacov Crammer
- 2010 Spotlight: Learning from Candidate Labeling Sets
  Jie Luo · Francesco Orabona
- 2010 Poster: Learning from Candidate Labeling Sets
  Jie Luo · Francesco Orabona
- 2009 Workshop: Learning from Multiple Sources with Applications to Robotics
  Barbara Caputo · Nicolò Cesa-Bianchi · David R Hardoon · Gayle Leen · Francesco Orabona · Jaakko Peltonen · Simon Rogers