Timezone: »
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point for constant minibatches. Furthermore, using a variant of these algorithms, we obtain provably faster convergence than batch proximal gradient descent. Our results are based on the recent variance reduction techniques for convex optimization but with a novel analysis for handling nonconvex and nonsmooth functions. We also prove global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, which subsumes several recent works.
Author Information
Sashank J. Reddi (Carnegie Mellon University)
Suvrit Sra (MIT)
Suvrit Sra is a faculty member within the EECS department at MIT, where he is also a core faculty member of IDSS, LIDS, MIT-ML Group, as well as the statistics and data science center. His research spans topics in optimization, matrix theory, differential geometry, and probability theory, which he connects with machine learning --- a key focus of his research is on the theme "Optimization for Machine Learning” (http://opt-ml.org)
Barnabas Poczos (Carnegie Mellon University)
Alexander Smola (Amazon)
**AWS Machine Learning**
More from the Same Authors
-
2021 Spotlight: Mixture Proportion Estimation and PU Learning:A Modern Approach »
Saurabh Garg · Yifan Wu · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton -
2021 : Benchmarking Multimodal AutoML for Tabular Data with Text Fields »
Xingjian Shi · Jonas Mueller · Nick Erickson · Mu Li · Alexander Smola -
2022 : Improving Molecule Properties Through 2-Stage VAE »
Chenghui Zhou · Barnabas Poczos -
2022 : RLSBench: A Large-Scale Empirical Study of Domain Adaptation Under Relaxed Label Shift »
Saurabh Garg · Nick Erickson · James Sharpnack · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton -
2023 Poster: The Curious Role of Normalization in Sharpness-Aware Minimization »
Yan Dai · Kwangjun Ahn · Suvrit Sra -
2023 Poster: Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition »
Shuhuai Ren · Aston Zhang · Yi Zhu · Shuai Zhang · Shuai Zheng · Mu Li · Alexander Smola · Xu Sun -
2023 Poster: Transformers learn to implement preconditioned gradient descent for in-context learning »
Kwangjun Ahn · Xiang Cheng · Hadi Daneshmand · Suvrit Sra -
2022 Poster: Adaptive Interest for Emphatic Reinforcement Learning »
Martin Klissarov · Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Taesup Kim · Alexander Smola -
2022 Poster: CCCP is Frank-Wolfe in disguise »
Alp Yurtsever · Suvrit Sra -
2022 Poster: Faster Deep Reinforcement Learning with Slower Online Network »
Kavosh Asadi · Rasool Fakoor · Omer Gottesman · Taesup Kim · Michael Littman · Alexander Smola -
2022 Poster: Efficient Sampling on Riemannian Manifolds via Langevin MCMC »
Xiang Cheng · Jingzhao Zhang · Suvrit Sra -
2022 Poster: Graph Reordering for Cache-Efficient Near Neighbor Search »
Benjamin Coleman · Santiago Segarra · Alexander Smola · Anshumali Shrivastava -
2021 Poster: Can contrastive learning avoid shortcut solutions? »
Joshua Robinson · Li Sun · Ke Yu · Kayhan Batmanghelich · Stefanie Jegelka · Suvrit Sra -
2021 Poster: Mixture Proportion Estimation and PU Learning:A Modern Approach »
Saurabh Garg · Yifan Wu · Alexander Smola · Sivaraman Balakrishnan · Zachary Lipton -
2021 Poster: Deep Explicit Duration Switching Models for Time Series »
Abdul Fatir Ansari · Konstantinos Benidis · Richard Kurle · Ali Caner Turkmen · Harold Soh · Alexander Smola · Bernie Wang · Tim Januschowski -
2021 Poster: Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates »
Alp Yurtsever · Alex Gu · Suvrit Sra -
2021 Poster: Continuous Doubly Constrained Batch Reinforcement Learning »
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alexander Smola -
2020 : Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra »
Suvrit Sra -
2020 Poster: SGD with shuffling: optimal rates without component convexity and large epoch requirements »
Kwangjun Ahn · Chulhee Yun · Suvrit Sra -
2020 Spotlight: SGD with shuffling: optimal rates without component convexity and large epoch requirements »
Kwangjun Ahn · Chulhee Yun · Suvrit Sra -
2020 Poster: Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction »
Mariya Toneva · Otilia Stretcu · Barnabas Poczos · Leila Wehbe · Tom Mitchell -
2020 Poster: Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation »
Rasool Fakoor · Jonas Mueller · Nick Erickson · Pratik Chaudhari · Alexander Smola -
2020 Poster: Why are Adaptive Methods Good for Attention Models? »
Jingzhao Zhang · Sai Praneeth Karimireddy · Andreas Veit · Seungyeon Kim · Sashank Reddi · Sanjiv Kumar · Suvrit Sra -
2020 Poster: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes »
Yi Tian · Jian Qian · Suvrit Sra -
2020 Poster: Robust Density Estimation under Besov IPM Losses »
Ananya Uppal · Shashank Singh · Barnabas Poczos -
2020 Spotlight: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes »
Yi Tian · Jian Qian · Suvrit Sra -
2020 Spotlight: Robust Density Estimation under Besov IPM Losses »
Ananya Uppal · Shashank Singh · Barnabas Poczos -
2019 : Invited Talk - Alexander J. Smola - Sets and symmetries »
Alexander Smola -
2019 : Opening Remarks »
Manzil Zaheer · Nicholas Monath · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov · Andrew McCallum -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 : Poster Session »
Rishav Chourasia · Yichong Xu · Corinna Cortes · Chien-Yi Chang · Yoshihiro Nagano · So Yeon Min · Benedikt Boecking · Phi Vu Tran · Kamyar Ghasemipour · Qianggang Ding · Shouvik Mani · Vikram Voleti · Rasool Fakoor · Miao Xu · Kenneth Marino · Lisa Lee · Volker Tresp · Jean-Francois Kagy · Marvin Zhang · Barnabas Poczos · Dinesh Khandelwal · Adrien Bardes · Evan Shelhamer · Jiacheng Zhu · Ziming Li · Xiaoyan Li · Dmitrii Krasheninnikov · Ruohan Wang · Mayoore Jaiswal · Emad Barsoum · Suvansh Sanjeev · Theeraphol Wattanavekin · Qizhe Xie · Sifan Wu · Yuki Yoshida · David Kanaa · Sina Khoshfetrat Pakazad · Mehdi Maasoumy -
2019 Poster: Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses »
Ananya Uppal · Shashank Singh · Barnabas Poczos -
2019 Poster: Flexible Modeling of Diversity with Strongly Log-Concave Distributions »
Joshua Robinson · Suvrit Sra · Stefanie Jegelka -
2019 Poster: Are deep ResNets provably better than linear predictors? »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie -
2019 Oral: Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses »
Ananya Uppal · Shashank Singh · Barnabas Poczos -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie -
2019 Spotlight: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity »
Chulhee Yun · Suvrit Sra · Ali Jadbabaie -
2019 Poster: Learning Local Search Heuristics for Boolean Satisfiability »
Emre Yolcu · Barnabas Poczos -
2018 Poster: Direct Runge-Kutta Discretization Achieves Acceleration »
Jingzhao Zhang · Aryan Mokhtari · Suvrit Sra · Ali Jadbabaie -
2018 Poster: Nonparametric Density Estimation under Adversarial Losses »
Shashank Singh · Ananya Uppal · Boyue Li · Chun-Liang Li · Manzil Zaheer · Barnabas Poczos -
2018 Spotlight: Direct Runge-Kutta Discretization Achieves Acceleration »
Jingzhao Zhang · Aryan Mokhtari · Suvrit Sra · Ali Jadbabaie -
2018 Poster: Exponentiated Strongly Rayleigh Distributions »
Zelda Mariet · Suvrit Sra · Stefanie Jegelka -
2018 Poster: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing -
2018 Spotlight: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing -
2018 Tutorial: Negative Dependence, Stable Polynomials, and All That »
Suvrit Sra · Stefanie Jegelka -
2017 : Distribution Regression and its Applications. »
Barnabas Poczos -
2017 : TBA11 »
Alexander Smola -
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht -
2017 Oral: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Hypothesis Transfer Learning via Transformation Functions »
Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos -
2017 Poster: MMD GAN: Towards Deeper Understanding of Moment Matching Network »
Chun-Liang Li · Wei-Cheng Chang · Yu Cheng · Yiming Yang · Barnabas Poczos -
2017 Poster: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: Elementary Symmetric Polynomials for Optimal Experimental Design »
Zelda Mariet · Suvrit Sra -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: Polynomial time algorithms for dual volume sampling »
Chengtao Li · Stefanie Jegelka · Suvrit Sra -
2016 Workshop: OPT 2016: Optimization for Machine Learning »
Suvrit Sra · Francis Bach · Sashank J. Reddi · Niao He -
2016 : Taming non-convexity via geometry »
Suvrit Sra -
2016 Poster: Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling »
Chengtao Li · Suvrit Sra · Stefanie Jegelka -
2016 Poster: Kronecker Determinantal Point Processes »
Zelda Mariet · Suvrit Sra -
2016 Poster: Variance Reduction in Stochastic Gradient Langevin Dynamics »
Kumar Avinava Dubey · Sashank J. Reddi · Sinead Williamson · Barnabas Poczos · Alexander Smola · Eric Xing -
2016 Poster: The Multi-fidelity Multi-armed Bandit »
Kirthevasan Kandasamy · Gautam Dasarathy · Barnabas Poczos · Jeff Schneider -
2016 Poster: Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators »
Shashank Singh · Barnabas Poczos -
2016 Poster: Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations »
Kirthevasan Kandasamy · Gautam Dasarathy · Junier B Oliva · Jeff Schneider · Barnabas Poczos -
2016 Poster: Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds »
Hongyi Zhang · Sashank J. Reddi · Suvrit Sra -
2016 Poster: Efficient Nonparametric Smoothness Estimation »
Shashank Singh · Simon Du · Barnabas Poczos -
2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity »
Suvrit Sra · Francis Bach -
2015 : Scaling Machine Learning »
Alexander Smola -
2015 Workshop: Nonparametric Methods for Large Scale Representation Learning »
Andrew G Wilson · Alexander Smola · Eric Xing -
2015 Workshop: Optimization for Machine Learning (OPT2015) »
Suvrit Sra · Alekh Agarwal · Leon Bottou · Sashank J. Reddi -
2015 Poster: Fast and Guaranteed Tensor Decomposition via Sketching »
Yining Wang · Hsiao-Yu Tung · Alexander Smola · Anima Anandkumar -
2015 Spotlight: Fast and Guaranteed Tensor Decomposition via Sketching »
Yining Wang · Hsiao-Yu Tung · Alexander Smola · Anima Anandkumar -
2015 Poster: Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations »
Kirthevasan Kandasamy · Akshay Krishnamurthy · Barnabas Poczos · Larry Wasserman · james m robins -
2015 Poster: Matrix Manifold Optimization for Gaussian Mixtures »
Reshad Hosseini · Suvrit Sra -
2015 Poster: On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants »
Sashank J. Reddi · Ahmed Hefny · Suvrit Sra · Barnabas Poczos · Alexander Smola -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2014 Poster: Communication Efficient Distributed Machine Learning with the Parameter Server »
Mu Li · David G Andersen · Alexander Smola · Kai Yu -
2014 Poster: Efficient Structured Matrix Rank Minimization »
Adams Wei Yu · Wanli Ma · Yaoliang Yu · Jaime Carbonell · Suvrit Sra -
2014 Poster: Exponential Concentration of a Density Functional Estimator »
Shashank Singh · Barnabas Poczos -
2014 Poster: Spectral Methods for Indian Buffet Process Inference »
Hsiao-Yu Tung · Alexander Smola -
2013 Workshop: Topic Models: Computation, Application, and Evaluation »
David Mimno · Amr Ahmed · Jordan Boyd-Graber · Ankur Moitra · Hanna Wallach · Alexander Smola · David Blei · Anima Anandkumar -
2013 Workshop: Randomized Methods for Machine Learning »
David Lopez-Paz · Quoc V Le · Alexander Smola -
2013 Workshop: Modern Nonparametric Methods in Machine Learning »
Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · peter bickel · Yue Zhao -
2013 Workshop: OPT2013: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal -
2013 Poster: Geometric optimisation on positive definite matrices for elliptically contoured distributions »
Suvrit Sra · Reshad Hosseini -
2013 Poster: Variance Reduction for Stochastic Gradient Optimization »
Chong Wang · Xi Chen · Alexander Smola · Eric Xing -
2013 Poster: Reflection methods for user-friendly submodular optimization »
Stefanie Jegelka · Francis Bach · Suvrit Sra -
2012 Workshop: Confluence between Kernel Methods and Graphical Models »
Le Song · Arthur Gretton · Alexander Smola -
2012 Workshop: Optimization for Machine Learning »
Suvrit Sra · Alekh Agarwal -
2012 Session: Oral Session 10 »
Alexander Smola -
2012 Poster: Learning Networks of Heterogeneous Influence »
Nan Du · Le Song · Alexander Smola · Ming Yuan -
2012 Poster: FastEx: Fast Clustering with Exponential Families »
Amr Ahmed · Sujith Ravi · Shravan M Narayanamurthy · Alexander Smola -
2012 Poster: A new metric on the manifold of kernel matrices with application to matrix geometric means »
Suvrit Sra -
2012 Spotlight: Learning Networks of Heterogeneous Influence »
Nan Du · Le Song · Alexander Smola · Ming Yuan -
2012 Poster: Scalable nonconvex inexact proximal splitting »
Suvrit Sra -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Workshop: Optimization for Machine Learning »
Suvrit Sra · Stephen Wright · Sebastian Nowozin -
2011 Poster: Group Anomaly Detection using Flexible Genre Models »
Liang Xiong · Barnabas Poczos · Jeff Schneider -
2011 Tutorial: Graphical Models for the Internet »
Amr Ahmed · Alexander Smola -
2010 Workshop: Challenges of Data Visualization »
Barbara Hammer · Laurens van der Maaten · Fei Sha · Alexander Smola -
2010 Workshop: Numerical Mathematics Challenges in Machine Learning »
Matthias Seeger · Suvrit Sra -
2010 Workshop: Optimization for Machine Learning »
Suvrit Sra · Sebastian Nowozin · Stephen Wright -
2010 Poster: Word Features for Latent Dirichlet Allocation »
James Petterson · Alexander Smola · Tiberio Caetano · Wray L Buntine · Shravan M Narayanamurthy -
2010 Poster: Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs »
David Pal · Barnabas Poczos · Csaba Szepesvari -
2010 Poster: Optimal Web-Scale Tiering as a Flow Problem »
Gilbert Leung · Novi Quadrianto · Alexander Smola · Kostas Tsioutsiouliklis -
2010 Poster: Multitask Learning without Label Correspondences »
Novi Quadrianto · Alexander Smola · Tiberio Caetano · S.V.N. Vishwanathan · James Petterson -
2010 Poster: Parallelized Stochastic Gradient Descent »
Martin A Zinkevich · Markus Weimer · Alexander Smola · Lihong Li -
2009 Workshop: Optimization for Machine Learning »
Sebastian Nowozin · Suvrit Sra · S.V.N Vishwanthan · Stephen Wright -
2009 Workshop: Large-Scale Machine Learning: Parallelism and Massive Datasets »
Alexander Gray · Arthur Gretton · Alexander Smola · Joseph E Gonzalez · Carlos Guestrin -
2009 Poster: Slow Learners are Fast »
Martin A Zinkevich · Alexander Smola · John Langford -
2009 Poster: Distribution Matching for Transduction »
Novi Quadrianto · James Petterson · Alexander Smola -
2008 Workshop: Optimization for Machine Learning »
Suvrit Sra · Sebastian Nowozin · Vishwanathan S V N -
2008 Poster: Kernelized Sorting »
Novi Quadrianto · Le Song · Alexander Smola -
2008 Poster: Kernel Measures of Independence for non-iid Data »
Xinhua Zhang · Le Song · Arthur Gretton · Alexander Smola -
2008 Spotlight: Kernelized Sorting »
Novi Quadrianto · Le Song · Alexander Smola -
2008 Spotlight: Kernel Measures of Independence for non-iid Data »
Xinhua Zhang · Le Song · Arthur Gretton · Alexander Smola -
2008 Poster: Tighter Bounds for Structured Estimation »
Olivier Chapelle · Chuong B Do · Quoc V Le · Alexander Smola · Choon Hui Teo -
2008 Poster: Robust Near-Isometric Matching via Structured Learning of Graphical Models »
Julian J McAuley · Tiberio Caetano · Alexander Smola -
2007 Workshop: Representations and Inference on Probability Distributions »
Kenji Fukumizu · Arthur Gretton · Alexander Smola -
2007 Poster: Convex Learning with Invariances »
Choon Hui Teo · Amir Globerson · Sam T Roweis · Alexander Smola -
2007 Spotlight: A Kernel Statistical Test of Independence »
Arthur Gretton · Kenji Fukumizu · Choon Hui Teo · Le Song · Bernhard Schölkopf · Alexander Smola -
2007 Spotlight: Bundle Methods for Machine Learning »
Alexander Smola · Vishwanathan S V N · Quoc V Le -
2007 Poster: COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking »
Markus Weimer · Alexandros Karatzoglou · Quoc V Le · Alexander Smola -
2007 Oral: Colored Maximum Variance Unfolding »
Le Song · Alexander Smola · Karsten Borgwardt · Arthur Gretton -
2007 Poster: Colored Maximum Variance Unfolding »
Le Song · Alexander Smola · Karsten Borgwardt · Arthur Gretton -
2007 Poster: A Kernel Statistical Test of Independence »
Arthur Gretton · Kenji Fukumizu · Choon Hui Teo · Le Song · Bernhard Schölkopf · Alexander Smola -
2007 Poster: Bundle Methods for Machine Learning »
Alexander Smola · Vishwanathan S V N · Quoc V Le -
2007 Spotlight: COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking »
Markus Weimer · Alexandros Karatzoglou · Quoc V Le · Alexander Smola -
2007 Demonstration: Elefant »
Kishor Gawande · Alexander Smola · Vishwanathan S V N · Li Cheng · Simon A Guenter -
2007 Spotlight: Convex Learning with Invariances »
Choon Hui Teo · Amir Globerson · Sam T Roweis · Alexander Smola -
2006 Poster: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola -
2006 Poster: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf -
2006 Spotlight: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf -
2006 Talk: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola