Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity
Suvrit Sra · Francis Bach

Mon Dec 5th 02:30 -- 04:30 PM @ Rooms 211 + 212

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a staple introduced over 60 years ago! Recent years have, however, brought an exciting new development: variance reduction (VR) for stochastic methods. These VR methods excel in settings where more than one pass through the training data is allowed, converging faster than SGD both in theory and in practice. These speedups underlie the surge of interest in VR methods; a large body of work has already emerged, and new results appear regularly! This tutorial presents the key principles behind VR methods to the wider machine learning audience by positioning them relative to SGD. Moreover, the tutorial takes a step beyond convexity and covers cutting-edge results for non-convex problems, while outlining key points and open challenges.
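To make the contrast between SGD and variance reduction concrete, here is a minimal sketch of one well-known VR method, SVRG (stochastic variance reduced gradient), next to plain constant-step SGD. The choice of SVRG is an illustrative assumption; it is not necessarily the method or presentation used in the tutorial. The toy problem (noisy 2-D least squares), step size, epoch count, and inner-loop length are all hypothetical values chosen for the example. SVRG periodically computes a full gradient at a snapshot point and uses it to correct each stochastic gradient, which keeps the update unbiased while its variance shrinks near the optimum. This is why SVRG can use a constant step and still converge when multiple passes over the data are allowed.

```python
import random

def grad_i(w, a, b):
    """Gradient of the single-sample loss f_i(w) = 0.5 * (a . w - b)^2."""
    r = sum(wj * aj for wj, aj in zip(w, a)) - b
    return [r * aj for aj in a]

def full_grad(w, A, B):
    """Gradient of the average loss F(w) = (1/n) * sum_i f_i(w)."""
    n = len(A)
    g = [0.0] * len(w)
    for a, b in zip(A, B):
        for j, gj in enumerate(grad_i(w, a, b)):
            g[j] += gj / n
    return g

def sgd(A, B, step, iters, rng):
    """Plain SGD with a constant step: w <- w - step * grad f_i(w)."""
    w = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.randrange(len(A))
        g = grad_i(w, A[i], B[i])
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w

def svrg(A, B, step, epochs, inner, rng):
    """SVRG: correct each stochastic gradient using a snapshot full gradient."""
    w = [0.0] * len(A[0])
    for _ in range(epochs):
        snap = list(w)                # snapshot point
        mu = full_grad(snap, A, B)    # one full pass over the data per epoch
        for _ in range(inner):
            i = rng.randrange(len(A))
            gi = grad_i(w, A[i], B[i])
            gs = grad_i(snap, A[i], B[i])
            # Unbiased gradient estimate whose variance vanishes as w and snap approach the optimum
            v = [x - y + m for x, y, m in zip(gi, gs, mu)]
            w = [wj - step * vj for wj, vj in zip(w, v)]
    return w

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Hypothetical toy problem: noisy 2-D linear regression
rng = random.Random(0)
n, w_true = 200, [2.0, -1.0]
A = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
B = [sum(x * t for x, t in zip(a, w_true)) + 0.5 * rng.gauss(0, 1) for a in A]

step, epochs, inner = 0.05, 30, 2 * n
w_svrg = svrg(A, B, step, epochs, inner, rng)
# Give SGD the same budget of single-sample gradients (n + 2 * inner per SVRG epoch)
w_sgd = sgd(A, B, step, epochs * (n + 2 * inner), rng)

print("SGD  ||grad F||:", norm(full_grad(w_sgd, A, B)))
print("SVRG ||grad F||:", norm(full_grad(w_svrg, A, B)))
```

With a constant step, SGD keeps fluctuating around the optimum because its gradient noise does not vanish there, whereas the SVRG iterate drives the full gradient toward zero.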

Learning Objectives:

– Introduce fast stochastic methods to the wider ML audience, to go beyond a 60-year-old algorithm (SGD)

– Provide a guiding light through this fast-moving area: unify and simplify its presentation, outline common pitfalls, and demystify its capabilities

– Raise awareness about open challenges in the area, and thereby spur future research

Target Audience:

– Graduate students (masters as well as PhD stream)

– ML researchers in academia and industry who are not experts in stochastic optimization

– Practitioners who want to widen their repertoire of tools

Author Information

Suvrit Sra (MIT)

Suvrit Sra is a research faculty member at the Laboratory for Information and Decision Systems (LIDS) at the Massachusetts Institute of Technology (MIT). He obtained his PhD in Computer Science from the University of Texas at Austin in 2007. Before moving to MIT, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems in Tübingen, Germany. During 2013-2014 he also held visiting faculty positions at UC Berkeley (EECS) and Carnegie Mellon University (Machine Learning Department). His research is dedicated to bridging mathematical areas such as metric geometry, matrix analysis, convex analysis, probability theory, and optimization with machine learning; more broadly, his work involves algorithmically grounded topics within engineering and science. He co-chaired OPT2008-2015, the NIPS workshops on "Optimization for Machine Learning," and has also edited a volume of the same name (MIT Press, 2011).

Francis Bach (Inria)

Francis Bach is a researcher at INRIA who has led the SIERRA project-team since 2011; the team is part of the Computer Science Department at Ecole Normale Supérieure in Paris, France. After completing his Ph.D. in Computer Science at U.C. Berkeley, he spent two years at Ecole des Mines, and joined INRIA and Ecole Normale Supérieure in 2007. He is interested in statistical machine learning, especially convex optimization, combinatorial optimization, sparse methods, kernel-based learning, vision, and signal processing. In recent years he has taught numerous courses on optimization at summer schools. He was program co-chair of the International Conference on Machine Learning (ICML) in 2015.
