Optimization for Machine Learning
Suvrit Sra · Sebastian Nowozin · Stephen Wright

Fri Dec 10th 07:30 AM -- 06:30 PM @ Westin: Emerald A

Our workshop focuses on optimization theory and practice that is relevant to machine learning. This proposal builds on the precedent established by two of our previous, well-received NIPS workshops:


Both workshops were packed (often overpacked) for almost the entire day. This enthusiastic reception reflects the strong interest in optimization, and its relevance and importance, within the broader ML community.

One might ask why optimization attracts such continued interest. The answer is simple but telling: optimization lies at the heart of almost every ML algorithm. For some algorithms textbook methods suffice, but most require tailoring tools from optimization, which in turn depends on a deeper understanding of the ML requirements. In fact, ML applications and researchers are driving some of the most cutting-edge developments in optimization today. This intimate relationship between optimization and ML is the key motivation for our workshop, which aims to foster discussion, discovery, and dissemination of the state of the art in optimization, especially in the context of ML.

The workshop aims to achieve these goals by:

* Providing a platform for increasing interaction among researchers from optimization, operations research, statistics, scientific computing, and machine learning;
* Identifying key problems and challenges that lie at the intersection of optimization and ML;
* Narrowing the gap between optimization and ML, to help reduce rediscovery and thereby accelerate new advances.


Previous talks at the OPT workshops have covered frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, especially in the area of SVM training (S. Wright), large-scale learning via stochastic gradient methods and its tradeoffs (L. Bottou, N. Srebro), exploitation of structured sparsity in optimization (L. Vandenberghe), and randomized methods for extremely large-scale convex optimization (A. Nemirovski). These talks brought several important insights to the fore, and many of the dominant ideas will appear in our forthcoming book, Optimization for Machine Learning (MIT Press).

Given this background, it is easy to accept that optimization is indispensable to machine learning. But what more can we say beyond this obvious realization?

The ML community's interest in optimization continues to grow. Invited tutorials on optimization will be presented this year at ICML (N. Srebro) and NIPS (S. Wright). The traditional "point of contact" between ML and optimization, the SVM, continues to drive research on a number of fronts. Much recent interest has focused on stochastic gradient methods, which can be used in an online setting and in settings where data sets are extremely large and high accuracy is not required. Regularized logistic regression is another area that has produced a recent flurry of activity at the intersection of the two communities. Many aspects of stochastic gradient methods remain to be explored, for example: different algorithmic variants, customization to the structure of the data set, convergence analysis, sampling techniques, software, choice of regularization and tradeoff parameters, and parallelism. We also need a better understanding of the limitations of these methods, of what can be done to accelerate them, and of how to detect when to switch to alternative strategies. In the logistic regression setting, the use of approximate second-order information has been shown to improve convergence, but many algorithmic issues remain. Detecting combined-effect predictors (which leads to a huge increase in the number of variables), using group regularizers, and handling very large data sets in real time all present challenges.

To avoid becoming lopsided, our workshop will also include the "not particularly large scale" setting, where one has time to wield substantial computational resources. In this setting, high-accuracy solutions and a deep understanding of the lessons contained in the data are needed. Examples of value to ML researchers include exploring genetic and environmental data to identify risk factors for disease, and problems where the amount of observed data is not huge but the mathematical models are complex.
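To make the stochastic gradient setting discussed above more concrete, here is a minimal Python sketch of plain SGD applied to L2-regularized logistic regression. It is an illustration only, not a method presented at the workshop. The step-size schedule, the regularization weight `lam`, and the helpers `make_data` and `accuracy` are assumptions chosen for the example. Each update touches a single example, which is why such methods are attractive when data sets are very large and moderate accuracy suffices.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def logistic_loss(w, x, y):
    """log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}, computed stably."""
    margin = y * dot(w, x)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def objective(w, data, lam):
    """Average logistic loss plus the L2 penalty (lam / 2) * ||w||^2."""
    avg = sum(logistic_loss(w, x, y) for x, y in data) / len(data)
    return avg + 0.5 * lam * dot(w, w)


def sgd_logistic(data, lam=0.01, epochs=20, eta0=0.5, seed=0):
    """Plain SGD: one example per step, visited in random order each epoch.

    At a single example, the stochastic gradient of
    log(1 + exp(-y <w, x>)) + (lam / 2) ||w||^2 is
    -y * sigmoid(-y <w, x>) * x + lam * w, an unbiased estimate of the full
    gradient. The decaying step size eta0 / (1 + eta0 * lam * t) is an
    illustrative (assumed) choice, not a tuned one.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # fresh random permutation of the examples each epoch
        for i in order:
            x, y = data[i]
            eta = eta0 / (1.0 + eta0 * lam * t)
            coef = -y * sigmoid(-y * dot(w, x))
            # Gradient step on loss + penalty (the bias weight is regularized too, for simplicity).
            w = [wj - eta * (coef * xj + lam * wj) for wj, xj in zip(w, x)]
            t += 1
    return w


def make_data(n=200, seed=1):
    """Hypothetical synthetic data: two features plus a constant bias feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 1.0]
        y = 1 if 2.0 * x[0] - x[1] + 0.5 > 0 else -1
        data.append((x, y))
    return data


def accuracy(w, data):
    """Fraction of examples whose label agrees with the sign of <w, x>."""
    return sum(1 for x, y in data if y * dot(w, x) > 0) / len(data)


if __name__ == "__main__":
    data = make_data()
    lam = 0.01
    w = sgd_logistic(data, lam=lam)
    print("objective at w = 0:", objective([0.0] * 3, data, lam))
    print("objective after SGD:", objective(w, data, lam))
    print("training accuracy:", accuracy(w, data))
```

At `w = 0` every margin is zero, so the objective equals log 2. SGD should drive it well below that. In the high-accuracy, "not particularly large scale" setting described above, a method that uses (approximate) second-order information may be a better fit. This sketch only illustrates the cheap per-example update.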

Author Information

Suvrit Sra (MIT)

Suvrit Sra is a faculty member in the EECS department at MIT, where he is also a core faculty member of IDSS, LIDS, the MIT-ML Group, and the Statistics and Data Science Center. His research spans topics in optimization, matrix theory, differential geometry, and probability theory, which he connects with machine learning; a key focus of his research is the theme "Optimization for Machine Learning".

Sebastian Nowozin (Microsoft Research)

Stephen Wright (UW-Madison)

Steve Wright is a Professor of Computer Sciences at the University of Wisconsin-Madison. His research interests lie in computational optimization and its applications to science and engineering. Prior to joining UW-Madison in 2001, Wright was a Senior Computer Scientist (1997-2001) and Computer Scientist (1990-1997) at Argonne National Laboratory, and Professor of Computer Science at the University of Chicago (2000-2001). He is the past Chair of the Mathematical Optimization Society (formerly the Mathematical Programming Society), the leading professional society in optimization, and a member of the Board of the Society for Industrial and Applied Mathematics (SIAM). Wright is the author or co-author of four widely used books in numerical optimization, including "Primal-Dual Interior-Point Methods" (SIAM, 1997) and "Numerical Optimization" (with J. Nocedal, Second Edition, Springer, 2006). He has also authored over 85 refereed journal papers on optimization theory, algorithms, software, and applications. He is a coauthor of widely used interior-point software for linear and quadratic optimization. His recent research includes algorithms, applications, and theory for sparse optimization (including applications in compressed sensing and machine learning).
