Timezone: »
A large body of machine learning problems require solving nonconvex optimization. This includes deep learning, Bayesian inference, clustering, and so on. The objective functions in all these instances are highly nonconvex, and it is an open question if there are provable, polynomial time algorithms for these problems under realistic assumptions.
A diverse set of approaches have been devised to solve nonconvex problems in a variety of approaches. They range from simple local search approaches such as gradient descent and alternating minimization to more involved frameworks such as simulated annealing, continuation method, convex hierarchies, Bayesian optimization, branch and bound, and so on. Moreover, for solving special class of nonconvex problems there are efficient methods such as quasi convex optimization, star convex optimization, submodular optimization, and matrix/tensor decomposition.
There has been a burst of recent research activity in all these areas. This workshop brings researchers from these vastly different domains and hopes to create a dialogue among them. In addition to the theoretical frameworks, the workshop will also feature practitioners, especially in the area of deep learning who are developing new methodologies for training large scale neural networks. The result will be a cross fertilization of ideas from diverse areas and schools of thought.
Thu 11:15 p.m.  11:30 p.m.

Opening Remarks
(Talk)


Thu 11:30 p.m.  12:00 a.m.

Learning To Optimize
(Talk)
»
The move from handdesigned features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this talk I describe how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. The learned algorithms, implemented by LSTMs, outperform generic, handdesigned competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. 
Nando de Freitas 
Fri 12:00 a.m.  12:30 a.m.

Morning Poster Spotlight
(Spotlight)


Fri 12:30 a.m.  1:30 a.m.

Morning Poster Session
(Posters)


Fri 1:30 a.m.  2:00 a.m.

Coffee Break
(Break)


Fri 2:00 a.m.  2:30 a.m.

The momentLP and momentSOS approaches in optimization and some related applications
(Talk)
»
In a first part we provide an introduction to the basics of the momentLP and momentSOS approaches to global polynomial optimization. In particular, we describe the hierarchy of LP and semidefinite programs to approximate the optimal value of such problems. In fact, the same methodology also applies to solve (or approximate) Generalized Moment Problems (GMP) whose data are described by basic semialgebraic sets and polynomials (or even semialgebraic functions). Indeed, Polynomial optimization is a particular (and even the simplest) instance of the GMP. In a second part, we describe how to use this methodology for solving some problems (outside optimization) viewed as particular instances of the GMP. This includes:  Approximating compact basic semialgebraic sets defined by quantifiers.  Computing convex polynomials underestimators of polynomials on a box.  Bounds on measures satisfying some moment conditions.  Approximating the volume of compact basic semialgebraic sets.  Approximating the Gaussian measure of noncompact basic semialgebraic sets.  Approximating the Lebesgue decomposition of a measure μ w.r.t. another measure ν, based only on the moments of μ and ν. 
Jean Lasserre 
Fri 2:30 a.m.  3:00 a.m.

Nonconvexity in the error landscape and the expressive capacity of deep neural networks
(Talk)
»
A variety of recent work has studied saddle points in the error landscape of deep neural networks. A clearer understanding of these saddle points is likely to arise from an understanding of the geometry of deep functions. In particular, what do the generic functions computed by a deep network “look like?” How can we quantify and understand their geometry, and what implications does this geometry have for reducing generalization error as well as training error? We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of generic deep functions. Our results reveal an ordertochaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides intuition for why initializations at the edge of chaos can lead to both better optimization as well as superior generalization capabilities. 
Surya Ganguli 
Fri 3:00 a.m.  3:30 a.m.

Leveraging Structure in Bayesian Optimization
(Talk)
»
Bayesian optimization is an approach to nonconvex optimization that leverages a probabilistic model to make decisions about candidate points to evaluate. The primary advantage of this approach is the ability to incorporate prior knowledge about the objective function in an explicit way. While such prior information has typically been information about the smoothness of the function, many machine learning problems have additional structure that can be leveraged. I will talk about how such prior information can be found across tasks, within innerloop optimizations, and in constraints. 
Ryan Adams 
Fri 3:30 a.m.  4:30 a.m.

Lunch Break
(Break)


Fri 4:30 a.m.  5:00 a.m.

Submodular Optimization and Nonconvexity
(Talk)
»
Despite analogies of submodularity and convexity, submodular optimization is closely connected with certain "nice" nonconvex optimization problems for which theoretical guarantees are still possible. In this talk, I will review some of these connections and make them specific at the example of a challenging robust influence maximization problem, for which we obtain new, tractable formulations and algorithms. 
Stefanie Jegelka 
Fri 5:00 a.m.  5:30 a.m.

Submodular Functions: from Discrete to Continuous Domains
(Talk)
»
Submodular setfunctions have many applications in combinatorial optimization, as they can be minimized and approximately maximized in polynomial time. A key element in many of the algorithms and analyses is the possibility of extending the submodular setfunction to a convex function, which opens up tools from convex optimization. Submodularity goes beyond setfunctions and has naturally been considered for problems with multiple labels or for functions defined on continuous domains, where it corresponds essentially to cross secondderivatives being nonpositive. In this paper, we show that most results relating submodularity and convexity for setfunctions can be extended to all submodular functions. In particular, (a) we naturally define a continuous extension in a set of probability measures, (b) show that the extension is convex if and only if the original function is submodular, (c) prove that the problem of minimizing a submodular function is equivalent to a typically nonsmooth convex optimization problem, and (d) propose another convex optimization problem with better computational properties (e.g., a smooth dual problem). Most of these extensions from the setfunction situation are obtained by drawing links with the theory of multimarginal optimal transport, which provides also a new interpretation of existing results for setfunctions. We then provide practical algorithms to minimize generic submodular functions on discrete domains, with associated convergence rates. 
Francis Bach 
Fri 5:30 a.m.  6:00 a.m.

Taming nonconvexity via geometry
(Talk)
»
In this talk, I will highlight some aspects of geometry and its role in optimization. In particular, I will talk about optimization problems whose parameters are constrained to lie on a manifold or in a specific metric space. These geometric constraints often make the problems numerically challenging, but they can also unravel properties that ensure tractable attainment of global optimality for certain otherwise nonconvex problems. We'll make our foray into geometric optimization via geodesic convexity, a concept that generalizes the usual notion of convexity to nonlinear metric spaces such as Riemannian manifolds. I will outline some of our results that contribute to gconvex analysis as well as to the theory of firstorder gconvex optimization. I will mention several very interesting optimization problems where gconvexity proves remarkably useful. In closing, I will mention extensions to largescale nonconvex geometric optimization as well as key open problems. 
Suvrit Sra 
Fri 6:00 a.m.  6:30 a.m.

Break
(Coffee Break)


Fri 6:30 a.m.  7:30 a.m.

Discussion Panel


Fri 7:30 a.m.  8:00 a.m.

Afternoon Poster Spotlight
(Spotlight)


Fri 8:00 a.m.  9:00 a.m.

Afternoon Poster Session
(Posters)

Author Information
Hossein Mobahi (Google Research)
Anima Anandkumar (Caltech)
Percy Liang (Stanford University)
Stefanie Jegelka (MIT)
Anna Choromanska (NYU Tandon School of Engineering)
More from the Same Authors

2020 Poster: Enabling certification of verificationagnostic networks via memoryefficient semidefinite programming »
Sumanth Dathathri · Krishnamurthy Dvijotham · Alexey Kurakin · Aditi Raghunathan · Jonathan Uesato · Rudy Bunel · Shreya Shankar · Jacob Steinhardt · Ian Goodfellow · Percy Liang · Pushmeet Kohli 
2019 Poster: SPoC: Searchbased Pseudocode to Code »
Sumith Kulal · Panupong Pasupat · Kartik Chandra · Mina Lee · Oded Padon · Alex Aiken · Percy Liang 
2019 Poster: On the Accuracy of Influence Functions for Measuring Group Effects »
Pang Wei Koh · KaiSiang Ang · Hubert Teo · Percy Liang 
2019 Poster: Verified Uncertainty Calibration »
Ananya Kumar · Percy Liang · Tengyu Ma 
2019 Spotlight: Verified Uncertainty Calibration »
Ananya Kumar · Percy Liang · Tengyu Ma 
2018 Poster: Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on ZeroOne Loss »
Stephen Mussmann · Percy Liang 
2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples »
Aditi Raghunathan · Jacob Steinhardt · Percy Liang 
2018 Poster: A RetrieveandEdit Framework for Predicting Structured Outputs »
Tatsunori Hashimoto · Kelvin Guu · Yonatan Oren · Percy Liang 
2018 Oral: A RetrieveandEdit Framework for Predicting Structured Outputs »
Tatsunori Hashimoto · Kelvin Guu · Yonatan Oren · Percy Liang 
2017 Demonstration: Babble Labble: Learning from Natural Language Explanations »
Braden Hancock · Paroma Varma · Percy Liang · Christopher Ré · Stephanie Wang 
2017 Poster: Learning Overcomplete HMMs »
Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant 
2017 Poster: Certified Defenses for Data Poisoning Attacks »
Jacob Steinhardt · Pang Wei Koh · Percy Liang 
2017 Poster: Unsupervised Transformation Learning via Convex Relaxations »
Tatsunori Hashimoto · Percy Liang · John Duchi 
2016 Workshop: Deep Learning for Action and Interaction »
Chelsea Finn · Raia Hadsell · David Held · Sergey Levine · Percy Liang 
2016 Workshop: Learning with Tensors: Why Now and How? »
Anima Anandkumar · Rong Ge · Yan Liu · Maximilian Nickel · Qi (Rose) Yu 
2016 Workshop: Reliable Machine Learning in the Wild »
Dylan HadfieldMenell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang 
2016 Poster: Unsupervised Risk Estimation Using Only Conditional Independence Structure »
Jacob Steinhardt · Percy Liang 
2016 Poster: Online and DifferentiallyPrivate Tensor Decomposition »
Yining Wang · Anima Anandkumar 
2015 Workshop: Nonconvex Optimization for Machine Learning: Theory and Practice »
Anima Anandkumar · Niranjan Uma Naresh · Kamalika Chaudhuri · Percy Liang · Sewoong Oh 
2015 Poster: Logarithmic Time Online Multiclass prediction »
Anna Choromanska · John Langford 
2015 Poster: Fast and Guaranteed Tensor Decomposition via Sketching »
Yining Wang · HsiaoYu Tung · Alexander Smola · Anima Anandkumar 
2015 Spotlight: Logarithmic Time Online Multiclass prediction »
Anna Choromanska · John Langford 
2015 Spotlight: Fast and Guaranteed Tensor Decomposition via Sketching »
Yining Wang · HsiaoYu Tung · Alexander Smola · Anima Anandkumar 
2015 Demonstration: CodaLab Worksheets for Reproducible, Executable Papers »
Percy Liang · Evelyne Viegas 
2015 Poster: OntheJob Learning with Bayesian Decision Theory »
Keenon Werling · Arun Tejasvi Chaganty · Percy Liang · Christopher Manning 
2015 Spotlight: OntheJob Learning with Bayesian Decision Theory »
Keenon Werling · Arun Tejasvi Chaganty · Percy Liang · Christopher Manning 
2015 Poster: Deep learning with Elastic Averaging SGD »
Sixin Zhang · Anna Choromanska · Yann LeCun 
2015 Spotlight: Deep learning with Elastic Averaging SGD »
Sixin Zhang · Anna Choromanska · Yann LeCun 
2015 Poster: Estimating Mixture Models via Mixtures of Polynomials »
Sida Wang · Arun Tejasvi Chaganty · Percy Liang 
2015 Poster: Learning with a Wasserstein Loss »
Charlie Frogner · Chiyuan Zhang · Hossein Mobahi · Mauricio Araya · Tomaso Poggio 
2015 Poster: Learning with Relaxed Supervision »
Jacob Steinhardt · Percy Liang 
2015 Poster: Calibrated Structured Prediction »
Volodymyr Kuleshov · Percy Liang 
2014 Workshop: Discrete Optimization in Machine Learning »
Jeffrey A Bilmes · Andreas Krause · Stefanie Jegelka · S Thomas McCormick · Sebastian Nowozin · Yaron Singer · Dhruv Batra · Volkan Cevher 
2014 Workshop: Challenges in Machine Learning workshop (CiML 2014) »
Isabelle Guyon · Evelyne Viegas · Percy Liang · Olga Russakovsky · Rinat Sergeev · Gábor Melis · Michele Sebag · Gustavo Stolovitzky · Jaume Bacardit · Michael S Kim · Ben Hamner 
2014 Poster: MultiStep Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition »
Hanie Sedghi · Anima Anandkumar · Edmond A Jonckheere 
2014 Poster: Altitude Training: Strong Bounds for SingleLayer Dropout »
Stefan Wager · William S Fithian · Sida Wang · Percy Liang 
2014 Poster: Parallel Double Greedy Submodular Maximization »
Xinghao Pan · Stefanie Jegelka · Joseph Gonzalez · Joseph K Bradley · Michael Jordan 
2014 Poster: Submodular meets Structured: Finding Diverse Subsets in ExponentiallyLarge Structured Item Sets »
Adarsh Prasad · Stefanie Jegelka · Dhruv Batra 
2014 Spotlight: Submodular meets Structured: Finding Diverse Subsets in ExponentiallyLarge Structured Item Sets »
Adarsh Prasad · Stefanie Jegelka · Dhruv Batra 
2014 Poster: Simple MAP Inference via LowRank Relaxations »
Roy Frostig · Sida Wang · Percy Liang · Christopher D Manning 
2014 Poster: On the Convergence Rate of Decomposable Submodular Function Minimization »
Robert Nishihara · Stefanie Jegelka · Michael Jordan 
2014 Poster: Weaklysupervised Discovery of Visual Pattern Configurations »
Hyun Oh Song · Yong Jae Lee · Stefanie Jegelka · Trevor Darrell 
2013 Workshop: Topic Models: Computation, Application, and Evaluation »
David Mimno · Amr Ahmed · Jordan BoydGraber · Ankur Moitra · Hanna Wallach · Alexander Smola · David Blei · Anima Anandkumar 
2013 Workshop: Discrete Optimization in Machine Learning: Connecting Theory and Practice »
Stefanie Jegelka · Andreas Krause · Pradeep Ravikumar · Kazuo Murota · Jeffrey A Bilmes · Yisong Yue · Michael Jordan 
2013 Poster: Dropout Training as Adaptive Regularization »
Stefan Wager · Sida Wang · Percy Liang 
2013 Poster: Optimistic Concurrency Control for Distributed Unsupervised Learning »
Xinghao Pan · Joseph Gonzalez · Stefanie Jegelka · Tamara Broderick · Michael Jordan 
2013 Spotlight: Dropout Training as Adaptive Regularization »
Stefan Wager · Sida Wang · Percy Liang 
2013 Poster: Reflection methods for userfriendly submodular optimization »
Stefanie Jegelka · Francis Bach · Suvrit Sra 
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade 
2013 Poster: Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions »
Rishabh K Iyer · Stefanie Jegelka · Jeffrey A Bilmes 
2012 Workshop: Discrete Optimization in Machine Learning (DISCML): Structure and Scalability »
Stefanie Jegelka · Andreas Krause · Jeffrey A Bilmes · Pradeep Ravikumar 
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade 
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu 
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2012 Poster: Latent Graphical Model Selection: Efficient Methods for Locally Treelike Graphs »
Anima Anandkumar · Ragupathyraj Valluvan 
2011 Poster: Fast approximate submodular minimization »
Stefanie Jegelka · Hui Lin · Jeffrey A Bilmes 
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang 
2010 Workshop: Discrete Optimization in Machine Learning: Structures, Algorithms and Applications »
Andreas Krause · Pradeep Ravikumar · Jeffrey A Bilmes · Stefanie Jegelka 
2009 Workshop: The Generative and Discriminative Learning Interface »
Simon LacosteJulien · Percy Liang · Guillaume Bouchard 
2009 Poster: Asymptotically Optimal Regularization in Smooth Parametric Models »
Percy Liang · Francis Bach · Guillaume Bouchard · Michael Jordan 
2008 Workshop: Speech and Language: Unsupervised LatentVariable Models »
Slav Petrov · Aria Haghighi · Percy Liang · Dan Klein 
2007 Poster: AgreementBased Learning »
Percy Liang · Dan Klein · Michael Jordan 
2007 Spotlight: AgreementBased Learning »
Percy Liang · Dan Klein · Michael Jordan 
2007 Poster: A Probabilistic Approach to Language Change »
Alexandre BouchardCôté · Percy Liang · Tom Griffiths · Dan Klein