
Nonconvex Optimization for Machine Learning: Theory and Practice
Hossein Mobahi · Anima Anandkumar · Percy Liang · Stefanie Jegelka · Anna Choromanska

Thu Dec 08 11:00 PM -- 09:30 AM (PST) @ Area 5 + 6
Event URL: https://sites.google.com/site/nonconvexnips2016/

Many machine learning problems require solving nonconvex optimization problems; this includes deep learning, Bayesian inference, clustering, and more. The objective functions in all of these instances are highly nonconvex, and it is an open question whether there are provable, polynomial-time algorithms for these problems under realistic assumptions.

A diverse set of approaches has been devised to solve nonconvex problems. They range from simple local search methods such as gradient descent and alternating minimization to more involved frameworks such as simulated annealing, continuation methods, convex hierarchies, Bayesian optimization, and branch and bound. Moreover, for special classes of nonconvex problems there are efficient methods such as quasi-convex optimization, star-convex optimization, submodular optimization, and matrix/tensor decomposition.

There has been a burst of recent research activity in all of these areas. This workshop brings together researchers from these vastly different domains and aims to create a dialogue among them. In addition to the theoretical frameworks, the workshop will also feature practitioners, especially in deep learning, who are developing new methodologies for training large-scale neural networks. The result will be a cross-fertilization of ideas from diverse areas and schools of thought.

Thu 11:15 p.m. - 11:30 p.m.
Opening Remarks (Talk)
Thu 11:30 p.m. - 12:00 a.m.

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this talk I describe how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. The learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure.

Nando de Freitas
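The idea of casting optimizer design as a learning problem can be illustrated at a much smaller scale than the LSTM optimizers in the talk. The sketch below, which is illustrative and not from the talk (the quadratic task family and all parameter values are assumptions), meta-learns a single parameter of a gradient-descent update, the log step size, by stochastic finite-difference descent on the loss obtained after a few unrolled steps on randomly sampled quadratics.

```python
import numpy as np

rng = np.random.default_rng(0)

def unrolled_loss(log_lr, A, x0, n_steps=10):
    """Run n_steps of gradient descent with step size exp(log_lr) on the
    quadratic f(x) = 0.5 x'Ax and return the final loss."""
    lr, x = np.exp(log_lr), x0.copy()
    for _ in range(n_steps):
        x = x - lr * (A @ x)                   # the optimizer being "learned"
    return 0.5 * x @ A @ x

def sample_task(dim=5):
    """A random well-conditioned convex quadratic and a random start."""
    Q = rng.normal(size=(dim, dim))
    return Q @ Q.T / dim + np.eye(dim), rng.normal(size=dim)

# Meta-training: finite-difference descent on the optimizer parameter,
# evaluating both perturbations on the same sampled task.
log_lr, eps, meta_lr = np.log(1e-3), 1e-3, 0.5
for _ in range(200):
    A, x0 = sample_task()
    g = (unrolled_loss(log_lr + eps, A, x0)
         - unrolled_loss(log_lr - eps, A, x0)) / (2 * eps)
    log_lr -= meta_lr * np.clip(g, -1.0, 1.0)  # clip for stability

print("learned step size:", np.exp(log_lr))
```

In the talk's setting, the scalar step size is replaced by an LSTM that emits a full per-coordinate update, trained by backpropagating through the unrolled optimization rather than by finite differences.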
Fri 12:00 a.m. - 12:30 a.m.
Morning Poster Spotlight (Spotlight)
Fri 12:30 a.m. - 1:30 a.m.
Morning Poster Session (Posters)
Fri 1:30 a.m. - 2:00 a.m.
Coffee Break (Break)
Fri 2:00 a.m. - 2:30 a.m.

In the first part, we provide an introduction to the basics of the moment-LP and moment-SOS approaches to global polynomial optimization. In particular, we describe the hierarchy of LP and semidefinite programs used to approximate the optimal value of such problems. In fact, the same methodology also applies to solving (or approximating) Generalized Moment Problems (GMP) whose data are described by basic semi-algebraic sets and polynomials (or even semi-algebraic functions). Indeed, polynomial optimization is a particular (and even the simplest) instance of the GMP.

In the second part, we describe how to use this methodology to solve some problems (outside optimization) viewed as particular instances of the GMP. This includes:
- Approximating compact basic semi-algebraic sets defined by quantifiers.
- Computing convex polynomial underestimators of polynomials on a box.
- Bounding measures satisfying some moment conditions.
- Approximating the volume of compact basic semi-algebraic sets.
- Approximating the Gaussian measure of non-compact basic semi-algebraic sets.
- Approximating the Lebesgue decomposition of a measure μ w.r.t. another measure ν, based only on the moments of μ and ν.

Jean Lasserre
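For readers new to the moment approach, the semidefinite hierarchy mentioned in the abstract can be written compactly; the following is the standard order-d moment relaxation from the literature (the notation is the usual one and is not taken from the abstract). For the problem f^* = min { f(x) : g_j(x) \ge 0, \; j = 1,\dots,m }:

```latex
\rho_d \;=\; \min_{y}\; L_y(f)
\quad\text{s.t.}\quad
M_d(y) \succeq 0,\qquad
M_{d-d_j}(g_j\,y) \succeq 0 \;\; (j=1,\dots,m),\qquad
y_0 = 1,
```

where L_y is the Riesz functional mapping the monomial x^α to y_α, M_d(y) is the order-d moment matrix, M_{d-d_j}(g_j y) is the localizing matrix of g_j with d_j = ⌈deg g_j / 2⌉, and, under Putinar-type compactness assumptions, ρ_d increases monotonically to f^* as d → ∞.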
Fri 2:30 a.m. - 3:00 a.m.

A variety of recent work has studied saddle points in the error landscape of deep neural networks. A clearer understanding of these saddle points is likely to arise from an understanding of the geometry of deep functions. In particular, what do the generic functions computed by a deep network “look like?” How can we quantify and understand their geometry, and what implications does this geometry have for reducing generalization error as well as training error? We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of generic deep functions. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides intuition for why initializations at the edge of chaos can lead to both better optimization as well as superior generalization capabilities.

Surya Ganguli
Fri 3:00 a.m. - 3:30 a.m.

Bayesian optimization is an approach to non-convex optimization that leverages a probabilistic model to make decisions about candidate points to evaluate. The primary advantage of this approach is the ability to incorporate prior knowledge about the objective function in an explicit way. While such prior information has typically been information about the smoothness of the function, many machine learning problems have additional structure that can be leveraged. I will talk about how such prior information can be found across tasks, within inner-loop optimizations, and in constraints.

Ryan Adams
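As a concrete illustration of the decision loop described above, here is a minimal Bayesian optimization sketch, assuming a Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement acquisition; the toy objective, grid, and all parameter values are illustrative choices, not from the talk.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean/std at test points Xs given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(1.0 - np.sum(Ks * sol, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI for minimization: E[max(best - f, 0)] under N(mu, sigma^2)."""
    z = (best - mu) / sigma
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    phi = np.exp(-0.5 * z**2) / sqrt(2 * pi)
    return (best - mu) * Phi + sigma * phi

f = lambda x: np.sin(3 * x) + x**2 - 0.7 * x    # toy nonconvex objective
grid = np.linspace(-1.0, 2.0, 200)
X = np.array([-0.5, 1.5]); y = f(X)
for _ in range(10):                              # BO loop: model, decide, evaluate
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
print("best x, f(x):", X[y.argmin()], y.min())
```

The prior knowledge here enters only through the kernel's length scale; the structured priors discussed in the talk go well beyond this smoothness assumption.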
Fri 3:30 a.m. - 4:30 a.m.
Lunch Break (Break)
Fri 4:30 a.m. - 5:00 a.m.

Despite the analogies between submodularity and convexity, submodular optimization is closely connected with certain "nice" non-convex optimization problems for which theoretical guarantees are still possible. In this talk, I will review some of these connections and make them concrete using the example of a challenging robust influence maximization problem, for which we obtain new, tractable formulations and algorithms.

Stefanie Jegelka
Fri 5:00 a.m. - 5:30 a.m.

Submodular set-functions have many applications in combinatorial optimization, as they can be minimized and approximately maximized in polynomial time. A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization. Submodularity goes beyond set-functions and has naturally been considered for problems with multiple labels or for functions defined on continuous domains, where it corresponds essentially to cross second-derivatives being nonpositive. In this paper, we show that most results relating submodularity and convexity for set-functions can be extended to all submodular functions. In particular, (a) we naturally define a continuous extension in a set of probability measures, (b) show that the extension is convex if and only if the original function is submodular, (c) prove that the problem of minimizing a submodular function is equivalent to a typically non-smooth convex optimization problem, and (d) propose another convex optimization problem with better computational properties (e.g., a smooth dual problem). Most of these extensions from the set-function situation are obtained by drawing links with the theory of multi-marginal optimal transport, which also provides a new interpretation of existing results for set-functions. We then provide practical algorithms to minimize generic submodular functions on discrete domains, with associated convergence rates.

Francis Bach
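The set-function case that the abstract generalizes can be sketched in a few lines. The code below (the example graph and helper names are illustrative) implements the classical Lovász extension via the greedy ordering and checks two standard facts: the extension agrees with F on indicator vectors, and for a submodular F its values on the cube never drop below the discrete minimum, which is why minimizing the convex extension solves the combinatorial problem.

```python
import numpy as np
from itertools import combinations

def cut(S, edges):
    """Graph cut function: number of edges crossing the set S (submodular)."""
    return sum((i in S) != (j in S) for i, j in edges)

def lovasz(w, F):
    """Lovász extension of a set function F at w (greedy/Edmonds ordering)."""
    order = np.argsort(-w)                 # coordinates in decreasing order
    val, S, prevF = 0.0, set(), F(set())
    for k in order:                        # telescoping sum over prefix sets
        S.add(int(k))
        newF = F(S)
        val += w[k] * (newF - prevF)
        prevF = newF
    return val

edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
n = 4
F = lambda S: cut(S, edges)

# On vertices of [0,1]^n the extension agrees with F ...
for S in [set(), {0}, {1, 3}, {0, 1, 2, 3}]:
    w = np.array([1.0 if i in S else 0.0 for i in range(n)])
    assert abs(lovasz(w, F) - F(S)) < 1e-9

# ... and its values on the cube never beat the discrete minimum, so the
# (convex) continuous problem recovers the combinatorial optimum.
rng = np.random.default_rng(0)
discrete_min = min(F(set(S)) for r in range(n + 1)
                   for S in combinations(range(n), r))
assert all(lovasz(rng.random(n), F) >= discrete_min - 1e-9 for _ in range(100))
print("discrete minimum:", discrete_min)
```

The talk's contribution is to carry this picture beyond set-functions, via an extension over probability measures rather than over points of the cube.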
Fri 5:30 a.m. - 6:00 a.m.

In this talk, I will highlight some aspects of geometry and its role in optimization. In particular, I will talk about optimization problems whose parameters are constrained to lie on a manifold or in a specific metric space. These geometric constraints often make the problems numerically challenging, but they can also reveal properties that ensure tractable attainment of global optimality for certain otherwise non-convex problems.

We'll make our foray into geometric optimization via geodesic convexity, a concept that generalizes the usual notion of convexity to nonlinear metric spaces such as Riemannian manifolds. I will outline some of our results that contribute to g-convex analysis as well as to the theory of first-order g-convex optimization. I will mention several very interesting optimization problems where g-convexity proves remarkably useful. In closing, I will mention extensions to large-scale non-convex geometric optimization as well as key open problems.

Suvrit Sra
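A minimal instance of the manifold-constrained optimization discussed above is Rayleigh-quotient maximization on the sphere: nonconvex in the ambient space, yet reliably solved by first-order manifold methods. The sketch below (problem size and step size are arbitrary choices, not from the talk) performs Riemannian gradient ascent: project the Euclidean gradient onto the tangent space, take a step, then retract back to the sphere by renormalizing.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 6))
A = (G + G.T) / 2                     # random symmetric matrix

def riemannian_grad_step(x, lr=0.1):
    """One Riemannian gradient-ascent step on the sphere S^{n-1}
    for the Rayleigh quotient f(x) = x'Ax."""
    g = 2 * A @ x                     # Euclidean gradient
    g_tan = g - (x @ g) * x           # project onto the tangent space at x
    x_new = x + lr * g_tan            # step in the tangent direction ...
    return x_new / np.linalg.norm(x_new)  # ... then retract to the sphere

x = rng.normal(size=6)
x /= np.linalg.norm(x)
for _ in range(500):
    x = riemannian_grad_step(x)

lam_max = np.linalg.eigvalsh(A).max()
print("gap to top eigenvalue:", abs(x @ A @ x - lam_max))
```

From a generic start, the iterates converge to the top eigenvector, a small illustration of how geometric structure can render a nonconvex problem tractable.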
Fri 6:00 a.m. - 6:30 a.m.
Break (Coffee Break)
Fri 6:30 a.m. - 7:30 a.m.
Discussion Panel
Fri 7:30 a.m. - 8:00 a.m.
Afternoon Poster Spotlight (Spotlight)
Fri 8:00 a.m. - 9:00 a.m.
Afternoon Poster Session (Posters)

Author Information

Hossein Mobahi (Google Research)
Anima Anandkumar (Caltech)
Percy Liang (Stanford University)
Stefanie Jegelka (MIT)
Anna Choromanska (NYU Tandon School of Engineering)
