Workshop

Non-convex Optimization for Machine Learning: Theory and Practice

Anima Anandkumar · Niranjan Uma Naresh · Kamalika Chaudhuri · Percy Liang · Sewoong Oh

513 cd

Non-convex optimization is ubiquitous in machine learning. In general, reaching the global optimum of such problems is NP-hard, and in practice, local search methods such as gradient descent can get stuck in spurious local optima and suffer from poor convergence.

Over the last few years, tremendous progress has been made in establishing theoretical guarantees for many non-convex optimization problems. While there are worst-case instances that are computationally hard to solve, the focus has shifted to characterizing transparent conditions under which these problems become tractable. In many instances, these conditions turn out to be mild and natural for machine learning applications.

One area of non-convex optimization that has attracted extensive interest is spectral learning. This involves computing spectral decompositions of matrices and tensors that correspond to moments of a multivariate distribution. These algorithms are guaranteed to recover a consistent solution to the parameter estimation problem in many latent variable models, such as topic admixture models, HMMs, ICA, and, most recently, even non-linear models such as neural networks. In contrast to traditional algorithms like expectation maximization (EM), these algorithms come with polynomial computational and sample complexity guarantees. Analysis of these methods involves understanding the optimization landscape of tensor algebraic structures.
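
To make this concrete, here is a minimal, illustrative sketch of the tensor power method that underlies many of these spectral algorithms, assuming NumPy and a symmetric, orthogonally decomposable third-order moment tensor T; the weights and components below are synthetic stand-ins, not tied to any particular model:

    import numpy as np

    def tensor_power_iteration(T, n_iters=100, seed=0):
        """Recover one component of a symmetric, orthogonally decomposable
        third-order tensor T (shape d x d x d) via tensor power iteration."""
        rng = np.random.default_rng(seed)
        v = rng.normal(size=T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            # Multilinear contraction T(I, v, v), a vector in R^d
            v_new = np.einsum('ijk,j,k->i', T, v, v)
            v = v_new / np.linalg.norm(v_new)
        weight = np.einsum('ijk,i,j,k->', T, v, v, v)
        return weight, v

    # Synthetic example: T = sum_i w_i * (a_i outer a_i outer a_i) with orthonormal a_i
    d, w = 5, np.array([3.0, 2.0, 1.0])
    A, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(d, 3)))
    T = sum(wi * np.einsum('i,j,k->ijk', a, a, a) for wi, a in zip(w, A.T))
    weight, v = tensor_power_iteration(T)
    # Converges to one (weight_i, a_i) pair; deflation and restarts recover the rest.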

As another example of guaranteed non-convex methods, there has been interest in the problem of dictionary learning, which involves expressing the observed data as a sparse combination of dictionary elements. Recent results have established that both the dictionary and the coefficients can be consistently recovered in the challenging overcomplete case, where the number of dictionary elements exceeds the input dimensionality.
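
As an illustration of the alternating scheme typically studied in this line of work, the following is a minimal sketch (not any specific published algorithm) that alternates ISTA-based sparse coding with a least-squares dictionary update in NumPy; the regularization weight lam and iteration counts are placeholder choices:

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def dictionary_learning(X, k, lam=0.1, n_outer=50, n_ista=50, seed=0):
        """Alternating minimization for (1/2)||X - D C||_F^2 + lam * ||C||_1,
        with data X (d x n), dictionary D (d x k, unit columns), codes C (k x n).
        The number of atoms k may exceed d, i.e. the overcomplete case."""
        rng = np.random.default_rng(seed)
        d, n = X.shape
        D = rng.normal(size=(d, k))
        D /= np.linalg.norm(D, axis=0, keepdims=True)
        C = np.zeros((k, n))
        for _ in range(n_outer):
            # Sparse coding: ISTA on C with D fixed
            L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the smooth part
            for _ in range(n_ista):
                C = soft_threshold(C - D.T @ (D @ C - X) / L, lam / L)
            # Dictionary update: least squares with C fixed, then renormalize columns
            D = X @ np.linalg.pinv(C)
            D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
        return D, C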

There is also interest in analyzing online algorithms for non-convex problems. Recent work has established that simple stochastic gradient descent (SGD) with appropriately added noise can escape saddle points and converge to a local optimum in a bounded number of iterations for a large class of non-convex problems. This is especially important since non-convex problems typically have an exponential number of saddle points.
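
A minimal sketch of this idea, on a toy two-dimensional function with a strict saddle at the origin (the function, step size, and noise level are illustrative choices, not those of any specific paper):

    import numpy as np

    def grad(z):
        # f(x, y) = x**4/4 - x**2/2 + y**2/2 has a saddle point at the origin
        # and local minima at (+1, 0) and (-1, 0).
        x, y = z
        return np.array([x**3 - x, y])

    def noisy_gd(z0, lr=0.1, noise=0.01, n_steps=500, seed=0):
        """Gradient descent with isotropic Gaussian noise added to each step,
        in the spirit of noise-injected SGD for escaping strict saddle points."""
        rng = np.random.default_rng(seed)
        z = np.array(z0, dtype=float)
        for _ in range(n_steps):
            z = z - lr * (grad(z) + noise * rng.normal(size=z.shape))
        return z

    # Plain gradient descent started exactly at the saddle (0, 0) never moves;
    # the added noise pushes the iterate off the saddle toward a local minimum.
    print(noisy_gd([0.0, 0.0]))   # ends near (+1, 0) or (-1, 0)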

Finally, recent years have also seen novel applications of non-convex methods with rigorous guarantees. For example, many of these methods have shown great promise in diverse application domains such as natural language processing, social networks, health informatics, and biological sequence analysis.

There are certainly many challenging open problems in the area of non-convex optimization. While guarantees have been established for individual instances, there is no common unifying theme explaining what makes a non-convex problem tractable. Many challenging instances, such as optimization for training multi-layer neural networks or analyzing novel regularization techniques such as dropout for non-convex optimization, remain wide open. On the practical side, conversations between theorists and practitioners can help identify what kinds of conditions are reasonable for specific applications, and thus lead to the design of practically motivated algorithms for non-convex optimization with rigorous guarantees.

This workshop will fill an important gap by bringing together researchers from disparate communities and bridging the divide between theoreticians and practitioners. To facilitate discussion between the two groups, we aim to make the workshop easily accessible to people currently unfamiliar with the intricate details of these methods. We plan to have an open problems session and a discussion session to spur further research in this area. There will also be an invited poster session featuring top active student researchers in the area to increase quality participation in the workshop.
