Track: Algorithms, Optimization

Tue 5 Dec. 14:50 - 15:05 PST

Oral

Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

Ethan Elenberg · Alex Dimakis · Moran Feldman · Amin Karbasi

In many machine learning applications, it is important to explain the predictions of a black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a combinatorial maximization problem and propose an efficient streaming algorithm to solve it subject to cardinality constraints. By extending ideas from Badanidiyuru et al. [2014], we provide a constant factor approximation guarantee for our algorithm in the case of random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst case stream order. Our algorithm obtains similar explanations of Inception V3 predictions 10 times faster than the state-of-the-art LIME framework of Ribeiro et al. [2016].

Tue 5 Dec. 15:05 - 15:20 PST

Oral

A unified approach to interpreting model predictions

Scott M Lundberg · Su-In Lee

Understanding why a model made a certain prediction is crucial in many applications. However, with large modern datasets the best accuracy is often achieved by complex models even experts struggle to interpret, such as ensemble or deep learning models. This creates a tension between accuracy and interpretability. In response, a variety of methods have recently been proposed to help users interpret the predictions of complex models. Here, we present a unified framework for interpreting predictions, namely SHAP (SHapley Additive exPlanations), which assigns each feature an importance for a particular prediction. The key components of the SHAP framework are the identification of a class of additive feature importance measures and theoretical results that there is a unique solution in this class with a set of desired properties. This class unifies six existing methods, and several recent methods in this class do not have these desired properties. This means that our framework can inform the development of new methods for explaining prediction models. We demonstrate that several new methods we presented in this paper based on the SHAP framework show better computational performance and better consistency with human intuition than existing methods.

Tue 5 Dec. 15:20 - 15:25 PST

Spotlight

Differentiable Learning of Submodular Functions

Josip Djolonga · Andreas Krause

Can we incorporate discrete optimization algorithms within modern machine learning models? For example, is it possible to use in deep architectures a layer whose output is the minimal cut of a parametrized graph? Given that these models are trained end-to-end by leveraging gradient information, the introduction of such layers seems very challenging due to their non-continuous output. In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that we can continuously relax the output without sacrificing guarantees. We provide an easily computable approximation to the Jacobian complemented with a complete theoretical analysis. Finally, these contributions let us experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.

Tue 5 Dec. 15:25 - 15:30 PST

Spotlight

Generalized Linear Model Regression under Distance-to-set Penalties

Jason Xu · Eric Chi · Kenneth Lange

Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions but often lead to severe shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase the strong empirical performance of distance penalization, even under non-convex constraints.

Tue 5 Dec. 15:30 - 15:35 PST

Spotlight

Decomposable Submodular Function Minimization: Discrete and Continuous

Alina Ene · Huy Nguyen · László A. Végh

This paper investigates connections between discrete and continuous approaches for decomposable submodular function minimization. We provide improved running time estimates for the state-of-the-art continuous algorithms for the problem using combinatorial arguments. We also provide a systematic experimental comparison of the two types of methods, based on a clear distinction between level-0 and level-1 algorithms.

Tue 5 Dec. 15:35 - 15:40 PST

Spotlight

Unbiased estimates for linear regression via volume sampling

Michal Derezinski · Manfred K. Warmuth

Given a full rank matrix X with more columns than rows consider the task of estimating the pseudo inverse $X^+$ based on the pseudo inverse of a sampled subset of columns (of size at least the number of rows). We show that this is possible if the subset of columns is chosen proportional to the squared volume spanned by the rows of the chosen submatrix (ie, volume sampling). The resulting estimator is unbiased and surprisingly the covariance of the estimator also has a closed form: It equals a specific factor times $X^+X^{+\top}$. Pseudo inverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of $X$. We assume labels are expensive and we are only given the labels for the small subset of columns we sample from $X$. Using our methods we show that the weight vector of the solution for the sub problem is an unbiased estimator of the optimal solution for the whole problem based on all column labels. We believe that these new formulas establish a fundamental connection between linear least squares and volume sampling. We use our methods to obtain an algorithm for volume sampling that is faster than state-of-the-art and for obtaining bounds for the total loss of the estimated least-squares solution on all labeled columns.

Tue 5 Dec. 15:40 - 15:45 PST

Spotlight

On Frank-Wolfe and Equilibrium Computation

Jacob D Abernethy · Jun-Kun Wang

We consider the Frank-Wolfe method for constrained convex optimization, a first-order projection-free procedure. We show that this algorithm can be recast in a different light, emerging as a special case of a particular meta-algorithm for computing equilibria (saddle points) of convex-concave zero-sum games. This equilibrium computation trick relies on the existence of no-regret online learning to both generate a sequence of iterates but also to provide a proof of convergence through vanishing regret. We show that our stated equivalence has several nice properties, particularly that it exhibits a modularity that gives rise to various old and new algorithms. We explore a few such resulting methods, and provide experimental results to demonstrate correctness and efficiency.

Tue 5 Dec. 15:45 - 15:50 PST

Spotlight

On Separability of Loss Functions, and Revisiting Discriminative Vs Generative Models

Adarsh Prasad · Alexandru Niculescu-Mizil · Pradeep Ravikumar

We revisit the classical analysis of generative vs discriminative models for general exponential families, and high-dimensional settings. Towards this, we develop novel technical machinery, including a notion of separability of general loss functions, which allow us to provide a general framework to obtain l∞ convergence rates for general M-estimators. We use this machinery to analyze l∞ and l2 convergence rates of generative and discriminative models, and provide insights into their nuanced behaviors in high-dimensions. Our results are also applicable to differential parameter estimation, where the quantity of interest is the difference between generative model parameters.

Main Navigation

Session

Algorithms, Optimization

Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

A unified approach to interpreting model predictions

Differentiable Learning of Submodular Functions

Generalized Linear Model Regression under Distance-to-set Penalties

Decomposable Submodular Function Minimization: Discrete and Continuous

Unbiased estimates for linear regression via volume sampling

On Frank-Wolfe and Equilibrium Computation

On Separability of Loss Functions, and Revisiting Discriminative Vs Generative Models