


Spotlight Session 1



Minimum Weight Perfect Matching via Blossom Belief Propagation

Sungsoo Ahn · Sejun Park · Michael Chertkov · Jinwoo Shin

Max-product Belief Propagation (BP) is a popular message-passing algorithm for computing a Maximum-A-Posteriori (MAP) assignment over a distribution represented by a Graphical Model (GM). It has been shown that BP can solve a number of combinatorial optimization problems, including minimum weight matching, shortest path, network flow, and vertex cover, under the following common assumption: the respective Linear Programming (LP) relaxation is tight, i.e., no integrality gap is present. However, when the LP exhibits an integrality gap, no model has been known that can be solved systematically via sequential applications of BP. In this paper, we develop the first such algorithm, coined Blossom-BP, for solving the minimum weight matching problem over arbitrary graphs. Each step of the sequential algorithm requires applying BP over a modified graph constructed by contractions and expansions of blossoms, i.e., odd sets of vertices. Our scheme is guaranteed to terminate within O(n^2) BP runs, where n is the number of vertices in the original graph. In essence, Blossom-BP offers a distributed version of the celebrated Edmonds' blossom algorithm by jumping at once over many sub-steps with a single BP run. Moreover, our result provides an interpretation of Edmonds' algorithm as a sequence of LPs.
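For reference, minimum weight perfect matching itself can be computed with an off-the-shelf blossom-based solver. The sketch below uses NetworkX on a made-up four-vertex graph; it is only a baseline computation of the problem that Blossom-BP solves distributively, not an implementation of the paper's BP scheme.

```python
# Minimum weight perfect matching via NetworkX's blossom-based solver.
# Graph and weights are invented for illustration.
import networkx as nx

edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 2.5), (3, 0, 1.5), (0, 2, 3.0), (1, 3, 0.5)]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Min-weight perfect matching = max-weight matching on negated weights,
# restricted to maximum-cardinality matchings (perfect, when one exists).
H = nx.Graph()
H.add_weighted_edges_from([(u, v, -w) for u, v, w in edges])
matching = nx.max_weight_matching(H, maxcardinality=True)

total = sum(G[u][v]["weight"] for u, v in matching)
print(sorted(map(sorted, matching)), "total weight:", total)
```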


Super-Resolution Off the Grid

Qingqing Huang · Sham Kakade

Super-resolution is the problem of recovering a superposition of point sources using bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging applications, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest is obtaining estimation procedures which are robust to noise, with the following desirable statistical and computational properties: we seek to use coarse Fourier measurements (bounded by some cutoff frequency); we hope to take a (quantifiably) small number of measurements; we desire our algorithm to run quickly. Suppose we have $k$ point sources in $d$ dimensions, where the points are separated by at least $\Delta$ from each other (in Euclidean distance). This work provides an algorithm with the following favorable guarantees:
1. The algorithm uses Fourier measurements whose frequencies are bounded by $O(1/\Delta)$ (up to log factors). Previous algorithms require a cutoff frequency which may be as large as $\Omega(\sqrt{d}/\Delta)$.
2. The number of measurements taken by our algorithm, and its computational complexity, are bounded by a polynomial in both the number of points $k$ and the dimension $d$, with no dependence on the separation $\Delta$. In contrast, previous algorithms depended inverse polynomially on the minimal separation and exponentially on the dimension for both of these quantities.
Our estimation procedure itself is simple: we take random bandlimited measurements (as opposed to taking an exponential number of measurements on the hyper-grid). Furthermore, our analysis and algorithm are elementary (based on concentration bounds of sampling and singular value decomposition).
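The measurement model from the abstract can be illustrated numerically: measuring the signal at pairwise sums of two random frequency sets yields a matrix of rank $k$, so an SVD reveals the number of sources. The sketch below uses invented source locations, weights, and frequency counts, and shows only this rank idea, not the paper's full recovery procedure.

```python
# Toy illustration: bandlimited Fourier measurements of k point sources,
# arranged so that the measurement matrix has rank k.
import numpy as np

rng = np.random.default_rng(0)
d, k, delta = 3, 4, 0.2                   # dimension, number of sources, separation
mus = rng.uniform(0, 1, size=(k, d))      # point-source locations (assumed >= delta apart)
w = rng.uniform(0.5, 1.5, size=k)         # positive source weights

L = 20
cutoff = 1.0 / delta                      # frequencies bounded by O(1/delta)
S = rng.uniform(-cutoff, cutoff, size=(L, d))
T = rng.uniform(-cutoff, cutoff, size=(L, d))

# F[l, l'] = f(s_l + t_l') = sum_j w_j exp(2*pi*i <mu_j, s_l + t_l'>)
#          = (V diag(w) U^T)[l, l'],  so rank(F) = k generically.
V = np.exp(2j * np.pi * S @ mus.T)
U = np.exp(2j * np.pi * T @ mus.T)
F = V @ np.diag(w) @ U.T

sv = np.linalg.svd(F, compute_uv=False)
print("estimated number of sources:", int(np.sum(sv > 1e-8 * sv[0])))  # expected: 4
```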


b-bit Marginal Regression

Martin Slawski · Ping Li

We consider the problem of sparse signal recovery from $m$ linear measurements quantized to $b$ bits. $b$-bit Marginal Regression is proposed as the recovery algorithm. We study the question of choosing $b$ in the setting of a given budget of bits $B = m \cdot b$ and derive a single easy-to-compute expression characterizing the trade-off between $m$ and $b$. The choice $b = 1$ turns out to be optimal for estimating the unit vector corresponding to the signal, for any level of additive Gaussian noise before quantization as well as for adversarial noise. For $b \geq 2$, we show that Lloyd-Max quantization constitutes an optimal quantization scheme and that the norm of the signal can be estimated consistently by maximum likelihood.
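A minimal sketch of the $b = 1$ case: correlate the sign-quantized measurements with the columns of the measurement matrix and keep the largest entries. The problem sizes, sparsity level, and noise level below are arbitrary choices for illustration; the estimator recovers the direction (unit vector) of the signal, as described in the abstract.

```python
# 1-bit marginal regression sketch on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n, m, s, sigma = 200, 500, 5, 0.1         # invented problem sizes

x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x0 /= np.linalg.norm(x0)                  # target unit vector

A = rng.standard_normal((m, n))
y = np.sign(A @ x0 + sigma * rng.standard_normal(m))   # b = 1 quantization

xhat = A.T @ y / m                        # marginal regression statistic
support = np.argsort(np.abs(xhat))[-s:]   # keep the s largest magnitudes (s assumed known)
x_est = np.zeros(n)
x_est[support] = xhat[support]
x_est /= np.linalg.norm(x_est)

print("correlation with true direction:", abs(x_est @ x0))
```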


LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements

Christos Thrampoulidis · Ehsan Abbasi · Babak Hassibi

Consider estimating an unknown, but structured (e.g., sparse, low-rank, etc.), signal $x_0\in R^n$ from a vector $y\in R^m$ of measurements of the form $y_i=g_i(a_i^Tx_0)$, where the $a_i$'s are the rows of a known measurement matrix $A$, and $g$ is a (potentially unknown) nonlinear and random link function. Such measurement functions could arise in applications where the measurement device has nonlinearities and uncertainties. They could also arise by design; e.g., $g_i(x)=\mathrm{sign}(x+z_i)$ corresponds to noisy 1-bit quantized measurements. Motivated by the classical work of Brillinger, and more recent work of Plan and Vershynin, we estimate $x_0$ by solving the Generalized LASSO, i.e., $\hat x=\arg\min_{x}\|y-Ax\|_2+\lambda f(x)$, for some regularization parameter $\lambda >0$ and some (typically non-smooth) convex regularizer $f$ that promotes the structure of $x_0$, e.g., the $\ell_1$-norm or the nuclear norm. While this approach seems to naively ignore the nonlinear function $g$, both Brillinger and Plan and Vershynin have shown that, when the entries of $A$ are iid standard normal, this is a good estimator of $x_0$ up to a constant of proportionality $\mu$, which only depends on $g$. In this work, we considerably strengthen these results by obtaining explicit expressions for $\|\hat x-\mu x_0\|_2$, for the regularized Generalized LASSO, that are asymptotically precise when $m$ and $n$ grow large. A main result is that the estimation performance of the Generalized LASSO with non-linear measurements is asymptotically the same as that of one whose measurements are linear, $y_i=\mu a_i^Tx_0+\sigma z_i$, with $\mu=E[\gamma g(\gamma)]$ and $\sigma^2=E[(g(\gamma)-\mu\gamma)^2]$, where $\gamma$ is standard normal. The derived expressions on the estimation performance are the first known precise results in this context. One interesting consequence of our result is that the optimal quantizer of the measurements that minimizes the estimation error of the LASSO is the celebrated Lloyd-Max quantizer.
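The sketch below sets up the Generalized LASSO with 1-bit (sign) measurements and compares the solution against $\mu x_0$, using $\mu = E[\gamma\,\mathrm{sign}(\gamma)] = \sqrt{2/\pi}$. The sizes, sparsity, and regularization parameter are arbitrary choices for illustration, and cvxpy is used only as a convenient convex solver.

```python
# Generalized LASSO with a sign link function, on synthetic data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, m, s = 100, 300, 5                    # invented problem sizes

x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x0 /= np.linalg.norm(x0)

A = rng.standard_normal((m, n))
y = np.sign(A @ x0)                      # nonlinear link g(t) = sign(t)

# For g = sign and gamma ~ N(0, 1): mu = E[gamma * sign(gamma)] = sqrt(2/pi).
mu = np.sqrt(2 / np.pi)

lam = 1.0                                # arbitrary regularization level
x = cp.Variable(n)
objective = cp.Minimize(cp.norm(y - A @ x, 2) + lam * cp.norm1(x))
cp.Problem(objective).solve()

print("||xhat - mu*x0||_2 =", np.linalg.norm(x.value - mu * x0))
```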


Optimal Rates for Random Fourier Features

Bharath Sriperumbudur · Zoltan Szabo

Kernel methods represent one of the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives, due to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions have recently been proposed in the literature, which allow the application of fast linear algorithms. Random Fourier features (RFF) are among the most popular and widely applied constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in L^r (1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.
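For concreteness, the standard RFF construction (following Rahimi and Recht) for the Gaussian shift-invariant kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$ looks like the sketch below; the dimensions and bandwidth are arbitrary, and this only illustrates the approximation whose quality the paper analyzes.

```python
# Random Fourier features for the Gaussian kernel.
import numpy as np

rng = np.random.default_rng(3)
d, D, sigma = 10, 2000, 1.0              # input dim, RFF dim, kernel bandwidth (arbitrary)

# Spectral measure of the Gaussian kernel is N(0, sigma^{-2} I).
W = rng.standard_normal((D, d)) / sigma
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    """Low-dimensional feature map z(x) with z(x).z(y) ~= k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
approx = rff(x) @ rff(y)
print("exact kernel value:", exact, " RFF approximation:", approx)
```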


Submodular Hamming Metrics

Jennifer Gillenwater · Rishabh K Iyer · Bethany Lusch · Rahul Kidambi · Jeffrey A Bilmes

We show that there is a largely unexplored class of functions (positive polymatroids) that can define proper discrete metrics over pairs of binary vectors and that are fairly tractable to optimize over. By exploiting submodularity, we are able to give hardness results and approximation algorithms for optimizing over such metrics. Additionally, we demonstrate empirically the effectiveness of these metrics and associated algorithms on both a metric minimization task (a form of clustering) and also a metric maximization task (generating diverse k-best lists).
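As a concrete instance, a positive polymatroid such as $f(S) = \sqrt{|S|}$ (monotone, normalized, submodular) applied to the symmetric difference of two binary vectors yields such a metric. The sketch below, with an arbitrary ground-set size, numerically spot-checks the triangle inequality rather than proving it.

```python
# A submodular Hamming distance: f applied to the symmetric difference.
import numpy as np

rng = np.random.default_rng(4)

def f(S):
    """Positive polymatroid on subsets encoded as 0/1 vectors: sqrt of cardinality."""
    return np.sqrt(S.sum())

def d(x, y):
    """Submodular Hamming distance between binary vectors x and y."""
    return f(np.logical_xor(x, y))

n = 30                                    # arbitrary ground-set size
violations = 0
for _ in range(1000):
    x, y, z = (rng.integers(0, 2, n) for _ in range(3))
    if d(x, z) > d(x, y) + d(y, z) + 1e-12:
        violations += 1
print("triangle inequality violations:", violations)   # expected: 0
```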


Top-k Multiclass SVM

Maksim Lapin · Matthias Hein · Bernt Schiele

Class ambiguity is typical in image classification problems with a large number of classes. When classes are difficult to discriminate, it makes sense to allow k guesses and to evaluate classifiers based on the top-k error instead of the standard zero-one loss. We propose the top-k multiclass SVM as a direct method to optimize for top-k performance. Our generalization of the well-known multiclass SVM is based on a tight convex upper bound of the top-k error. We propose a fast optimization scheme based on an efficient projection onto the top-k simplex, which is of independent interest. Experiments on five datasets show consistent improvements in top-k accuracy compared to various baselines.
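A brief illustration of the quantities involved: the top-k error below is the evaluation measure from the abstract, while the hinge-style surrogate (the mean of the k largest margin violations, clipped at zero) is one natural convex upper bound written here for illustration; see the paper for its exact loss and the projection-based solver. Scores and labels are synthetic.

```python
# Top-k error and an illustrative top-k hinge surrogate on synthetic scores.
import numpy as np

rng = np.random.default_rng(5)
num_samples, num_classes, k = 8, 10, 3
scores = rng.standard_normal((num_samples, num_classes))
labels = rng.integers(0, num_classes, num_samples)

def top_k_error(scores, labels, k):
    """Fraction of samples whose true class is not among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = np.any(topk == labels[:, None], axis=1)
    return 1.0 - hits.mean()

def top_k_hinge(score_row, label, k):
    """Mean of the k largest margin violations 1 + s_j - s_y (j != y), clipped at 0."""
    a = 1.0 + score_row - score_row[label]
    a[label] = 0.0
    return max(0.0, np.sort(a)[-k:].mean())

print("top-%d error: %.3f" % (k, top_k_error(scores, labels, k)))
print("top-%d hinge on sample 0: %.3f" % (k, top_k_hinge(scores[0], labels[0], k)))
```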