

Tutorial

Deep Mathematical Properties of Submodularity with Applications to Machine Learning

Jeffrey A Bilmes

Emerald Bay A

Abstract:

Submodular functions have received significant attention in the mathematics community owing to their natural and wide-ranging applicability. Submodularity has a very simple definition, which belies a treasure trove of consequent mathematical richness. This tutorial will attempt to convey some of this richness.

We will start by defining submodularity and polymatroidality, and we will survey a surprisingly diverse set of functions that are submodular and operations that (sometimes remarkably) preserve submodularity. Next, we'll define the submodular polytope and its relationship to the greedy algorithm, which solves certain linear programs with an exponential number of constraints exactly and efficiently. We will see how submodularity shares certain properties with convexity (efficient minimization, discrete separation, subdifferentials, lattices and sub-lattices, and the convexity of the Lovász extension), with concavity (via its definition, submodularity via concave functions, superdifferentials), and with neither (simultaneous sub- and super-differentials, efficient approximate maximization). The Lovász extension will be given particular attention due to its growing use for structured convex norms and surrogates in relaxation methods. We will survey both constrained and unconstrained submodular optimization (including the minimum norm point algorithm), discussing what is currently known about hardness (both upper and lower bounds), and also when algorithms or instances are practical.
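To make the definition and the role of the greedy ordering concrete, here is a minimal sketch, not taken from the tutorial itself. It checks the inequality f(A) + f(B) >= f(A ∪ B) + f(A ∩ B) by brute force on a toy coverage function, then evaluates the Lovász extension by sorting the coordinates of w in decreasing order and weighting the marginal gains along the resulting chain of sets. The ground set, the `COVERS` table, and all function names are hypothetical choices made for illustration, and f is assumed to be normalized so that f(∅) = 0.

```python
from itertools import combinations

# Hypothetical toy instance: each ground element covers a few items.
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"a", "d"}}
GROUND = sorted(COVERS)


def coverage(subset):
    """f(A) = number of distinct items covered by the elements of A."""
    covered = set()
    for v in subset:
        covered |= COVERS[v]
    return len(covered)


def powerset(ground):
    """Yield every subset of the ground set as a frozenset."""
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B.

    Exponential in the size of the ground set, so only usable on toy instances.
    """
    subsets = list(powerset(ground))
    return all(f(a) + f(b) >= f(a | b) + f(a & b) for a in subsets for b in subsets)


def lovasz_extension(f, ground, w):
    """Evaluate the Lovász extension of f at the point w (indexed by element).

    Elements are visited in order of decreasing w; each marginal gain of f
    along that chain is weighted by the matching coordinate of w.
    """
    order = sorted(ground, key=lambda v: -w[v])
    value, prefix, prev = 0.0, [], f(frozenset())
    for v in order:
        prefix.append(v)
        cur = f(frozenset(prefix))
        value += w[v] * (cur - prev)
        prev = cur
    return value
```

On the indicator vector of a set A, the extension agrees with f(A), which is the sense in which it extends f from the vertices of the hypercube to its interior. For nonnegative w, the same sorted pass is the greedy computation that maximizes the linear function w over the submodular polytope.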

As to applications, it is interesting that a submodular function itself can often be seen as a parameter to instantiate a machine-learning instance: this includes active/semi-supervised learning, structured sparsity-inducing norms, combinatorial independence and generalized entropy, and rank-order-based divergences. Other examples include feature selection, data subset (or core set) selection, inference in graphical models with high tree-width and global potentials in computer vision, and influence determination in social networks.
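As one illustration of the data subset selection setting, the following is a small sketch of greedy maximization under a cardinality constraint. It uses a facility location objective, which is one common choice of monotone submodular function and is an assumption here rather than something the abstract specifies. The points, the similarity formula, and the function names are all hypothetical. For monotone submodular objectives with f(∅) = 0, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
# Hypothetical toy data: five points on a line, forming two clusters.
POINTS = [0.0, 0.1, 0.2, 5.0, 5.1]

# Nonnegative similarity; this formula is an illustrative assumption.
SIMILARITY = [[1.0 / (1.0 + abs(x - y)) for y in POINTS] for x in POINTS]


def facility_location(similarity, subset):
    """f(A) = sum over every point i of its best similarity to a member of A."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in similarity)


def greedy_select(similarity, budget):
    """Pick up to `budget` indices, each time adding the largest marginal gain."""
    n = len(similarity)
    chosen, current = [], 0.0
    for _ in range(min(budget, n)):
        best_gain, best_j = None, None
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain of adding j to the current selection.
            gain = facility_location(similarity, chosen + [j]) - current
            if best_gain is None or gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
        current += best_gain
    return chosen, current
```

Calling `greedy_select(SIMILARITY, 2)` picks one representative from each cluster. This version re-evaluates the objective from scratch for every candidate, which keeps the sketch short; practical implementations usually cache marginal gains or evaluate them lazily.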
