NeurIPS Poster Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Sally Dong · Haotian Jiang · Yin Tat Lee · Swati Padmanabhan · Guanghao Ye

Hall J (level 1) #829

Keywords: [ gradient oracle complexity ] [ non-smooth convex optimization ] [ decomposable ] [ submodular function minimization ]

Abstract: Many fundamental problems in machine learning can be formulated by the convex program $\min_{\theta\in \mathbb{R}^d}\ \sum_{i=1}^{n}f_{i}(\theta),$ where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by [Axiotis, Karczmarz, Mukherjee, Sankowski and Vladu, ICML 2021]. Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
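To make the setting concrete, here is a minimal Python sketch of a decomposable objective and of the sampling baseline the abstract mentions. It is not the paper's algorithm. The toy terms $f_i(\theta) = \sum_{j \in S_i} |\theta_j - 1|$, the supports `SUPPORTS`, the step-size schedule, and all function names are assumptions chosen for illustration. The two cost helpers only compare the quantities $\sum_i d_i$ and $nd$ that appear in the stated bounds; they ignore the $\log(1/\epsilon)$ and polylogarithmic factors.

```python
import random

# Hypothetical toy instance (not from the paper): each term
# f_i(theta) = sum_{j in S_i} |theta_j - 1| is convex and Lipschitz,
# and depends only on the d_i = len(S_i) coordinates in its support S_i.
D = 6
SUPPORTS = [[0, 1], [1, 2, 3], [4], [4, 5]]


def term_value(theta, support):
    """Evaluate one term f_i at theta."""
    return sum(abs(theta[j] - 1.0) for j in support)


def term_subgradient(theta, support):
    """Return a sparse subgradient {j: g_j}; only d_i entries are computed."""
    def sign(x):
        return (x > 0) - (x < 0)
    return {j: float(sign(theta[j] - 1.0)) for j in support}


def objective(theta, supports):
    """Full objective: the sum of all terms."""
    return sum(term_value(theta, s) for s in supports)


def sampled_subgradient_descent(supports, d, steps=2000, seed=0):
    """Baseline from the abstract: sample one term per iteration and step along its subgradient."""
    rng = random.Random(seed)
    theta = [0.0] * d
    for t in range(steps):
        support = rng.choice(supports)   # uniform sampling of one f_i
        step = 0.5 / (t + 1) ** 0.5      # assumed decaying step size
        for j, g in term_subgradient(theta, support).items():
            theta[j] -= step * g
    return theta


def sparse_cost(supports):
    """sum_i d_i: coordinates touched when every term is queried once."""
    return sum(len(s) for s in supports)


def dense_cost(supports, d):
    """n * d: coordinates touched if every term were treated as dense."""
    return len(supports) * d


if __name__ == "__main__":
    theta = sampled_subgradient_descent(SUPPORTS, D)
    print("sum of d_i:", sparse_cost(SUPPORTS), "versus n*d:", dense_cost(SUPPORTS, D))
    print("objective at start:", objective([0.0] * D, SUPPORTS))
    print("objective after sampling:", objective(theta, SUPPORTS))
```

In this toy instance the terms are perfectly uniform, so uniform sampling behaves well; the abstract's point is that such sampling can degrade when the terms are badly conditioned, which is the case the adaptive selection procedure is meant to handle.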