NIPS Poster Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

Vitaly Feldman

Area 5+6+7+8 #187

Keywords: [ Learning Theory ] [ Convex Optimization ]

Abstract: In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \mathbb{E}_{f\sim D}[f(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is based on i.i.d. samples $f^1,f^2,\ldots,f^n$ from $D$. A common approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) \doteq \frac{1}{n}\sum_{i\leq n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\mathcal{K}$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\mathcal{K}$ of bounded radius, ERM requires a sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_2/\ell_2$ setting in (Shalev-Shwartz et al. 2009). In stark contrast, these problems can be solved using a dimension-independent number of samples for the $\ell_2/\ell_2$ setting and with $\log d$ dependence for the $\ell_1/\ell_\infty$ setting using other approaches. We also demonstrate that for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs an even stronger gap appears already in dimension 2.
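As a concrete reading of the definitions in the abstract, the following is a minimal sketch of ERM on a toy one-dimensional problem. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the loss $f^i(x) = |x - z_i|$ (convex and 1-Lipschitz), the set $\mathcal{K} = [-1, 1]$, the uniform distribution for $z_i$, and the helper names `empirical_risk` and `erm`. Because it works in dimension 1, it only illustrates what $F_S$ and its minimizer are; it does not reproduce the dimension-dependent lower bound.

```python
import random
import statistics


def empirical_risk(x, samples):
    """F_S(x) = (1/n) * sum_i f^i(x), with the assumed loss f^i(x) = |x - z_i|."""
    return sum(abs(x - z) for z in samples) / len(samples)


def erm(samples, lo=-1.0, hi=1.0):
    """Return a minimizer of F_S over K = [lo, hi].

    For the absolute loss a sample median minimizes F_S on the real line;
    since F_S is convex in one dimension, clipping that point to [lo, hi]
    gives a minimizer over the interval.
    """
    return min(max(statistics.median(samples), lo), hi)


rng = random.Random(0)  # fixed seed so the run is repeatable

# n = 50 i.i.d. samples defining F_S (toy distribution D: z uniform on [-1, 1]).
train = [rng.uniform(-1.0, 1.0) for _ in range(50)]

# A large fresh sample used only as a Monte Carlo stand-in for F.
fresh = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

x_hat = erm(train)
train_risk = empirical_risk(x_hat, train)        # F_S(x_hat)
approx_true_risk = empirical_risk(x_hat, fresh)  # estimate of F(x_hat)

print("ERM solution:", x_hat)
print("F_S(x_hat):", train_risk)
print("estimated F(x_hat):", approx_true_risk)
```

The difference between the last two printed values is an estimate of the generalization gap of the ERM solution on this toy instance; the question studied in the abstract is how large $n$ must be, as a function of $d$, for that gap to be small.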
