NeurIPS 2023 Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models Oral


Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee

Hall C2 (level 1 gate 9 south of food court)

Session: Oral 4A Optimization

Abstract: We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.
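
As an illustration of the information exponent defined in the abstract, the following is a minimal sketch that estimates the Hermite coefficients of a link function numerically and reports the index of the first nonzero one. It is not code from the paper. The function names, the Gauss-Hermite quadrature approach, the tolerance `tol`, and the choice to start the search at index 1 (ignoring the constant term) are all assumptions made for this example.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(sigma, max_degree=10, quad_points=60):
    """Approximate c_k = E[sigma(Z) He_k(Z)] / sqrt(k!) for Z ~ N(0, 1).

    He_k is the probabilists' Hermite polynomial. The expectation is
    computed with Gauss-Hermite quadrature (an assumption of this sketch).
    """
    nodes, weights = hermegauss(quad_points)
    # hermegauss weights integrate against exp(-x^2 / 2); dividing by
    # sqrt(2 * pi) turns them into weights for the standard normal density.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = sigma(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0  # coefficient vector selecting He_k
        he_k = hermeval(nodes, basis)
        raw = float(np.sum(weights * values * he_k))
        coeffs.append(raw / math.sqrt(math.factorial(k)))
    return np.array(coeffs)


def information_exponent(sigma, max_degree=10, tol=1e-8, start=1):
    """Return the first index k >= start with |c_k| > tol, or None.

    `start=1` skips the constant term; this is an assumption of the
    sketch, as is the numerical threshold `tol`.
    """
    coeffs = hermite_coefficients(sigma, max_degree)
    for k in range(start, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None
```

For example, `information_exponent(lambda z: z**3 - 3*z)` returns 3, since that link function is exactly the third probabilists' Hermite polynomial. For $k^\star = 3$, the bounds quoted in the abstract compare $d^{k^\star-1} = d^2$ against $d^{k^\star/2} = d^{1.5}$.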
