

Contributed talks in Workshop: OPT 2023: Optimization for Machine Learning

Contributed Talks 1: *Escaping mediocrity: how two-layer networks learn hard generalized linear models* and *Last Iterate Convergence of Popov Method for Non-monotone Stochastic Variational Inequalities*

Bruno Loureiro · Daniil Vankov · Courtney Paquette

Fri 15 Dec 7:30 a.m. PST — 8 a.m. PST

Abstract:

Two 15-minute talks:

  1. Escaping mediocrity: how two-layer networks learn hard generalized linear models, Bruno Loureiro
  • This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that on the order of d log d samples are typically needed in this scenario. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario (see the SGD sketch after this list).
  2. Last Iterate Convergence of Popov Method for Non-monotone Stochastic Variational Inequalities, Daniil Vankov
  • This paper focuses on non-monotone stochastic variational inequalities (SVIs) that may not have a unique solution. A commonly used efficient algorithm for solving VIs is the Popov method, which is known to achieve the optimal convergence rate for VIs with Lipschitz continuous, strongly monotone operators. We introduce a broader class of structured non-monotone operators, namely p-quasi-sharp operators, which allows for a tractable analysis of the convergence behavior of algorithms. We show that the stochastic Popov method converges *almost surely* to a solution for all operators in this class under a linear growth condition. In addition, we obtain the last-iterate convergence rate (in expectation) for the method under a linear growth condition for p-quasi-sharp operators. Based on our analysis, we refine the results for smooth p-quasi-sharp operators and for p-quasi-sharp operators on a compact set, and obtain the optimal convergence rates (see the Popov-update sketch after this list).
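
The following is a minimal, hypothetical sketch of the setting described in the first talk, not the speakers' code: online SGD on a width-p two-layer network fitting a generalized linear target y = g(⟨w*, x⟩) in dimension d, starting from a small ("flat") initialization. The teacher direction, tanh link, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (assumptions, not the speakers' code): online SGD on a
# width-p two-layer network fitting a generalized linear target
# y = g(<w_star, x>) in dimension d, from a nearly flat initialization.
rng = np.random.default_rng(0)
d, p, lr, steps = 100, 16, 0.1, 20_000

w_star = rng.normal(size=d) / np.sqrt(d)   # hypothetical teacher direction
g = np.tanh                                # hypothetical link function

W = rng.normal(size=(p, d)) / d            # small first-layer weights: many flat directions
a = np.ones(p) / p                         # second layer kept fixed for simplicity

for _ in range(steps):
    x = rng.normal(size=d)                 # one fresh Gaussian sample per step (online SGD)
    y = g(x @ w_star)
    pre = W @ x
    err = a @ np.tanh(pre) - y             # squared-loss residual
    # gradient of 0.5 * err**2 with respect to the first-layer weights
    grad_W = err * (a * (1.0 - np.tanh(pre) ** 2))[:, None] * x[None, :]
    W -= lr * grad_W

# Overlap between each neuron and the teacher direction: escaping "mediocrity"
# corresponds to this overlap growing away from its initially tiny value.
overlap = np.abs(W @ w_star) / (np.linalg.norm(W, axis=1) * np.linalg.norm(w_star) + 1e-12)
print("max neuron/teacher overlap:", overlap.max())
```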
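
Below is a minimal sketch of the stochastic Popov (optimistic-gradient) update discussed in the second talk. For illustration it is run on a simple monotone bilinear saddle-point problem rather than on the p-quasi-sharp, possibly non-monotone operators analyzed in the paper; the operator, noise model, and step-size schedule are assumptions.

```python
import numpy as np

# A minimal sketch (assumptions, not the paper's code) of the stochastic Popov
# method on the bilinear saddle point min_u max_v u^T A v, whose VI operator is
# F(u, v) = (A v, -A^T u) and whose solution is the origin.
rng = np.random.default_rng(0)
n, gamma0, steps, noise = 10, 0.1, 20_000, 0.1
A = rng.normal(size=(n, n))

def F(z):
    u, v = z[:n], z[n:]
    return np.concatenate([A @ v, -A.T @ u])

def F_stoch(z):
    return F(z) + noise * rng.normal(size=2 * n)   # unbiased noisy oracle

z = rng.normal(size=2 * n)            # iterate z_k
w = z.copy()                          # extrapolation point w_{k-1}
g = F_stoch(w)                        # stored stochastic operator value at w_{k-1}
for k in range(steps):
    gamma = gamma0 / np.sqrt(k + 1)   # decaying step size, a common stochastic choice
    w = z - gamma * g                 # w_k: extrapolate using the stale operator value
    g = F_stoch(w)                    # one fresh stochastic call per iteration
    z = z - gamma * g                 # z_{k+1}: update using F at the extrapolation point

print("distance of the last iterate from the solution:", np.linalg.norm(z))
```

The single fresh operator evaluation per iteration, with the stale value at w_{k-1} reused for the extrapolation step, is what distinguishes Popov's method from the extragradient method, which evaluates the operator twice per step.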
