

Poster

Provable Acceleration of Nesterov's Accelerated Gradient for Asymmetric Matrix Factorization and Linear Neural Networks

Zhenghao Xu · Yuqing Wang · Tuo Zhao · Rachel Ward · Molei Tao

West Ballroom A-D #6109
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, we prove that gradient descent (GD) can find a pair of $\epsilon$-optimal solutions $X_T \in \mathbb{R}^{m \times d}$ and $Y_T \in \mathbb{R}^{n \times d}$, where $d \geq r$, satisfying $\|X_T Y_T^\top - A\|_F \leq \epsilon \|A\|_F$ in $T = O(\kappa^2 \log \frac{1}{\epsilon})$ iterations with high probability, where $\kappa$ denotes the condition number of $A$. Furthermore, we prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa \log \frac{1}{\epsilon})$, which is the best-known bound of first-order methods for rectangular matrix factorization. Different from the small balanced random initialization in the existing literature, we adopt an unbalanced initialization, where $X_0$ is large and $Y_0$ is $0$. Moreover, our initialization and analysis can be further extended to linear neural networks, where we prove that NAG can also attain an accelerated linear convergence rate. In particular, we only require the width of the network to be greater than or equal to the rank of the output label matrix. In contrast, previous results achieving the same rate require excessive widths that additionally depend on the condition number and the rank of the input data matrix.
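To make the setup concrete, here is a minimal NumPy sketch of NAG on the factorization objective $\frac{1}{2}\|XY^\top - A\|_F^2$ with the unbalanced initialization described in the abstract ($X_0$ a large random matrix, $Y_0 = 0$). The step size, momentum parameter, and initialization scale below are illustrative heuristics, not the constants analyzed in the paper, and the function name is hypothetical.

```python
import numpy as np


def nag_matrix_factorization(A, d, eta=None, beta=0.9, scale=1.0, T=500, seed=0):
    """NAG sketch on f(X, Y) = 0.5 * ||X Y^T - A||_F^2.

    Unbalanced initialization in the spirit of the paper: X0 is a large
    random matrix, Y0 is zero. eta, beta, and scale are illustrative
    defaults, not the constants used in the paper's analysis.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)

    X = scale * rng.standard_normal((m, d))   # large random X0
    Y = np.zeros((n, d))                      # Y0 = 0
    Xm, Ym = X.copy(), Y.copy()               # lookahead (momentum) iterates

    if eta is None:
        # heuristic step size from a rough local smoothness estimate
        L_hat = np.linalg.norm(X, 2) ** 2 + np.linalg.norm(A, 2)
        eta = 1.0 / (2.0 * L_hat)

    for _ in range(T):
        R = Xm @ Ym.T - A      # residual at the lookahead point
        gX = R @ Ym            # gradient w.r.t. X
        gY = R.T @ Xm          # gradient w.r.t. Y

        X_new = Xm - eta * gX
        Y_new = Ym - eta * gY

        # Nesterov extrapolation with a fixed momentum parameter beta
        Xm = X_new + beta * (X_new - X)
        Ym = Y_new + beta * (Y_new - Y)
        X, Y = X_new, Y_new

    return X, Y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n, r = 50, 40, 5
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target
    X, Y = nag_matrix_factorization(A, d=r)
    rel_err = np.linalg.norm(X @ Y.T - A) / np.linalg.norm(A)
    print(f"relative Frobenius error: {rel_err:.2e}")
```

The demo only illustrates the unbalanced initialization and the accelerated update; it is not intended to reproduce the paper's $O(\kappa \log \frac{1}{\epsilon})$ iteration complexity, which relies on specific parameter choices from the analysis.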
