Poster
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu · Qingcan Wang · Chao Ma
East Exhibition Hall B, C #201
Keywords: [ Optimization for Deep Networks ] [ Deep Learning ] [ Optimization -> Non-Convex Optimization; Theory -> Computational Complexity; Theory ] [ Learning Theory ]
Abstract:
We analyze the global convergence of gradient descent for deep linear residual
networks by proposing a new initialization: zero-asymmetric (ZAS)
initialization. This initialization is motivated by avoiding the stable manifolds of saddle points.
We prove that, under the ZAS initialization, for an arbitrary target matrix,
gradient descent converges to an $\epsilon$-optimal point in $\mathcal{O}(L^3 \log(1/\epsilon))$ iterations, which scales polynomially with the
network depth $L$. Our result and the exponential (in $L$) convergence time for the
standard initialization (Xavier or near-identity)
\cite{shamir2018exponential} together demonstrate the importance of the
residual structure and the initialization in the optimization of deep linear
neural networks, especially when $L$ is large.
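
To make the setting concrete, below is a minimal NumPy sketch (an illustrative assumption, not the paper's exact construction) of gradient descent on a depth-$L$ linear residual network $f(x) = (I + W_L)\cdots(I + W_1)x$ fit to an arbitrary target matrix $\Phi$. It keeps only the "zero" part of the ZAS idea, initializing every residual block to zero and assuming square, equal input/output dimensions; the paper's full ZAS initialization also uses an asymmetrically initialized output layer. The width, depth, step size, and iteration count are illustrative choices.

```python
# Illustrative sketch only (assumed setup, not the paper's exact construction):
# a depth-L linear residual network f(x) = (I + W_L) ... (I + W_1) x trained by
# full-batch gradient descent to fit an arbitrary target matrix Phi. All
# residual blocks start at zero; the asymmetric output layer of the paper's ZAS
# initialization is omitted by assuming equal, square dimensions.
import numpy as np

rng = np.random.default_rng(0)
d, L = 8, 20                                     # width and depth (assumed)
Phi = rng.standard_normal((d, d)) / np.sqrt(d)   # arbitrary target matrix
X = rng.standard_normal((d, 256))                # input samples (columns)
Y = Phi @ X
n = X.shape[1]

W = [np.zeros((d, d)) for _ in range(L)]         # residual blocks start at zero

def forward(X):
    """Return intermediate activations h_0 = X, h_l = (I + W_l) h_{l-1}."""
    hs = [X]
    for Wl in W:
        hs.append(hs[-1] + Wl @ hs[-1])
    return hs

lr = 0.01 / L                                    # assumed step size, shrunk with depth
for step in range(3001):
    hs = forward(X)
    err = hs[-1] - Y
    loss = 0.5 * np.sum(err ** 2) / n            # empirical squared loss
    g = err / n                                  # dloss/dh_L
    grads = [None] * L
    for l in range(L - 1, -1, -1):               # backpropagate through (I + W_l)
        grads[l] = g @ hs[l].T                   # dloss/dW_l
        g = g + W[l].T @ g                       # dloss/dh_{l-1}
    for l in range(L):
        W[l] -= lr * grads[l]
    if step % 500 == 0:
        print(f"step {step:4d}   loss {loss:.6f}")
```

The $1/L$ step-size scaling here is only a heuristic to keep the deep product stable; the admissible step size and the resulting iteration bound are derived rigorously in the paper.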