Poster

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis

Taiki Miyagawa

Hall J #930

Keywords: [ Equation of motion ] [ Discretization error ] [ Gradient flow ] [ Gradient Descent ]

Thu 1 Dec 9 a.m. PST — 11 a.m. PST
 
Spotlight presentation: Lightning Talks 4A-2
Wed 7 Dec 5:30 p.m. PST — 5:45 p.m. PST

Abstract:

We derive and solve an "Equation of Motion" (EoM) for deep neural networks (DNNs), a differential equation that precisely describes the discrete learning dynamics of DNNs. Although differential equations are continuous, they have played a prominent role even in the study of discrete optimization, such as gradient descent (GD) algorithms. However, gaps remain between differential equations and the actual learning dynamics of DNNs due to discretization error. In this paper, we start from gradient flow (GF) and derive a counter term that cancels the discretization error between GF and GD. As a result, we obtain EoM, a continuous differential equation that precisely describes the discrete learning dynamics of GD. We also derive the discretization error of EoM to show to what extent EoM is precise. In addition, we apply EoM to two specific cases: scale- and translation-invariant layers. EoM highlights differences between continuous and discrete GD, indicating the importance of the counter term for a better description of the discrete learning dynamics of GD. Our experimental results support our theoretical findings.
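The discretization gap between GF and GD that the abstract refers to can be seen even in a one-dimensional example. The sketch below is not the paper's EoM or counter term; it only illustrates, on a quadratic loss L(w) = w²/2 (a hypothetical toy problem), that the exact GD iterate and the exact GF solution differ by an error that shrinks as the learning rate decreases while the total "training time" η·t is held fixed:

```python
import math

# Toy illustration (not the paper's method): discretization error between
# gradient descent (GD) and gradient flow (GF) on L(w) = w^2 / 2,
# whose gradient is simply w.
#
#   GD:  w_{t+1} = w_t - eta * w_t   ->  w_t = (1 - eta)^t * w_0
#   GF:  dw/ds   = -w, s = eta * t   ->  w(s) = exp(-s) * w_0

def discretization_gap(eta: float, total_time: float, w0: float = 1.0) -> float:
    """|GD iterate - GF solution| after total_time = eta * steps."""
    steps = round(total_time / eta)
    w_gd = w0 * (1.0 - eta) ** steps          # exact discrete GD trajectory
    w_gf = w0 * math.exp(-eta * steps)        # exact continuous GF solution
    return abs(w_gd - w_gf)

# Same total training time, two learning rates (illustrative choices):
gap_coarse = discretization_gap(eta=0.1, total_time=5.0)
gap_fine = discretization_gap(eta=0.01, total_time=5.0)

print(gap_coarse, gap_fine)  # the gap shrinks roughly linearly in eta
```

The gap vanishes only in the limit η → 0; at any finite learning rate it is nonzero, which is why a counter term correcting GF toward the true discrete dynamics is needed.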
