

Poster

Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics

Zhoutong Wu · Yimu Zhang · Cong Fang · Zhouchen Lin

West Ballroom A-D #5709
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: The deep equilibrium model (DEQ) generalizes the conventional feedforward neural network by fixing the same weights for each layer block and extending the number of layers to infinity. This novel model directly finds the fixed points of such a forward process as features for prediction. Despite empirical evidence showcasing its efficacy compared to feedforward neural networks, a theoretical understanding of its separation and bias is still limited. In this paper, we take a step by proposing some separations and studying the bias of DEQ in its expressive power and learning dynamics. The results include: (1) A general separation is proposed, showing the existence of a width-$m$ DEQ that any fully connected neural network (FNN) with depth $O(m^{\alpha})$ for $\alpha \in (0,1)$ cannot approximate unless its width is sub-exponential in $m$; (2) DEQ with polynomially bounded size and magnitude can efficiently approximate certain steep functions (which have very large derivatives) in the $L^{\infty}$ norm, whereas FNN with bounded depth and exponentially bounded width cannot unless its weight magnitudes are exponentially large; (3) The implicit regularization caused by gradient flow from a diagonal linear DEQ is characterized, with specific examples showing the benefits brought by such regularization. A high-level conjecture drawn from our analysis and empirical validations is that DEQ has potential advantages in learning certain high-frequency components.
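
To make the forward process described above concrete, here is a minimal sketch of a DEQ forward pass by fixed-point iteration, assuming a single weight-tied block of the form $z \mapsto \tanh(Wz + Ux + b)$; the function name `deq_forward`, the tanh nonlinearity, and the weight scaling are illustrative assumptions, not details taken from the paper (which in practice DEQs typically replace with a root-finding solver such as Broyden's method).

```python
import numpy as np

def deq_forward(W, U, b, x, tol=1e-6, max_iter=500):
    """Illustrative DEQ forward pass: iterate z <- tanh(W z + U x + b)
    until an (approximate) fixed point z* is reached. The equilibrium z*
    plays the role of the feature used for prediction, replacing a stack
    of finitely many distinct layers."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z  # return the last iterate if the tolerance was not met

# Hypothetical usage with a small width-m block
rng = np.random.default_rng(0)
m, d = 8, 4                                          # hidden width m, input dimension d
W = 0.3 * rng.standard_normal((m, m)) / np.sqrt(m)   # scaled so the iteration contracts
U = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.zeros(m)
x = rng.standard_normal(d)
z_star = deq_forward(W, U, b, x)                     # equilibrium feature for prediction
```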
