NeurIPS Poster Feature Learning in $L_2$-regularized DNNs: Attraction/Repulsion and Sparsity

Arthur Jacot · Eugene Golikov · Clement Hongler · Franck Gabriel

Hall J (level 1) #942

Keywords: [ Deep Learning ] [ feature learning ] [ L2 regularization ]

Abstract: We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal w.r.t. an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activation of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set). We show that this bound is tight by giving an example of a local minimum that requires $N^{2}/4$ hidden neurons. But we also observe numerically that in more traditional settings far fewer than $N^{2}$ neurons are required to reach the minima.
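To give some intuition for why an $L_{2}$ penalty on parameters can be re-expressed through activations, here is a minimal NumPy sketch. It is not the paper's formulation: it assumes a single bias-free linear layer, and the names `A` (previous-layer post-activations), `Z` (the activations the layer must produce), `W0`, and `min_norm_weights` are hypothetical. It relies only on the standard fact that, among all weight matrices mapping `A` to `Z`, the one with the smallest Frobenius norm is `Z @ pinv(A)`, so the smallest achievable weight penalty is a function of the activations alone.

```python
import numpy as np


def min_norm_weights(A, Z):
    """Least-Frobenius-norm W with W @ A == Z (assumes the system is solvable)."""
    return Z @ np.linalg.pinv(A)


rng = np.random.default_rng(0)
N, d_in, d_out = 5, 12, 3                # N training points; widths are illustrative

A = rng.standard_normal((d_in, N))       # hypothetical previous-layer activations
W0 = rng.standard_normal((d_out, d_in))  # some arbitrary weights for the layer
Z = W0 @ A                               # the activations those weights produce

W_min = min_norm_weights(A, Z)

fits = np.allclose(W_min @ A, Z)         # W_min reproduces the same activations
norm_w0 = np.linalg.norm(W0) ** 2        # L2 cost of the arbitrary weights
norm_min = np.linalg.norm(W_min) ** 2    # L2 cost determined by (A, Z) only

print("reproduces Z:", fits)
print("penalty not larger than original:", norm_min <= norm_w0)
```

In this toy setting the penalty paid for a layer depends only on the pair `(A, Z)`, which is the kind of dependence on representations that the abstract describes; the non-linearities and multi-layer structure treated in the paper are not modeled here.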
