Poster
Feature Learning in $L_2$-regularized DNNs: Attraction/Repulsion and Sparsity
Arthur Jacot · Eugene Golikov · Clement Hongler · Franck Gabriel

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #942
We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal with respect to an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activations of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set). We show that this bound is tight by giving an example of a local minimum that requires $N^{2}/4$ hidden neurons. We also observe numerically that, in more traditional settings, far fewer than $N^{2}$ neurons are required to reach the minima.
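A toy numerical sketch of the sparsity phenomenon the abstract describes (this is my own illustration, not the paper's code; the architecture, data, and hyperparameters are arbitrary assumptions): train a one-hidden-layer ReLU network with $L_2$ regularization (weight decay) by plain gradient descent, then count how many hidden neurons retain non-negligible weights. The paper's bound says local minima need at most $N(N+1)$ neurons, and in practice far fewer remain active.

```python
import numpy as np

# Toy sketch: one-hidden-layer ReLU net, L2-regularized squared loss,
# trained by full-batch gradient descent on a tiny dataset.
rng = np.random.default_rng(0)
N, d, width = 8, 2, 64            # N training points, input dim, hidden width
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0])               # arbitrary scalar targets

W1 = rng.normal(size=(d, width)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(width,)) * 0.5     # hidden -> output weights
lam, lr = 1e-2, 1e-2                     # L2 strength, learning rate

for _ in range(20000):
    H = np.maximum(X @ W1, 0.0)          # hidden activations Z_1
    err = H @ W2 - y
    # gradients of  0.5/N * ||err||^2 + lam * (||W1||_F^2 + ||W2||^2)
    gW2 = H.T @ err / N + 2 * lam * W2
    gH = np.outer(err, W2) / N
    gW1 = X.T @ (gH * (H > 0)) + 2 * lam * W1
    W1 -= lr * gW1
    W2 -= lr * gW2

# Call a neuron "active" if both its incoming and outgoing weights
# are non-negligible; L2 decay drives unused neurons to zero.
neuron_norm = np.linalg.norm(W1, axis=0) * np.abs(W2)
active = int((neuron_norm > 1e-3).sum())
print(f"active neurons: {active} / {width}")
```

With weight decay on, hidden units that contribute little to the fit shrink toward zero, so the count of active neurons typically ends up well below the initial width; the $N(N+1)$ bound from the paper is the worst case over all local minima.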

Author Information

Arthur Jacot (New York University)
Eugene Golikov (École polytechnique fédérale de Lausanne)

MSc in Fluid Mechanics @ Moscow SU · MSc in Computer Science @ HSE · PhD student in DL theory @ EPFL

Clement Hongler (EPFL)
Franck Gabriel (EPFL)

More from the Same Authors

  • 2022 Spotlight: Lightning Talks 1B-2 »
    Eugene Golikov · Nils M. Kriege · Qing Xiu · Kai Han · Greg Yang · Jing Tang · Shuang Cui · He Huang
  • 2022 Spotlight: Non-Gaussian Tensor Programs »
    Eugene Golikov · Greg Yang
  • 2022 Poster: Non-Gaussian Tensor Programs »
    Eugene Golikov · Greg Yang
  • 2020 Poster: Kernel Alignment Risk Estimator: Risk Prediction from Training Data »
    Arthur Jacot · Berfin Simsek · Francesco Spadaro · Clement Hongler · Franck Gabriel
  • 2018 : Poster Session 1 »
    Stefan Gadatsch · Danil Kuzin · Navneet Kumar · Patrick Dallaire · Tom Ryder · Remus-Petru Pop · Nathan Hunt · Adam Kortylewski · Sophie Burkhardt · Mahmoud Elnaggar · Dieterich Lawson · Yifeng Li · Jongha (Jon) Ryu · Juhan Bae · Micha Livne · Tim Pearce · Mariia Vladimirova · Jason Ramapuram · Jiaming Zeng · Xinyu Hu · Jiawei He · Danielle Maddix · Arunesh Mittal · Albert Shaw · Tuan Anh Le · Alexander Sagel · Lisha Chen · Victor Gallego · Mahdi Karami · Zihao Zhang · Tal Kachman · Noah Weber · Matt Benatan · Kumar K Sricharan · Vincent Cartillier · Ivan Ovinnikov · Buu Phan · Mahmoud Hossam · Liu Ziyin · Valerii Kharitonov · Eugene Golikov · Qiang Zhang · Jae Myung Kim · Sebastian Farquhar · Jishnu Mukhoti · Xu Hu · Gregory Gundersen · Lavanya Sita Tekumalla · Paris Perdikaris · Ershad Banijamali · Siddhartha Jain · Ge Liu · Martin Gottwald · Katy Blumer · Sukmin Yun · Ranganath Krishnan · Roman Novak · Yilun Du · Yu Gong · Beliz Gokkaya · Jessica Ai · Daniel Duckworth · Johannes von Oswald · Christian Henning · Louis-Philippe Morency · Ali Ghodsi · Mahesh Subedar · Jean-Pascal Pfister · Rémi Lebret · Chao Ma · Aleksander Wieczorek · Laurence Perreault Levasseur
  • 2018 Poster: Neural Tangent Kernel: Convergence and Generalization in Neural Networks »
    Arthur Jacot-Guillarmod · Clement Hongler · Franck Gabriel
  • 2018 Spotlight: Neural Tangent Kernel: Convergence and Generalization in Neural Networks »
    Arthur Jacot-Guillarmod · Clement Hongler · Franck Gabriel
  • 2017 : Poster Session »
    Tsz Kit Lau · Johannes Maly · Nicolas Loizou · Christian Kroer · Yuan Yao · Youngsuk Park · Reka Agnes Kovacs · Dong Yin · Vlad Zhukov · Woosang Lim · David Barmherzig · Dimitris Metaxas · Bin Shi · Rajan Udwani · William Brendel · Yi Zhou · Vladimir Braverman · Sijia Liu · Eugene Golikov