Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. Here we investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower bound on the rank in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.
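The rank collapse described in the abstract can be illustrated with a small NumPy experiment. This is only a sketch under stated assumptions, not the authors' experimental setup: the width, batch size, depth, Gaussian initialization, the tolerance-based `numerical_rank` helper, and the simplified batch normalization (each feature row rescaled to unit norm, with no mean subtraction and no learned scale or shift) are all illustrative choices.

```python
import numpy as np


def numerical_rank(h, tol=1e-3):
    """Count singular values larger than tol times the largest one (hypothetical helper)."""
    s = np.linalg.svd(h, compute_uv=False)  # singular values in descending order
    return int(np.sum(s > tol * s[0]))


def propagate(h0, depth, use_bn, rng):
    """Push a (width x batch) matrix through `depth` random linear layers."""
    h = h0.copy()
    width = h.shape[0]
    for _ in range(depth):
        # Assumed initialization: i.i.d. Gaussian weights with variance 1/width.
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = w @ h
        if use_bn:
            # Simplified batch norm (assumption): rescale each feature row
            # to unit norm over the batch.
            h = h / np.linalg.norm(h, axis=1, keepdims=True)
        else:
            # A global rescaling only prevents underflow; it leaves the
            # ratios between singular values unchanged.
            h = h / np.linalg.norm(h)
    return h


rng = np.random.default_rng(0)
width, batch, depth = 32, 16, 3000
h0 = rng.standard_normal((width, batch))

rank_plain = numerical_rank(propagate(h0, depth, use_bn=False, rng=rng))
rank_bn = numerical_rank(propagate(h0, depth, use_bn=True, rng=rng))
print(f"numerical rank without BN: {rank_plain}, with BN: {rank_bn}")
```

Note that the exact rank of a linear network's output cannot change under these invertible maps; what degrades without normalization is the numerical rank, because the singular values spread apart exponentially with depth. The tolerance `tol` is therefore part of the definition being illustrated, and other soft-rank measures would serve equally well.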
Author Information
Hadi Daneshmand (Inria)
Jonas Kohler (ETHZ)
Francis Bach (INRIA - Ecole Normale Superieure)
Thomas Hofmann (ETH Zurich)
Aurelien Lucchi (ETH Zurich)
More from the Same Authors
-
2021 Spotlight: Precise characterization of the prior predictive distribution of deep ReLU networks »
Lorenzo Noci · Gregor Bachmann · Kevin Roth · Sebastian Nowozin · Thomas Hofmann -
2021 Spotlight: Batch Normalization Orthogonalizes Representations in Deep Random Networks »
Hadi Daneshmand · Amir Joudaki · Francis Bach -
2022 Poster: A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning »
Eloïse Berthier · Ziad Kobeissi · Francis Bach -
2022 : Cosmology from Galaxy Redshift Surveys with PointNet »
Sotiris Anagnostidis · Arne Thomsen · Alexandre Refregier · Tomasz Kacprzak · Luca Biggio · Thomas Hofmann · Tilman Tröster -
2022 : Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning »
Sanghwan Kim · Lorenzo Noci · Antonio Orvieto · Thomas Hofmann -
2022 Spotlight: Lightning Talks 1A-4 »
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu -
2022 Spotlight: A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning »
Eloïse Berthier · Ziad Kobeissi · Francis Bach -
2022 Poster: Variational inference via Wasserstein gradient flows »
Marc Lambert · Sinho Chewi · Francis Bach · Silvère Bonnabel · Philippe Rigollet -
2022 Poster: Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays »
Konstantin Mishchenko · Francis Bach · Mathieu Even · Blake Woodworth -
2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization »
Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting -
2022 Poster: OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters »
Piera Riccio · Bill Psomas · Francesco Galati · Francisco Escolano · Thomas Hofmann · Nuria Oliver -
2022 Poster: Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization »
Benjamin Dubois-Taine · Francis Bach · Quentin Berthet · Adrien Taylor -
2022 Poster: Active Labeling: Streaming Stochastic Gradients »
Vivien Cabannes · Francis Bach · Vianney Perchet · Alessandro Rudi -
2021 Test Of Time: Online Learning for Latent Dirichlet Allocation »
Matthew Hoffman · Francis Bach · David Blei -
2021 Poster: Analytic Insights into Structure and Rank of Neural Network Hessian Maps »
Sidak Pal Singh · Gregor Bachmann · Thomas Hofmann -
2021 Poster: Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning »
Vivien Cabannes · Loucas Pillaud-Vivien · Francis Bach · Alessandro Rudi -
2021 Poster: Precise characterization of the prior predictive distribution of deep ReLU networks »
Lorenzo Noci · Gregor Bachmann · Kevin Roth · Sebastian Nowozin · Thomas Hofmann -
2021 Oral: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2021 Poster: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect »
Lorenzo Noci · Kevin Roth · Gregor Bachmann · Sebastian Nowozin · Thomas Hofmann -
2021 Poster: Batch Normalization Orthogonalizes Representations in Deep Random Networks »
Hadi Daneshmand · Amir Joudaki · Francis Bach -
2021 Poster: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms »
Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor -
2020 : Francis Bach - Where is Machine Learning Going? »
Francis Bach -
2020 Poster: Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model »
Raphaël Berthier · Francis Bach · Pierre Gaillard -
2020 Poster: Learning with Differentiable Pertubed Optimizers »
Quentin Berthet · Mathieu Blondel · Olivier Teboul · Marco Cuturi · Jean-Philippe Vert · Francis Bach -
2020 Poster: Non-parametric Models for Non-negative Functions »
Ulysse Marteau-Ferey · Francis Bach · Alessandro Rudi -
2020 Spotlight: Non-parametric Models for Non-negative Functions »
Ulysse Marteau-Ferey · Francis Bach · Alessandro Rudi -
2020 Session: Orals & Spotlights Track 30: Optimization/Theory »
Yuxin Chen · Francis Bach -
2020 Poster: Dual-Free Stochastic Decentralized Optimization with Variance Reduction »
Hadrien Hendrikx · Francis Bach · Laurent Massoulié -
2020 Poster: Adversarial Training is a Form of Data-dependent Operator Norm Regularization »
Kevin Roth · Yannic Kilcher · Thomas Hofmann -
2020 Spotlight: Adversarial Training is a Form of Data-dependent Operator Norm Regularization »
Kevin Roth · Yannic Kilcher · Thomas Hofmann -
2020 Poster: Convolutional Generation of Textured 3D Meshes »
Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi -
2020 Oral: Convolutional Generation of Textured 3D Meshes »
Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi -
2019 : Spotlight talks »
Paul Grigas · Zhewei Yao · Aurelien Lucchi · Si Yi Meng -
2019 Poster: Shadowing Properties of Optimization Algorithms »
Antonio Orvieto · Aurelien Lucchi -
2019 Poster: Continuous-time Models for Stochastic Optimization Algorithms »
Antonio Orvieto · Aurelien Lucchi -
2019 Poster: A Domain Agnostic Measure for Monitoring and Evaluating GANs »
Paulina Grnarova · Kfir Y. Levy · Aurelien Lucchi · Nathanael Perraudin · Ian Goodfellow · Thomas Hofmann · Andreas Krause -
2019 Poster: Fast Decomposable Submodular Function Minimization using Constrained Total Variation »
Senanayak Sesh Kumar Karri · Francis Bach · Thomas Pock -
2019 Poster: Towards closing the gap between the theory and practice of SVRG »
Othmane Sebbouh · Nidham Gazagnadou · Samy Jelassi · Francis Bach · Robert Gower -
2019 Poster: An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums »
Hadrien Hendrikx · Francis Bach · Laurent Massoulié -
2019 Poster: On Lazy Training in Differentiable Programming »
Lénaïc Chizat · Edouard Oyallon · Francis Bach -
2019 Poster: Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks »
Gauthier Gidel · Francis Bach · Simon Lacoste-Julien -
2019 Poster: Massively scalable Sinkhorn distances via the Nyström method »
Jason Altschuler · Francis Bach · Alessandro Rudi · Jonathan Niles-Weed -
2019 Poster: Localized Structured Prediction »
Carlo Ciliberto · Francis Bach · Alessandro Rudi -
2019 Poster: UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization »
Ali Kavis · Kfir Y. Levy · Francis Bach · Volkan Cevher -
2019 Spotlight: UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization »
Ali Kavis · Kfir Y. Levy · Francis Bach · Volkan Cevher -
2019 Poster: Partially Encrypted Deep Learning using Functional Encryption »
Théo Ryffel · David Pointcheval · Francis Bach · Edouard Dufour-Sans · Romain Gay -
2019 Poster: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses »
Ulysse Marteau-Ferey · Francis Bach · Alessandro Rudi -
2018 Poster: Optimal Algorithms for Non-Smooth Distributed Optimization in Networks »
Kevin Scaman · Francis Bach · Sebastien Bubeck · Laurent Massoulié · Yin Tat Lee -
2018 Poster: Hyperbolic Neural Networks »
Octavian Ganea · Gary Becigneul · Thomas Hofmann -
2018 Poster: Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes »
Loucas Pillaud-Vivien · Alessandro Rudi · Francis Bach -
2018 Oral: Optimal Algorithms for Non-Smooth Distributed Optimization in Networks »
Kevin Scaman · Francis Bach · Sebastien Bubeck · Laurent Massoulié · Yin Tat Lee -
2018 Spotlight: Hyperbolic Neural Networks »
Octavian Ganea · Gary Becigneul · Thomas Hofmann -
2018 Poster: Relating Leverage Scores and Density using Regularized Christoffel Functions »
Edouard Pauwels · Francis Bach · Jean-Philippe Vert -
2018 Poster: Deep State Space Models for Unconditional Word Generation »
Florian Schmidt · Thomas Hofmann -
2018 Poster: Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization »
Francis Bach -
2018 Poster: Rest-Katyusha: Exploiting the Solution's Structure via Scheduled Restart Schemes »
Junqi Tang · Mohammad Golbabaee · Francis Bach · Mike Davies -
2018 Poster: SING: Symbol-to-Instrument Neural Generator »
Alexandre Defossez · Neil Zeghidour · Nicolas Usunier · Leon Bottou · Francis Bach -
2018 Poster: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport »
Lénaïc Chizat · Francis Bach -
2017 : Concluding remarks »
Francis Bach · Benjamin Guedj · Pascal Germain -
2017 : Neil Lawrence, Francis Bach and François Laviolette »
Neil Lawrence · Francis Bach · Francois Laviolette -
2017 : Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance »
Francis Bach -
2017 : Overture »
Benjamin Guedj · Francis Bach · Pascal Germain -
2017 Workshop: (Almost) 50 shades of Bayesian Learning: PAC-Bayesian trends and insights »
Benjamin Guedj · Pascal Germain · Francis Bach -
2017 Poster: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2017 Oral: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2017 Poster: Nonlinear Acceleration of Stochastic Algorithms »
Damien Scieur · Francis Bach · Alexandre d'Aspremont -
2017 Poster: Integration Methods and Optimization Algorithms »
Damien Scieur · Vincent Roulet · Francis Bach · Alexandre d'Aspremont -
2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization »
Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann -
2016 : Francis Bach. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression. »
Francis Bach -
2016 Workshop: OPT 2016: Optimization for Machine Learning »
Suvrit Sra · Francis Bach · Sashank J. Reddi · Niao He -
2016 : Submodular Functions: from Discrete to Continuous Domains »
Francis Bach -
2016 Workshop: Learning in High Dimensions with Structure »
Nikhil Rao · Prateek Jain · Hsiang-Fu Yu · Ming Yuan · Francis Bach -
2016 Poster: Parameter Learning for Log-supermodular Distributions »
Tatiana Shpakova · Francis Bach -
2016 Poster: Regularized Nonlinear Acceleration »
Damien Scieur · Alexandre d'Aspremont · Francis Bach -
2016 Oral: Regularized Nonlinear Acceleration »
Damien Scieur · Alexandre d'Aspremont · Francis Bach -
2016 Poster: Stochastic Variance Reduction Methods for Saddle-Point Problems »
Balamurugan Palaniappan · Francis Bach -
2016 Poster: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy »
Aryan Mokhtari · Hadi Daneshmand · Aurelien Lucchi · Thomas Hofmann · Alejandro Ribeiro -
2016 Poster: PAC-Bayesian Theory Meets Bayesian Inference »
Pascal Germain · Francis Bach · Alexandre Lacoste · Simon Lacoste-Julien -
2016 Poster: Stochastic Optimization for Large-scale Optimal Transport »
Aude Genevay · Marco Cuturi · Gabriel Peyré · Francis Bach -
2016 Tutorial: Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity »
Suvrit Sra · Francis Bach -
2015 : Structured Sparsity and convex optimization »
Francis Bach -
2015 : Sharp Analysis of Random Feature Expansions »
Francis Bach -
2015 : Convergence Rates of Kernel Quadrature Rules »
Francis Bach -
2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors »
Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams -
2015 Poster: Rethinking LDA: Moment Matching for Discrete ICA »
Anastasia Podosinnikova · Francis Bach · Simon Lacoste-Julien -
2015 Poster: Spectral Norm Regularization of Orthonormal Representations for Graph Transduction »
Rakesh Shivanna · Bibaswan K Chatterjee · Raman Sankaran · Chiranjib Bhattacharyya · Francis Bach -
2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent »
Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan -
2014 Poster: Metric Learning for Temporal Sequence Alignment »
Rémi Lajugie · Damien Garreau · Francis Bach · Sylvain Arlot -
2014 Poster: SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives »
Aaron Defazio · Francis Bach · Simon Lacoste-Julien -
2013 Poster: Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) »
Francis Bach · Eric Moulines -
2013 Spotlight: Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) »
Francis Bach · Eric Moulines -
2013 Session: Oral Session 2 »
Francis Bach -
2013 Poster: Convex Relaxations for Permutation Problems »
Fajwel Fogel · Rodolphe Jenatton · Francis Bach · Alexandre d'Aspremont -
2013 Poster: Reflection methods for user-friendly submodular optimization »
Stefanie Jegelka · Francis Bach · Suvrit Sra -
2013 Session: Tutorial Session B »
Francis Bach -
2012 Workshop: Analysis Operator Learning vs. Dictionary Learning: Fraternal Twins in Sparse Modeling »
Martin Kleinsteuber · Francis Bach · Remi Gribonval · John Wright · Simon Hawe -
2012 Poster: Multiple Operator-valued Kernel Learning »
Hachem Kadri · Alain Rakotomamonjy · Francis Bach · Philippe Preux -
2012 Poster: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2012 Oral: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
Nicolas Le Roux · Mark Schmidt · Francis Bach -
2011 Workshop: Sparse Representation and Low-rank Approximation »
Ameet S Talwalkar · Lester W Mackey · Mehryar Mohri · Michael W Mahoney · Francis Bach · Mike Davies · Remi Gribonval · Guillaume R Obozinski -
2011 Poster: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Oral: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
Mark Schmidt · Nicolas Le Roux · Francis Bach -
2011 Poster: Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning »
Francis Bach · Eric Moulines -
2011 Poster: Trace Lasso: a trace norm regularization for correlated designs »
Edouard Grave · Guillaume R Obozinski · Francis Bach -
2011 Spotlight: Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning »
Francis Bach · Eric Moulines -
2011 Poster: Shaping Level Sets with Submodular Functions »
Francis Bach -
2010 Workshop: New Directions in Multiple Kernel Learning »
Marius Kloft · Ulrich Rueckert · Cheng Soon Ong · Alain Rakotomamonjy · Soeren Sonnenburg · Francis Bach -
2010 Spotlight: Online Learning for Latent Dirichlet Allocation »
Matthew D. Hoffman · David Blei · Francis Bach -
2010 Poster: Efficient Optimization for Discriminative Latent Class Models »
Armand Joulin · Francis Bach · Jean A Ponce -
2010 Poster: Online Learning for Latent Dirichlet Allocation »
Matthew D. Hoffman · David Blei · Francis Bach -
2010 Oral: Structured sparsity-inducing norms through submodular functions »
Francis Bach -
2010 Poster: Structured sparsity-inducing norms through submodular functions »
Francis Bach -
2010 Poster: Network Flow Algorithms for Structured Sparsity »
Julien Mairal · Rodolphe Jenatton · Guillaume R Obozinski · Francis Bach -
2009 Workshop: Understanding Multiple Kernel Learning Methods »
Brian McFee · Gert Lanckriet · Francis Bach · Nati Srebro -
2009 Poster: Data-driven calibration of linear estimators with minimal penalties »
Sylvain Arlot · Francis Bach -
2009 Poster: Asymptotically Optimal Regularization in Smooth Parametric Models »
Percy Liang · Francis Bach · Guillaume Bouchard · Michael Jordan -
2009 Tutorial: Sparse Methods for Machine Learning: Theory and Algorithms »
Francis Bach -
2008 Poster: Clustered Multi-Task Learning: A Convex Formulation »
Laurent Jacob · Francis Bach · Jean-Philippe Vert -
2008 Poster: Sparse probabilistic projections »
Cedric Archambeau · Francis Bach -
2008 Spotlight: Sparse probabilistic projections »
Cedric Archambeau · Francis Bach -
2008 Spotlight: Clustered Multi-Task Learning: A Convex Formulation »
Laurent Jacob · Francis Bach · Jean-Philippe Vert -
2008 Poster: Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning »
Francis Bach -
2008 Poster: Kernel Change-point Analysis »
Zaid Harchaoui · Francis Bach · Eric Moulines -
2008 Poster: SDL: Supervised Dictionary Learning »
Julien Mairal · Francis Bach · Jean A Ponce · Guillermo Sapiro · Andrew Zisserman -
2007 Poster: Testing for Homogeneity with Kernel Fisher Discriminant Analysis »
Zaid Harchaoui · Francis Bach · Eric Moulines -
2007 Poster: DIFFRAC: a discriminative and flexible framework for clustering »
Francis Bach · Zaid Harchaoui -
2007 Session: Session 2: Probabilistic Optimization »
Francis Bach -
2006 Poster: Active learning for misspecified generalized linear models »
Francis Bach