We provide the first global optimization landscape analysis of Neural Collapse -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified unconstrained feature model, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. Our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized, matching the empirical observations in practical deep network architectures. These findings have important practical implications. As an example, our experiments demonstrate that one may set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF for network training, which reduces memory cost by over 20% on ResNet18 without sacrificing generalization performance. The source code is available at https://github.com/tding1/Neural-Collapse.
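To illustrate the fixed-classifier setup described in the abstract, here is a minimal PyTorch sketch (not the authors' released code; see the GitHub repository above for the actual implementation). It constructs a K x K Simplex ETF, sqrt(K/(K-1)) * (I_K - (1/K) * 1 1^T), copies it into a bias-free linear classifier whose feature dimension equals the number of classes K, and freezes it so that only the feature backbone is trained. The choice K = 10 is purely illustrative.

```python
import torch
import torch.nn as nn

def simplex_etf(num_classes: int) -> torch.Tensor:
    """Return the K x K Simplex ETF: sqrt(K/(K-1)) * (I_K - (1/K) * 1 1^T).
    Each row has unit norm, and distinct rows have pairwise inner product -1/(K-1)."""
    K = num_classes
    return (K / (K - 1)) ** 0.5 * (torch.eye(K) - torch.ones(K, K) / K)

# Hypothetical setup: feature dimension equal to the number of classes (K = 10 here).
num_classes = 10
classifier = nn.Linear(num_classes, num_classes, bias=False)
with torch.no_grad():
    classifier.weight.copy_(simplex_etf(num_classes))
classifier.weight.requires_grad_(False)  # keep the last-layer classifier fixed at the ETF

# During training, only the backbone producing the K-dimensional features is optimized, e.g.:
# optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, weight_decay=5e-4)
```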
Author Information
Zhihui Zhu (University of Denver)
Tianyu Ding (Johns Hopkins University)
Jinxin Zhou (University of Denver)
Xiao Li (University of Michigan)
Chong You (University of California, Berkeley)
Jeremias Sulam (Johns Hopkins University)
Qing Qu (University of Michigan)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: A Geometric Analysis of Neural Collapse with Unconstrained Features »
  Thu. Dec 9th 12:30 -- 02:00 AM
More from the Same Authors
- 2022: DeepSTI: Towards Tensor Reconstruction using Fewer Orientations in Susceptibility Tensor Imaging »
  Zhenghan Fang · Kuo-Wei Lai · Peter van Zijl · Xu Li · Jeremias Sulam
- 2022 Poster: Recovery and Generalization in Over-Realized Dictionary Learning »
  Jeremias Sulam · Chong You · Zhihui Zhu
- 2022 Poster: Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold »
  Can Yaras · Peng Wang · Zhihui Zhu · Laura Balzano · Qing Qu
- 2022 Poster: Are All Losses Created Equal: A Neural Collapse Perspective »
  Jinxin Zhou · Chong You · Xiao Li · Kangning Liu · Sheng Liu · Qing Qu · Zhihui Zhu
- 2022 Poster: Error Analysis of Tensor-Train Cross Approximation »
  Zhen Qin · Alexander Lidiak · Zhexuan Gong · Gongguo Tang · Michael B Wakin · Zhihui Zhu
- 2022 Poster: Revisiting Sparse Convolutional Model for Visual Recognition »
  xili dai · Mingyang Li · Pengyuan Zhai · Shengbang Tong · Xingjian Gao · Shao-Lun Huang · Zhihui Zhu · Chong You · Yi Ma
- 2021 Poster: Only Train Once: A One-Shot Neural Network Training And Pruning Framework »
  Tianyi Chen · Bo Ji · Tianyu Ding · Biyi Fang · Guanyi Wang · Zhihui Zhu · Luming Liang · Yixin Shi · Sheng Yi · Xiao Tu
- 2021 Poster: Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery »
  Lijun Ding · Liwei Jiang · Yudong Chen · Qing Qu · Zhihui Zhu
- 2021 Poster: Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training »
  Sheng Liu · Xiao Li · Simon Zhai · Chong You · Zhihui Zhu · Carlos Fernandez-Granda · Qing Qu
- 2020 Poster: Learning to solve TV regularised problems with unrolled algorithms »
  Hamza Cherkaoui · Jeremias Sulam · Thomas Moreau
- 2020 Poster: Conformal Symplectic and Relativistic Optimization »
  Guilherme Franca · Jeremias Sulam · Daniel Robinson · Rene Vidal
- 2020 Spotlight: Conformal Symplectic and Relativistic Optimization »
  Guilherme Franca · Jeremias Sulam · Daniel Robinson · Rene Vidal
- 2020 Poster: Adversarial Robustness of Supervised Sparse Coding »
  Jeremias Sulam · Ramchandran Muthukumar · Raman Arora
- 2020 Poster: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization »
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Spotlight: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization »
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Poster: Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction »
  Yaodong Yu · Kwan Ho Ryan Chan · Chong You · Chaobing Song · Yi Ma
- 2019 Poster: Distributed Low-rank Matrix Factorization With Exact Consensus »
  Zhihui Zhu · Qiuwei Li · Xinshuo Yang · Gongguo Tang · Michael B Wakin
- 2019 Poster: A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution »
  Qing Qu · Xiao Li · Zhihui Zhu
- 2019 Spotlight: A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution »
  Qing Qu · Xiao Li · Zhihui Zhu
- 2019 Poster: A Linearly Convergent Method for Non-Smooth Non-Convex Optimization on the Grassmannian with Applications to Robust Subspace and Dictionary Learning »
  Zhihui Zhu · Tianyu Ding · Daniel Robinson · Manolis Tsakiris · René Vidal
- 2018 Poster: Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms »
  Zhihui Zhu · Yifan Wang · Daniel Robinson · Daniel Naiman · René Vidal · Manolis Tsakiris
- 2018 Poster: Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization »
  Zhihui Zhu · Xiao Li · Kai Liu · Qiuwei Li