While cross entropy (CE) is the most commonly used loss function for training deep neural networks on classification tasks, many alternative losses have been developed to obtain better empirical performance. Which of them is best to use remains unclear, as the answer appears to depend on multiple factors, such as the properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the globally optimal solutions of the CE and mean squared error (MSE) losses exhibit a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend these results and show, through global solution and landscape analyses, that a broad family of loss functions, including the commonly used label smoothing (LS) and focal loss (FL), exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. In particular, under the unconstrained feature model assumption, we provide a global landscape analysis for the LS loss and a local landscape analysis for the FL loss, showing that the (only!) global minimizers are Neural Collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions, globally for the LS loss and locally near the optimal solution for the FL loss. Our experiments further show that the Neural Collapse features obtained with all of these losses (i.e., CE, LS, FL, MSE) lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence.
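For concreteness, below is a minimal NumPy sketch (not from the authors' code; the array names `features` and `labels` and the exact normalizations are assumptions) of how the two Neural Collapse properties described above could be measured on a trained network's last-layer features:

```python
# Hypothetical helper for checking the two Neural Collapse properties on
# last-layer features; the normalizations are illustrative, not the paper's exact metrics.
import numpy as np

def neural_collapse_metrics(features: np.ndarray, labels: np.ndarray, num_classes: int):
    """features: (n, d) last-layer features; labels: (n,) integer class ids in [0, num_classes)."""
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

    # Property (i): within-class variation vanishes relative to between-class variation.
    within = np.mean([((features[labels == c] - class_means[c]) ** 2).sum(axis=1).mean()
                      for c in range(num_classes)])
    between = ((class_means - global_mean) ** 2).sum(axis=1).mean()
    nc1 = within / between  # -> 0 as features collapse onto their class means

    # Property (ii): checked here via the pairwise cosines of the normalized, centered
    # class means, which all equal -1/(K-1) at a simplex equiangular tight frame (ETF),
    # i.e. when the class means are equally and maximally separated.
    M = class_means - global_mean
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    cosines = M @ M.T
    off_diag = cosines[~np.eye(num_classes, dtype=bool)]
    nc2 = np.abs(off_diag + 1.0 / (num_classes - 1)).max()  # -> 0 at a simplex ETF
    return nc1, nc2
```

According to the abstract, features trained to convergence with CE, LS, FL, or MSE on a sufficiently large network should drive both quantities toward zero.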
Author Information
Jinxin Zhou (Ohio State University, Columbus)
Chong You (University of California, Berkeley)
Xiao Li (University of Michigan)
Kangning Liu (New York University)
Sheng Liu (New York University)
Qing Qu (University of Michigan)
Zhihui Zhu (The Ohio State University)
More from the Same Authors
- 2021 Spotlight: A Geometric Analysis of Neural Collapse with Unconstrained Features
  Zhihui Zhu · Tianyu Ding · Jinxin Zhou · Xiao Li · Chong You · Jeremias Sulam · Qing Qu
- 2022: Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
  Shuo Xie · Jiahao Qiu · Ankita Pasad · Li Du · Qing Qu · Hongyuan Mei
- 2022: Linear Convergence Analysis of Neural Collapse with Unconstrained Features
  Peng Wang · Huikang Liu · Can Yaras · Laura Balzano · Qing Qu
- 2022 Poster: Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
  Can Yaras · Peng Wang · Zhihui Zhu · Laura Balzano · Qing Qu
- 2022 Poster: StrokeRehab: A Benchmark Dataset for Sub-second Action Identification
  Aakash Kaku · Kangning Liu · Avinash Parnandi · Haresh Rengaraj Rajamohan · Kannan Venkataramanan · Anita Venkatesan · Audre Wirtanen · Natasha Pandit · Heidi Schambra · Carlos Fernandez-Granda
- 2022 Poster: Error Analysis of Tensor-Train Cross Approximation
  Zhen Qin · Alexander Lidiak · Zhexuan Gong · Gongguo Tang · Michael B Wakin · Zhihui Zhu
- 2022 Poster: Revisiting Sparse Convolutional Model for Visual Recognition
  Xili Dai · Mingyang Li · Pengyuan Zhai · Shengbang Tong · Xingjian Gao · Shao-Lun Huang · Zhihui Zhu · Chong You · Yi Ma
- 2021 Poster: A Geometric Analysis of Neural Collapse with Unconstrained Features
  Zhihui Zhu · Tianyu Ding · Jinxin Zhou · Xiao Li · Chong You · Jeremias Sulam · Qing Qu
- 2021 Poster: Only Train Once: A One-Shot Neural Network Training And Pruning Framework
  Tianyi Chen · Bo Ji · Tianyu Ding · Biyi Fang · Guanyi Wang · Zhihui Zhu · Luming Liang · Yixin Shi · Sheng Yi · Xiao Tu
- 2021 Poster: Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery
  Lijun Ding · Liwei Jiang · Yudong Chen · Qing Qu · Zhihui Zhu
- 2021 Poster: Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training
  Sheng Liu · Xiao Li · Simon Zhai · Chong You · Zhihui Zhu · Carlos Fernandez-Granda · Qing Qu
- 2020 Poster: Early-Learning Regularization Prevents Memorization of Noisy Labels
  Sheng Liu · Jonathan Niles-Weed · Narges Razavian · Carlos Fernandez-Granda
- 2020 Poster: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Spotlight: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Poster: Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction
  Yaodong Yu · Kwan Ho Ryan Chan · Chong You · Chaobing Song · Yi Ma
- 2019 Poster: Distributed Low-rank Matrix Factorization With Exact Consensus
  Zhihui Zhu · Qiuwei Li · Xinshuo Yang · Gongguo Tang · Michael B Wakin
- 2019 Poster: A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution
  Qing Qu · Xiao Li · Zhihui Zhu
- 2019 Spotlight: A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution
  Qing Qu · Xiao Li · Zhihui Zhu
- 2019 Poster: A Linearly Convergent Method for Non-Smooth Non-Convex Optimization on the Grassmannian with Applications to Robust Subspace and Dictionary Learning
  Zhihui Zhu · Tianyu Ding · Daniel Robinson · Manolis Tsakiris · René Vidal
- 2018 Poster: Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms
  Zhihui Zhu · Yifan Wang · Daniel Robinson · Daniel Naiman · René Vidal · Manolis Tsakiris
- 2018 Poster: Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization
  Zhihui Zhu · Xiao Li · Kai Liu · Qiuwei Li