Poster
AdaFocal: Calibration-aware Adaptive Focal Loss
Arindam Ghosh · Thomas Schaaf · Matthew Gormley
Much recent work has been devoted to the problem of ensuring that a neural network's confidence scores match the true probability of being correct, i.e. the calibration problem. Of note, it was found that training with focal loss leads to better calibration than cross-entropy while achieving a similar level of accuracy \cite{mukhoti2020}. This success stems from focal loss regularizing the entropy of the model's prediction (controlled by the parameter $\gamma$), thereby reining in the model's overconfidence. Further improvement is expected if $\gamma$ is selected independently for each training sample (Sample-Dependent Focal Loss (FLSD-53) \cite{mukhoti2020}). However, FLSD-53 is based on heuristics and does not generalize well. In this paper, we propose a calibration-aware adaptive focal loss called AdaFocal that utilizes the calibration properties of focal (and inverse-focal) loss and adaptively modifies $\gamma_t$ for different groups of samples based on $\gamma_{t-1}$ from the previous step and knowledge of the model's under/over-confidence on the validation set. We evaluate AdaFocal on various image recognition tasks and one NLP task, covering a wide variety of network architectures, to confirm the improvement in calibration while achieving similar levels of accuracy. Additionally, we show that models trained with AdaFocal achieve a significant boost in out-of-distribution detection.
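For readers unfamiliar with the role of $\gamma$, the following is a minimal sketch of the standard focal loss, $-(1 - p)^{\gamma} \log p$, where $p$ is the predicted probability of the true class. It is not the AdaFocal update rule, which the abstract does not specify; the function names (`softmax`, `focal_loss`, `batch_focal_loss`) and the per-sample `gammas` list are illustrative assumptions, meant only to show what selecting $\gamma$ independently for each sample could look like.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=0.0):
    """Standard focal loss for one sample: -(1 - p)^gamma * log(p).

    p is the softmax probability of the true class `target`.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p = softmax(logits)[target]
    return -((1.0 - p) ** gamma) * math.log(p)


def batch_focal_loss(batch_logits, targets, gammas):
    """Mean focal loss over a batch, with one gamma per sample.

    Passing a separate gamma for each sample is an illustrative
    assumption; how those values are chosen is left to the caller.
    """
    losses = [
        focal_loss(logits, target, gamma)
        for logits, target, gamma in zip(batch_logits, targets, gammas)
    ]
    return sum(losses) / len(losses)
```

A larger $\gamma$ shrinks the loss on samples the model already predicts confidently, which is the mechanism the abstract credits for reining in overconfidence.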
Author Information
Arindam Ghosh (3M Healthcare Info. Systems, Carnegie Mellon University)
I currently work as a Senior Research Engineer in the Speech and NLU group at 3M Healthcare. I graduated with a master's degree in Electrical and Computer Engineering from Carnegie Mellon University, with a focus on machine learning, speech recognition, and NLP. Before that, I earned a bachelor's degree in Electronics and Communication Engineering from NIT Durgapur, India, and then worked as a research engineer in the field of wireless communication at the Centre for Development of Telematics (C-DOT), Bangalore, India.
Thomas Schaaf (3M | M*Modal)
Matthew Gormley (Carnegie Mellon University)
More from the Same Authors
- 2023 Poster: Unlimiformer: Long-Range Transformers with Unlimited Length Input
  Amanda Bertsch · Uri Alon · Graham Neubig · Matthew Gormley
- 2019 Poster: Towards modular and programmable architecture search
  Renato Negrinho · Matthew Gormley · Geoffrey Gordon · Darshan Patil · Nghia Le · Daniel Ferreira
- 2018 Poster: Learning Beam Search Policies via Imitation Learning
  Renato Negrinho · Matthew Gormley · Geoffrey Gordon