

Poster in Workshop: Workshop on Distribution Shifts: New Frontiers with Foundation Models

On selective classification under distribution shift

Luís Felipe Cattelan · Danilo Silva

Keywords: [ Deep Learning ] [ Uncertainty Estimation ] [ reject option ] [ failure prediction ] [ misclassification detection ] [ selective classification ] [ Neural Networks ] [ Distribution Shift ]


Abstract: This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. Specifically, we investigate whether the selective classification performance of ImageNet classifiers is robust to distribution shift. Motivated by the intriguing observation in recent work that many classifiers appear to have a "broken" confidence estimator, we start by evaluating methods to fix this issue. We focus on so-called post-hoc methods, which replace the confidence estimator of a given classifier without retraining or modifying it and are therefore practically appealing. We perform an extensive experimental study of many existing and proposed confidence estimators applied to 84 pre-trained ImageNet classifiers available from popular repositories. Our results show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance, completely fixing the pathological behavior observed in many classifiers. As a consequence, the selective classification performance of any classifier becomes almost entirely determined by its corresponding accuracy. We then show that these results are consistent under distribution shift: a method that enhances performance in the in-distribution scenario also provides similar gains under distribution shift. Moreover, although a slight degradation in selective classification performance is observed under distribution shift, this can be explained by the drop in accuracy of the classifier, together with the slight dependence of selective classification performance on accuracy.
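The post-hoc estimator described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `max_logit_pnorm` and `selective_risk`, the default `p=4`, and the small `eps` stabilizer are assumptions made for illustration. The abstract does not state how $p$ is chosen or whether the logits are preprocessed in any other way. The second function computes the error rate on the most confident fraction of samples, one common way to measure selective classification performance.

```python
import numpy as np

def max_logit_pnorm(logits, p=4, eps=1e-12):
    """Confidence score: maximum logit after dividing each logit
    vector by its p-norm. `p` and `eps` are illustrative assumptions."""
    logits = np.asarray(logits, dtype=float)
    # Vector p-norm of each row (one row per sample).
    norms = np.linalg.norm(logits, ord=p, axis=-1, keepdims=True)
    return (logits / (norms + eps)).max(axis=-1)

def selective_risk(confidence, correct, coverage):
    """Error rate on the `coverage` fraction of most-confident samples;
    the remaining samples are treated as rejected (abstained)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n_keep = max(1, int(round(coverage * len(confidence))))
    keep = np.argsort(-confidence)[:n_keep]  # highest confidence first
    return 1.0 - correct[keep].mean()
```

Because the estimator only reads the logits, it can be applied to any pre-trained classifier without retraining. Sweeping `coverage` from 0 to 1 traces a risk-coverage curve for comparing confidence estimators.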
