
Workshop: Trustworthy and Socially Responsible Machine Learning

Denoised Smoothing with Sample Rejection for Robustifying Pretrained Classifiers

Fatemeh Sheikholeslami · Wan-Yi Lin · Jan Hendrik Metzen · Huan Zhang · J. Zico Kolter

Abstract: Denoised smoothing is the state-of-the-art approach to defending pretrained classifiers against $\ell_p$ adversarial attacks: a denoiser is prepended to the pretrained classifier, and the adversarial robustness of the joint system is verified via randomized smoothing. Despite its state-of-the-art certified robustness against $\ell_2$-norm adversarial inputs, the pretrained base classifier is often quite uncertain when making its predictions on the denoised examples, which leads to lower natural accuracy. In this work, we show that by augmenting the joint system with a "rejector" and exploiting adaptive sample rejection (i.e., intentionally abstaining from providing a prediction), we can achieve substantially improved accuracy (especially natural accuracy) over denoised smoothing alone. That is, we show how the joint classifier-rejector can be viewed as a classification-with-rejection per sample, while the smoothed joint system can be turned into a robust smoothed classifier without rejection against $\ell_2$-norm perturbations, while retaining certifiability. Tests on the CIFAR10 dataset show considerable improvements in natural accuracy without degrading adversarial performance, using affordably trainable rejectors, especially for medium and large values of the noise parameter $\sigma$.
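The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: `denoise`, `classify`, the confidence threshold used as a "rejector", and the choice to drop rejected samples from the Monte Carlo vote are all hypothetical stand-ins. The only standard ingredient is the randomized-smoothing $\ell_2$ radius $R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, which presumes that $p_A$ and $p_B$ are valid probability bounds for the top and runner-up classes; whether it carries over unchanged to a vote with dropped samples is not claimed here.

```python
import random
from collections import Counter
from statistics import NormalDist

REJECT = -1  # hypothetical label meaning "abstain on this sample"


def denoise(x):
    # Placeholder denoiser (assumption): clip values back into [0, 1].
    return [min(1.0, max(0.0, v)) for v in x]


def classify(x):
    # Toy base classifier (assumption): thresholds the mean intensity
    # and reports a crude confidence score in [0.5, 1.0].
    m = sum(x) / len(x)
    label = 1 if m > 0.5 else 0
    confidence = min(1.0, 0.5 + abs(m - 0.5))
    return label, confidence


def classify_with_rejection(x, threshold=0.6):
    # Per-sample classification-with-rejection: abstain when the
    # base classifier is uncertain on the denoised input.
    label, confidence = classify(denoise(x))
    return label if confidence >= threshold else REJECT


def smoothed_predict(x, sigma, n=1000, seed=0):
    # Monte Carlo vote over Gaussian-perturbed copies of x.
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        votes[classify_with_rejection(noisy)] += 1
    # Assumption: rejected samples are ignored, so the smoothed
    # classifier itself always outputs a class when any vote survives.
    kept = {k: c for k, c in votes.items() if k != REJECT}
    if not kept:
        return REJECT, votes
    return max(kept, key=kept.get), votes


def l2_radius(p_a, p_b, sigma):
    # Standard randomized-smoothing radius, given bounds p_a >= p_b.
    inv = NormalDist().inv_cdf
    return 0.5 * sigma * (inv(p_a) - inv(p_b))
```

Calling `smoothed_predict([0.9] * 16, sigma=0.25)` returns the voted class together with the raw vote counts, including how many noisy samples were rejected. Larger `sigma` makes the toy classifier less confident on more samples, which is the regime where the abstract reports the largest gains from rejection.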
