NeurIPS Poster Batchnorm Allows Unsupervised Radial Attacks

Amur Ghose · Apurv Gupta · Yaoliang Yu · Pascal Poupart

Great Hall & Hall B1+B2 (level 1) #804

Abstract:

Constructing adversarial examples usually requires soft or hard labels for each instance, and the gradient of a loss defined against those labels provides the signal for constructing the example. We show that for batch-normalized deep image recognition architectures, the intermediate latents produced after a batch normalization step suffice by themselves to produce adversarial examples, using an intermediate loss that relies solely on angular deviations and on no label. We motivate our loss through the geometry of batch-normalized representations: their norm concentrates on a hypersphere, and their distribution is close to Gaussian. Our losses extend intermediate-latent-based attacks, which usually require labels. The success of our method implies that leakage of intermediate representations may create a security breach for deployed models, one that persists even when the model is transferred to downstream usage. Removing batch norm weakens our attack, indicating that batch norm contributes to this vulnerability. Our attacks also succeed empirically against LayerNorm and are therefore relevant for transformer architectures, most notably the vision transformers that we analyze.
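
The geometric motivation in the abstract can be illustrated with a small NumPy sketch. This is not the authors' loss or attack: it only shows, on synthetic data, that rows of a batch-normalized matrix have norms concentrated near the square root of the dimension, and it computes a label-free angular deviation between latents and a perturbed copy. The helper names `batch_normalize` and `angular_deviation`, the batch and feature sizes, and the Gaussian perturbation are all assumptions chosen for illustration.

```python
import numpy as np

def batch_normalize(z, eps=1e-5):
    """Standardize each feature across the batch (no learned scale or shift)."""
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def angular_deviation(a, b):
    """Mean angle in radians between corresponding rows of a and b."""
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
batch, dim = 256, 1024  # hypothetical sizes

# Synthetic stand-in for pre-normalization latents, with an arbitrary
# per-feature scale and offset (an assumption, not real network activations).
scale = rng.uniform(0.5, 3.0, size=dim)
offset = rng.uniform(-2.0, 2.0, size=dim)
raw = rng.normal(size=(batch, dim)) * scale + offset

latents = batch_normalize(raw)

# Norm concentration: each row norm should sit close to sqrt(dim).
norms = np.linalg.norm(latents, axis=1)
mean_norm_ratio = float(norms.mean() / np.sqrt(dim))
relative_spread = float(norms.std() / norms.mean())

# Label-free signal: how far a perturbation rotates the latents.
perturbed = latents + 0.5 * rng.normal(size=latents.shape)
loss = angular_deviation(latents, perturbed)

print("mean norm / sqrt(dim):", mean_norm_ratio)
print("relative spread of norms:", relative_spread)
print("mean angular deviation (rad):", loss)
```

Because the norms barely vary, most of the information that distinguishes one latent from another lies in its direction, which is one plausible reading of why an angle-based objective can serve as a signal without labels. In an actual attack the perturbation would be applied to the input image and optimized by gradient ascent on such an objective; the random perturbation here is only a placeholder.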
