Poster in Workshop: Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training

Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization

Maximilian Mueller · Matthias Hein


Abstract:

We investigate the connection between two popular methods commonly used in training deep neural networks: Sharpness-Aware Minimization (SAM) and Batch Normalization. We find that perturbing only the affine BatchNorm parameters in the adversarial step of SAM improves generalization performance, while excluding them from the perturbation can strongly degrade it. We confirm our results across several models and SAM variants on CIFAR-10 and CIFAR-100 and show preliminary results for ImageNet. Our results provide a practical tweak for training deep networks, but also cast doubt on the commonly accepted explanation that SAM works by minimizing a sharpness quantity responsible for generalization.
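The following is a minimal sketch of the idea described above, assuming a standard PyTorch-style two-step SAM update; names such as `rho`, `bn_affine_params`, and `sam_step` are illustrative and not the authors' code. The adversarial (ascent) perturbation is applied only to the affine (weight/bias) parameters of BatchNorm layers, while the descent step updates all parameters as usual.

import torch
import torch.nn as nn

def bn_affine_params(model: nn.Module):
    # Yield only the affine (weight/bias) parameters of BatchNorm layers.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            for p in (m.weight, m.bias):
                if p is not None and p.requires_grad:
                    yield p

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # One SAM update in which the perturbation touches only BatchNorm affine parameters.
    params = list(bn_affine_params(model))

    # 1) Adversarial step: ascend along the gradient of the selected parameters only.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 2) Descent step: compute the full gradient at the perturbed point,
    #    undo the perturbation, then update all parameters.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()

Restricting the ascent step this way changes only which parameters define the "adversarial" weight perturbation; the base optimizer (e.g. SGD with momentum) still updates every parameter in the descent step.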
