

Poster

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Gavia Gray · aman tiwari · Shane Bergsma · Joel Hestness
2024 Poster

