
Broken Neural Scaling Laws
Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger

We present a smoothly broken power law functional form that accurately models the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training, the number of model parameters, or the training dataset size varies) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, arithmetic, and reinforcement learning. Extrapolations of scaling behavior obtained with this functional form are often an order of magnitude more accurate than those obtained with other functional forms for neural scaling behavior. Moreover, this functional form accurately models the non-monotonic transitions present in the scaling behavior of phenomena such as double descent, as well as the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior.
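The abstract does not spell out the equation, so the following is only a sketch of what a "smoothly broken power law" can look like. It assumes a parameterization with an offset `a`, a base power law `b * x**(-c0)`, and a list of breaks, where each break is described by a slope change `c_i`, a location `d_i`, and a sharpness `f_i`. These parameter names and the exact formula are illustrative assumptions; consult the paper for the authors' definition. The sketch shows the key property: on a log-log plot, the curve's slope moves smoothly from one straight-line regime to another around each break.

```python
import math
from typing import Sequence, Tuple

# One break = (c_i, d_i, f_i): slope change, break location, break sharpness.
# These names are hypothetical labels for this sketch, not the paper's API.
Break = Tuple[float, float, float]


def bnsl(x: float, a: float, b: float, c0: float, breaks: Sequence[Break]) -> float:
    """Evaluate an assumed smoothly broken power law at x > 0.

    y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i)
    """
    y = b * x ** (-c0)
    for c, d, f in breaks:
        # Near 1 well before the break (x << d); behaves like (x / d)**(-c) well after it.
        y *= (1.0 + (x / d) ** (1.0 / f)) ** (-c * f)
    return a + y


def local_exponent(x: float, c0: float, breaks: Sequence[Break]) -> float:
    """Closed-form slope of log(y - a) with respect to log(x).

    Each break contributes -c_i * u / (1 + u) with u = (x / d_i)**(1 / f_i),
    so the slope moves smoothly from -c0 (before every break) toward
    -(c0 + sum of c_i) (after every break). Smaller f_i gives a sharper turn.
    """
    slope = -c0
    for c, d, f in breaks:
        u = (x / d) ** (1.0 / f)
        slope -= c * u / (1.0 + u)
    return slope


if __name__ == "__main__":
    # Hypothetical parameters: one break at x = 1e3 that steepens the decay
    # from roughly x**-0.5 to roughly x**-1.5.
    params = dict(a=0.0, b=1.0, c0=0.5, breaks=[(1.0, 1e3, 0.1)])
    for x in (1.0, 1e3, 1e6):
        y = bnsl(x, **params)
        s = local_exponent(x, params["c0"], params["breaks"])
        print(f"x={x:g}  y={y:.3e}  log-log slope={s:.3f}")
```

With a single break and a positive `c_i`, the curve only steepens. Fitting such a form to measured (scale, metric) pairs and extrapolating it is the use case the abstract describes, but the fitting procedure is not shown here.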

Author Information

Ethan Caballero (Mila)
Kshitij Gupta (Université de Montréal)
Irina Rish (Mila/UdeM)
David Krueger (Mila, University of Montreal)
