Spotlight
in
Workshop: UniReps: Unifying Representations in Neural Models

Grokking as Simplification: A Nonlinear Complexity Perspective

Ziming Liu · Ziqian Zhong · Max Tegmark

Project Page [ OpenReview]

Abstract

We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. We define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although $L_2$ norm has been popular to characterize model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while $L_2$ cannot. (2) In the compression phase, LMN has nice linear relations with test losses, while $L_2$ is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while $L_2$ does not. Besides explaning grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity, since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.

Video

Chat is not available.