Understanding Scaling Laws via Neural Feature Learning Dynamics
Abstract
Deep neural networks have recently revolutionized a wide range of domains, largely because their performance improves consistently as model size, data, and compute are scaled up, a phenomenon formalized as scaling laws. Yet the theoretical basis of these laws remains unclear: why scaling works, and when it breaks down. We address this gap by analyzing the feature learning dynamics of ResNets trained with SGD. In the joint infinite-width–depth limit, we show that feature evolution is governed by a coupled forward–backward stochastic system, which we term the \textit{neural feature learning dynamic system}. This framework clarifies the mechanisms underlying scaling laws and offers a new mathematical tool for studying deep learning dynamics.
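For orientation, a coupled forward–backward stochastic system generically takes the following form. This is a minimal schematic only, assuming a continuous depth parameter $s \in [0,1]$; the coefficients $b$, $\sigma$, $h$ and the terminal map $g$ are placeholders and are not the quantities derived in this work:
\begin{align*}
% Schematic FBSDE; all coefficients are illustrative placeholders, not taken from this paper.
\mathrm{d}X_s &= b(s, X_s, Y_s)\,\mathrm{d}s + \sigma(s, X_s, Y_s)\,\mathrm{d}W_s, & X_0 &= x,\\
\mathrm{d}Y_s &= -h(s, X_s, Y_s, Z_s)\,\mathrm{d}s + Z_s\,\mathrm{d}W_s, & Y_1 &= g(X_1).
\end{align*}
Informally, the forward component $X_s$ plays the role of the hidden features propagated across depth, while the backward component $Y_s$ plays the role of the back-propagated signal fixed by a terminal condition at the output; the two are coupled through their joint appearance in the drift and diffusion terms.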