The practical success of overparameterized neural networks has motivated the recent scientific study of \emph{interpolating methods} -- learning methods that fit their training data perfectly. Empirically, certain interpolating methods can fit noisy training data without catastrophically bad test performance, defying standard intuitions from statistical learning theory. Aiming to explain this, a large body of recent work has studied \emph{benign overfitting}, a behavior seen in certain asymptotic settings under which interpolating methods approach Bayes-optimality even in the presence of noise. In this work, we argue that, while benign overfitting has been instructive to study, real interpolating methods like deep networks do not fit benignly: noise in the training set leads to suboptimal generalization. These methods instead fall in an intermediate regime between benign and catastrophic overfitting, in which asymptotic risk is neither Bayes-optimal nor unbounded, and the confounding effect of the noise is ``tempered'' but non-negligible. We call this behavior \textit{tempered overfitting}. We first provide broad empirical evidence for our three-part taxonomy, demonstrating that deep neural networks and kernel machines fit to noisy data can be reasonably well classified as benign, tempered, or catastrophic. We then specialize to kernel (ridge) regression (KR), obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors, and demonstrate the consequences for KR with common kernels and for trained neural networks of infinite width using experiments on natural and synthetic datasets.
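The setup the abstract describes can be probed with a small simulation. The sketch below (a minimal illustration with hypothetical parameters, not code from the paper) fits kernel ridge regression with a Laplacian kernel to noise-corrupted labels and compares a near-interpolating (ridgeless) fit against a regularized one; the ridge parameter and kernel choice are exactly the knobs the paper's analysis ties to benign, tempered, or catastrophic behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplacian_kernel(X, Z, bandwidth=1.0):
    # Laplacian kernel k(x, z) = exp(-||x - z||_1 / bandwidth)
    dists = np.abs(X[:, None, :] - Z[None, :, :]).sum(-1)
    return np.exp(-dists / bandwidth)

def krr_test_mse(ridge, n_train=200, n_test=500, noise_std=0.5):
    # Target f*(x) = sin(2*pi*x); training labels carry additive Gaussian noise.
    Xtr = rng.uniform(-1, 1, (n_train, 1))
    Xte = rng.uniform(-1, 1, (n_test, 1))
    ytr = np.sin(2 * np.pi * Xtr[:, 0]) + noise_std * rng.normal(size=n_train)
    # Kernel ridge regression: alpha = (K + ridge * I)^{-1} y
    K = laplacian_kernel(Xtr, Xtr)
    alpha = np.linalg.solve(K + ridge * np.eye(n_train), ytr)
    preds = laplacian_kernel(Xte, Xtr) @ alpha
    # Test risk against the clean target function.
    return np.mean((preds - np.sin(2 * np.pi * Xte[:, 0])) ** 2)

mse_interp = krr_test_mse(ridge=1e-9)  # (near-)interpolating fit of noisy labels
mse_ridged = krr_test_mse(ridge=1e-1)  # regularized fit of the same problem
```

In this one-dimensional toy, the interpolating fit memorizes the label noise yet its test error stays bounded rather than diverging, which is the qualitative signature of the tempered regime; sweeping `ridge` and swapping kernels with different eigenspectrum decay rates changes which of the three behaviors appears.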
Author Information
Neil Mallinar (UC San Diego)
James Simon (University of California Berkeley)
Amirhesam Abedsoltan (University of California, San Diego)
Parthe Pandit (University of California, San Diego)
Misha Belkin (Ohio State University)
Preetum Nakkiran (Apple)
More from the Same Authors
- 2022 Poster: Instability and Local Minima in GAN Training with Kernel Discriminators
  Evan Becker · Parthe Pandit · Sundeep Rangan · Alyson Fletcher
- 2022 Poster: Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
  Libin Zhu · Chaoyue Liu · Misha Belkin
- 2022 Poster: Benign Overfitting in Two-layer Convolutional Neural Networks
  Yuan Cao · Zixiang Chen · Misha Belkin · Quanquan Gu
- 2022 Poster: Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds
  Joshua Albrecht · Abraham Fetterman · Bryden Fogelman · Ellie Kitanidis · Bartosz Wróblewski · Nicole Seo · Michael Rosenthal · Maksis Knutins · Zack Polizzi · James Simon · Kanjun Qiu
- 2022 Poster: What You See is What You Get: Principled Deep Learning via Distributional Generalization
  Bogdan Kulynych · Yao-Yuan Yang · Yaodong Yu · Jarosław Błasiok · Preetum Nakkiran
- 2022 Poster: Knowledge Distillation: Bad Models Can Be Good Role Models
  Gal Kaplun · Eran Malach · Preetum Nakkiran · Shai Shalev-Shwartz
- 2020 Poster: Matrix Inference and Estimation in Multi-Layer Models
  Parthe Pandit · Mojtaba Sahraee Ardakan · Sundeep Rangan · Philip Schniter · Alyson Fletcher
- 2018 Poster: Plug-in Estimation in High-Dimensional Linear Inverse Problems: A Rigorous Analysis
  Alyson Fletcher · Parthe Pandit · Sundeep Rangan · Subrata Sarkar · Philip Schniter