The learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood. Li et al. (2019) recently proved that this behavior can occur in a simplified non-convex neural-network setting. In this work, we show that the phenomenon can arise even for convex learning problems -- in particular, linear regression in 2 dimensions. We give a toy convex problem where learning rate annealing (a large initial learning rate, followed by a small learning rate) can lead gradient descent to minima with provably better generalization than using a small learning rate throughout. In our case, this occurs due to a combination of a mismatch between the train and test loss landscapes and early stopping.
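To make the mechanism concrete, here is a minimal illustrative sketch, not the paper's actual construction: two 2-dimensional quadratic objectives stand in for mismatched train and test loss landscapes, gradient descent runs on the train loss for a fixed iteration budget (early stopping), and a small constant learning rate is compared against a large-then-small schedule. All matrices, learning rates, and step counts below are hypothetical choices made for illustration.

```python
# Minimal illustrative sketch (hypothetical, not the paper's construction):
# 2D quadratic "train" and "test" losses with mismatched landscapes.
import numpy as np

# Train loss: 0.5 * w^T A_train w - b_train^T w, an ill-conditioned quadratic
# whose minimizer is w* = [1, 1].
A_train = np.array([[10.0, 0.0], [0.0, 0.1]])
b_train = np.array([10.0, 0.1])

# Test loss: a different quadratic whose minimizer is [0, 0.5], so the
# train optimum does not coincide with the test optimum.
A_test = np.array([[0.1, 0.0], [0.0, 10.0]])
b_test = np.array([0.0, 5.0])

def train_grad(w):
    return A_train @ w - b_train

def test_loss(w):
    return 0.5 * w @ A_test @ w - b_test @ w

def run_gd(lr_schedule, steps=200):
    """Gradient descent on the train loss for a fixed budget (early stopping)."""
    w = np.zeros(2)
    for t in range(steps):
        w = w - lr_schedule(t) * train_grad(w)
    return w

# (a) small learning rate throughout
w_small = run_gd(lambda t: 0.01)

# (b) annealed: large learning rate for the first half of the budget, then small
w_anneal = run_gd(lambda t: 0.15 if t < 100 else 0.01)

print("small LR throughout: w =", w_small, " test loss =", test_loss(w_small))
print("annealed LR        : w =", w_anneal, " test loss =", test_loss(w_anneal))
```

Under these hypothetical parameters, the annealed schedule makes more progress along the poorly conditioned train direction within the fixed budget, which happens to land the iterate closer to the test minimizer, so it prints a lower test loss than the small constant rate. The paper itself should be consulted for the precise construction and proofs.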
Author Information
Preetum Nakkiran (Harvard)
More from the Same Authors
- 2022: APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations
  Elan Rosenfeld · Preetum Nakkiran · Hadi Pouransari · Oncel Tuzel · Fartash Faghri
- 2022: Deconstructing Distributions: A Pointwise Framework of Learning
  Gal Kaplun · Nikhil Ghosh · Saurabh Garg · Boaz Barak · Preetum Nakkiran
- 2021 Poster: Revisiting Model Stitching to Compare Neural Representations
  Yamini Bansal · Preetum Nakkiran · Boaz Barak
- 2020: Contributed talks in Session 3 (Zoom)
  Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
- 2020: Poster Session 2 (gather.town)
  Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
- 2019 Poster: SGD on Neural Networks Learns Functions of Increasing Complexity
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang
- 2019 Spotlight: SGD on Neural Networks Learns Functions of Increasing Complexity
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang