Timezone: »
One desired capability for machines is the ability to transfer their understanding of one domain to another domain where data is (usually) scarce. Despite ample adaptation of transfer learning in many deep learning applications, we yet do not understand what enables a successful transfer and which part of the network is responsible for that. In this paper, we provide new tools and analysis to address these fundamental questions. Through a series of analysis on transferring to block-shuffled images, we separate the effect of feature reuse from learning high-level statistics of data and show that some benefit of transfer learning comes from the latter. We present that when training from pre-trained weights, the model stays in the same basin in the loss landscape and different instances of such model are similar in feature space and close in parameter space.
Author Information
Behnam Neyshabur (Google)
I am a staff research scientist at Google. Before that, I was a postdoctoral researcher at New York University and a member of Theoretical Machine Learning program at Institute for Advanced Study (IAS) in Princeton. In summer 2017, I received a PhD in computer science at TTI-Chicago where I was fortunate to be advised by Nati Srebro.
Hanie Sedghi (Google Brain)

I am a senior research scientist at Google Brain, where I lead the “Deep Phenomena” team. My approach is to bond theory and practice in large-scale machine learning by designing algorithms with theoretical guarantees that also work efficiently in practice. Over the recent years, I have been working on understanding and improving deep learning. Prior to Google, I was a Research Scientist at Allen Institute for Artificial Intelligence and before that, a postdoctoral fellow at UC Irvine. I received my PhD from University of Southern California with a minor in mathematics in 2015.
Chiyuan Zhang (Google Brain)
More from the Same Authors
-
2021 : Leveraging Unlabeled Data to Predict Out-of-Distribution Performance »
Saurabh Garg · Sivaraman Balakrishnan · Zachary Lipton · Behnam Neyshabur · Hanie Sedghi -
2021 : Avoiding Spurious Correlations: Bridging Theory and Practice »
Thao Nguyen · Hanie Sedghi · Behnam Neyshabur -
2021 : Understanding and Improving Robustness of VisionTransformers through patch-based NegativeAugmentation »
Yao Qin · Chiyuan Zhang · Ting Chen · Balaji Lakshminarayanan · Alex Beutel · Xuezhi Wang -
2022 : Teaching Algorithmic Reasoning via In-context Learning »
Hattie Zhou · Azade Nova · aaron courville · Hugo Larochelle · Behnam Neyshabur · Hanie Sedghi -
2022 : MATH-AI: Toward Human-Level Mathematical Reasoning »
Francois Charton · Noah Goodman · Behnam Neyshabur · Talia Ringer · Daniel Selsam -
2022 : Teaching Algorithmic Reasoning via In-context Learning »
Hattie Zhou · Azade Nova · aaron courville · Hugo Larochelle · Behnam Neyshabur · Hanie Sedghi -
2022 : Panel Discussion »
Behnam Neyshabur · David Sontag · Pradeep Ravikumar · Erin Hartman -
2022 : Length Generalization in Quantitative Reasoning »
Behnam Neyshabur -
2022 Poster: Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation »
Yao Qin · Chiyuan Zhang · Ting Chen · Balaji Lakshminarayanan · Alex Beutel · Xuezhi Wang -
2022 Poster: Exploring Length Generalization in Large Language Models »
Cem Anil · Yuhuai Wu · Anders Andreassen · Aitor Lewkowycz · Vedant Misra · Vinay Ramasesh · Ambrose Slone · Guy Gur-Ari · Ethan Dyer · Behnam Neyshabur -
2022 Poster: Revisiting Neural Scaling Laws in Language and Vision »
Ibrahim Alabdulmohsin · Behnam Neyshabur · Xiaohua Zhai -
2022 Poster: The Privacy Onion Effect: Memorization is Relative »
Nicholas Carlini · Matthew Jagielski · Chiyuan Zhang · Nicolas Papernot · Andreas Terzis · Florian Tramer -
2022 Poster: Solving Quantitative Reasoning Problems with Language Models »
Aitor Lewkowycz · Anders Andreassen · David Dohan · Ethan Dyer · Henryk Michalewski · Vinay Ramasesh · Ambrose Slone · Cem Anil · Imanol Schlag · Theo Gutman-Solo · Yuhuai Wu · Behnam Neyshabur · Guy Gur-Ari · Vedant Misra -
2022 Poster: Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures »
Emmanuel Abbe · Samy Bengio · Elisabetta Cornacchia · Jon Kleinberg · Aryo Lotfi · Maithra Raghu · Chiyuan Zhang -
2022 Poster: Block-Recurrent Transformers »
DeLesley Hutchins · Imanol Schlag · Yuhuai Wu · Ethan Dyer · Behnam Neyshabur -
2021 Poster: Deep Learning with Label Differential Privacy »
Badih Ghazi · Noah Golowich · Ravi Kumar · Pasin Manurangsi · Chiyuan Zhang -
2021 Poster: Deep Learning Through the Lens of Example Difficulty »
Robert Baldock · Hartmut Maennel · Behnam Neyshabur -
2021 Poster: Do Vision Transformers See Like Convolutional Neural Networks? »
Maithra Raghu · Thomas Unterthiner · Simon Kornblith · Chiyuan Zhang · Alexey Dosovitskiy -
2020 Poster: What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation »
Vitaly Feldman · Chiyuan Zhang -
2020 Spotlight: What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation »
Vitaly Feldman · Chiyuan Zhang -
2020 Poster: Towards Learning Convolutions from Scratch »
Behnam Neyshabur -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 Poster: Transfusion: Understanding Transfer Learning for Medical Imaging »
Maithra Raghu · Chiyuan Zhang · Jon Kleinberg · Samy Bengio -
2017 : Contributed talk 1 - A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks »
Behnam Neyshabur -
2017 Poster: Exploring Generalization in Deep Learning »
Behnam Neyshabur · Srinadh Bhojanapalli · David Mcallester · Nati Srebro -
2017 Poster: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Spotlight: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro