Timezone: »

When Does Re-initialization Work?
Sheheryar Zaidi · Tudor Berariu · Hyunjik Kim · Jorg Bornschein · Claudia Clopath · Yee Whye Teh · Razvan Pascanu
Event URL: https://openreview.net/forum?id=KmoOwyzCX_ »

Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.

Author Information

Sheheryar Zaidi (University of Oxford)
Tudor Berariu (Imperial College London)
Hyunjik Kim (DeepMind)
Jorg Bornschein (DeepMind)
Claudia Clopath (Imperial College London)
Yee Whye Teh (University of Oxford, DeepMind)

I am a Professor of Statistical Machine Learning at the Department of Statistics, University of Oxford and a Research Scientist at DeepMind. I am also an Alan Turing Institute Fellow and a European Research Council Consolidator Fellow. I obtained my Ph.D. at the University of Toronto (working with Geoffrey Hinton), and did postdoctoral work at the University of California at Berkeley (with Michael Jordan) and National University of Singapore (as Lee Kuan Yew Postdoctoral Fellow). I was a Lecturer then a Reader at the Gatsby Computational Neuroscience Unit, UCL, and a tutorial fellow at University College Oxford, prior to my current appointment. I am interested in the statistical and computational foundations of intelligence, and works on scalable machine learning, probabilistic models, Bayesian nonparametrics and deep learning. I was programme co-chair of ICML 2017 and AISTATS 2010.

Razvan Pascanu (Google DeepMind)

More from the Same Authors