Deep over-parameterized neural networks exhibit the interpolation property on many data sets: they can achieve approximately zero loss on all training samples simultaneously. Recently, this property has been exploited to develop novel optimisation algorithms for this setting. Because the optimal loss value is known in advance, these algorithms can employ a variation of the Polyak step size computed on a stochastic batch of data. In this work, we introduce an algorithm that extends this idea to tasks where the interpolation property does not hold. As we no longer have access to the optimal loss values a priori, we instead estimate them online for each sample. To realise this, we introduce a simple but highly effective heuristic for approximating the optimal value based on previous loss evaluations. Through rigorous experimentation, we show the effectiveness of our approach, which outperforms adaptive gradient and line search methods.
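To make the idea concrete, the following is a minimal sketch of a clipped stochastic Polyak step in which the optimal loss of each sample is estimated online rather than assumed to be zero. The abstract does not specify the heuristic, so the estimate used here (a fixed fraction `shrink` of the lowest loss seen so far for that sample) is a hypothetical stand-in and not the authors' method. The names `polyak_step`, `train`, `max_lr` and `shrink`, and the one-dimensional least-squares model, are likewise assumptions chosen for illustration.

```python
import math


def polyak_step(loss, loss_star_est, grad_sq_norm, max_lr=1.0, eps=1e-12):
    """Clipped Polyak step size: (loss - estimated optimal loss) / ||grad||^2."""
    gap = max(loss - loss_star_est, 0.0)  # never step when below the estimate
    return min(max_lr, gap / (grad_sq_norm + eps))


def train(data, epochs=200, max_lr=0.5, shrink=0.9):
    """Fit y ~ w * x with per-sample squared loss and per-sample estimates.

    ASSUMPTION: the estimate of each sample's optimal loss is `shrink` times
    the lowest loss observed so far for that sample. This is an illustrative
    placeholder, not the heuristic proposed in the paper.
    """
    w = 0.0
    best = [math.inf] * len(data)  # lowest loss seen per sample
    star = [0.0] * len(data)       # estimated optimal loss per sample
    for _ in range(epochs):
        for i, (x, y) in enumerate(data):
            residual = w * x - y
            loss = 0.5 * residual * residual
            grad = residual * x
            best[i] = min(best[i], loss)
            star[i] = shrink * best[i]
            lr = polyak_step(loss, star[i], grad * grad, max_lr)
            w -= lr * grad
    return w, star


if __name__ == "__main__":
    # Two conflicting targets for the same input: interpolation is impossible.
    w, star = train([(1.0, 1.0), (1.0, 3.0)])
    print("w =", w, "estimated optimal losses =", star)
```

With the estimates fixed at zero, this reduces to the interpolation-based step described above. The clipping by `max_lr` keeps the step bounded when the gradient norm is small.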
Author Information
Alasdair Paren (University of Oxford)
Rudra Poudel (Toshiba Research)
Pawan K Mudigonda (University of Oxford)
More from the Same Authors
- 2022 Poster: In Defense of the Unitary Scalarization for Deep Multi-Task Learning
  Vitaly Kurin · Alessandro De Palma · Ilya Kostrikov · Shimon Whiteson · Pawan K Mudigonda
- 2021: Poster Session 1 (gather.town)
  Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani
- 2020 Poster: Hybrid Models for Learning to Branch
  Prateek Gupta · Maxime Gasse · Elias Khalil · Pawan K Mudigonda · Andrea Lodi · Yoshua Bengio
- 2018 Poster: A Unified View of Piecewise Linear Neural Network Verification
  Rudy Bunel · Ilker Turkaslan · Philip Torr · Pushmeet Kohli · Pawan K Mudigonda
- 2016 Poster: Adaptive Neural Compilation
  Rudy Bunel · Alban Desmaison · Pawan K Mudigonda · Pushmeet Kohli · Philip Torr
- 2016 Poster: DISCO Nets : DISsimilarity COefficients Networks
  Diane Bouchacourt · Pawan K Mudigonda · Sebastian Nowozin
- 2008 Poster: Improved Moves for Truncated Convex Models
  Pawan K Mudigonda · Philip Torr
- 2008 Spotlight: Improved Moves for Truncated Convex Models
  Pawan K Mudigonda · Philip Torr
- 2007 Oral: An Analysis of Convex Relaxations for MAP Estimation
  Pawan K Mudigonda · Vladimir Kolmogorov · Philip Torr
- 2007 Poster: An Analysis of Convex Relaxations for MAP Estimation
  Pawan K Mudigonda · Vladimir Kolmogorov · Philip Torr