We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This yields two major insights. First, a measure of a model's training speed can be used to estimate its marginal likelihood. Second, under certain conditions, this measure predicts the relative weighting of models in linear model combinations trained to minimize a regression loss. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. Our results suggest a promising new direction towards explaining why neural networks trained with stochastic gradient descent are biased towards functions that generalize well.
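The link between training speed and marginal likelihood can be made concrete through the chain rule of probability, log p(D) = Σ_i log p(y_i | D_<i). Each term scores a data point under the posterior fitted only to the points before it, much like measuring a model's loss on an example it has not yet been trained on, so a model that fits the data quickly accumulates a larger sum. The Python sketch below checks this identity for an assumed toy model, a one-weight Bayesian linear regression y = w·x + ε with w ~ N(0, prior_var) and ε ~ N(0, noise_var). The helper names, toy data and hyperparameters are hypothetical, and the code illustrates the underlying decomposition rather than reproducing the estimator used in the paper.

```python
import math


def log_normal_pdf(y, mean, var):
    """Log density of the univariate Gaussian N(mean, var) at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def sequential_log_evidence(xs, ys, prior_var, noise_var):
    """Sum of one-step-ahead log predictive densities log p(y_i | D_<i).

    Assumed toy model: y = w * x + eps, w ~ N(0, prior_var),
    eps ~ N(0, noise_var). Each point is scored *before* the posterior
    over w is updated on it, like a loss on a not-yet-seen example.
    """
    mean, var = 0.0, prior_var  # current Gaussian posterior over w
    per_step = []
    for x, y in zip(xs, ys):
        # Posterior predictive for y at x: N(mean * x, noise_var + x^2 * var)
        per_step.append(log_normal_pdf(y, mean * x, noise_var + x * x * var))
        # Conjugate update: precisions add, mean is precision-weighted
        precision = 1.0 / var + x * x / noise_var
        mean = (mean / var + x * y / noise_var) / precision
        var = 1.0 / precision
    return sum(per_step), per_step


def closed_form_log_evidence(xs, ys, prior_var, noise_var):
    """log N(y; 0, noise_var * I + prior_var * x x^T), using the matrix
    determinant lemma and Sherman-Morrison for the rank-one covariance."""
    n = len(xs)
    xx = sum(x * x for x in xs)
    xy = sum(x * y for x, y in zip(xs, ys))
    yy = sum(y * y for y in ys)
    denom = noise_var + prior_var * xx
    log_det = n * math.log(noise_var) + math.log(denom / noise_var)
    quad = (yy - prior_var * xy * xy / denom) / noise_var
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# Hypothetical toy data with a slope of roughly 2.
xs = [0.5, -1.0, 1.5, 2.0, -0.3]
ys = [1.1, -1.9, 3.2, 3.9, -0.5]

for prior_var in (1.0, 0.01):
    seq, _ = sequential_log_evidence(xs, ys, prior_var, noise_var=0.25)
    ref = closed_form_log_evidence(xs, ys, prior_var, noise_var=0.25)
    print(f"prior_var={prior_var}: sequential={seq:.6f}, closed form={ref:.6f}")
```

With these made-up data the two computations agree up to floating-point error. The model with the tighter prior (prior_var=0.01), whose predictions move toward the observed slope only slowly as points arrive, receives the lower log evidence.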
Author Information
Clare Lyle (University of Oxford)
Lisa Schut (University of Oxford)
Robin Ru (Oxford University)
Yarin Gal (University of Oxford)
Mark van der Wilk (Imperial College)
More from the Same Authors
2020 : Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks »
Panagiotis Tigas · Yarin Gal -
2020 Meetup: MeetUp: Oxford, UK »
Yarin Gal -
2021 Spotlight: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 : Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks »
Andrey Malinin · Neil Band · Yarin Gal · Mark Gales · Alexander Ganshin · German Chesnokov · Alexey Noskov · Andrey Ploskonosov · Liudmila Prokhorenkova · Ivan Provilkov · Vatsal Raina · Vyas Raina · Denis Roginskiy · Mariya Shmatova · Panagiotis Tigas · Boris Yangel -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 : Understanding and Preventing Capacity Loss in Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney -
2021 : Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific »
Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier -
2021 : DARTS without a Validation Set: Optimizing the Marginal Likelihood »
Miroslav Fil · Robin Ru · Clare Lyle · Yarin Gal -
2021 : Can Network Flatness Explain the Training Speed-Generalisation Connection? »
Albert Q. Jiang · Clare Lyle · Lisa Schut · Yarin Gal -
2021 : Decomposing Representations for Deterministic Uncertainty Estimation »
Haiwen Huang · Joost van Amersfoort · Yarin Gal -
2021 : On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty »
Joost van Amersfoort · Lewis Smith · Andrew Jesson · Oscar Key · Yarin Gal -
2021 : Contrastive Representation Learning with Trainable Augmentation Channel »
Masanori Koyama · Kentaro Minami · Takeru Miyato · Yarin Gal -
2021 : Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning »
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran -
2022 : Actually Sparse Variational Gaussian Processes »
Jake Cunningham · So Takao · Mark van der Wilk · Marc Deisenroth -
2022 : Recommendations for Baselines and Benchmarking Approximate Gaussian Processes »
Sebastian Ober · David Burt · Artem Artemev · Mark van der Wilk -
2022 : Discovering Long-period Exoplanets using Deep Learning with Citizen Science Labels »
Shreshth A Malik · Nora Eisner · Chris Lintott · Yarin Gal -
2022 : Using uncertainty-aware machine learning models to study aerosol-cloud interactions »
Maëlys Solal · Andrew Jesson · Yarin Gal · Alyson Douglas -
2022 : TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction »
Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : Sparse Convolutions on Lie Groups »
Tycho van der Ouderaa · Mark van der Wilk -
2022 : Causal Discovery using Marginal Likelihood »
Anish Dhir · Mark van der Wilk -
2022 : Towards Discovering Neural Architectures from Scratch »
Simon Schrodi · Danny Stoll · Robin Ru · Rhea Sukthanker · Thomas Brox · Frank Hutter -
2022 : What 'Out-of-distribution' Is and Is Not »
Sebastian Farquhar · Yarin Gal -
2022 : Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks »
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal -
2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions »
Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit -
2022 Poster: Interventions, Where and How? Experimental Design for Causal Models at Scale »
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer -
2022 Poster: Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations »
Alexander Immer · Tycho van der Ouderaa · Gunnar Rätsch · Vincent Fortuin · Mark van der Wilk -
2022 Poster: SnAKe: Bayesian Optimization with Pathwise Exploration »
Jose Pablo Folch · Shiqiang Zhang · Robert Lee · Behrang Shafei · David Walz · Calvin Tsay · Mark van der Wilk · Ruth Misener -
2022 Poster: Memory safe computations with XLA compiler »
Artem Artemev · Yuze An · Tilman Roeder · Mark van der Wilk -
2022 Poster: Relaxing Equivariance Constraints with Non-stationary Continuous Filters »
Tycho van der Ouderaa · David W. Romero · Mark van der Wilk -
2022 Poster: Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Thomas Rainforth -
2021 : Human-in-the-loop Bayesian Deep Learning »
Yarin Gal -
2021 : [S7] DeDUCE: Generating Counterfactual Explanations At Scale »
Benedikt Höltgen · Lisa Schut · Jan Brauner · Yarin Gal -
2021 Workshop: Bayesian Deep Learning »
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2021 Poster: Speedy Performance Estimation for Neural Architecture Search »
Robin Ru · Clare Lyle · Lisa Schut · Miroslav Fil · Mark van der Wilk · Yarin Gal -
2021 Poster: How Powerful are Performance Predictors in Neural Architecture Search? »
Colin White · Arber Zela · Robin Ru · Yang Liu · Frank Hutter -
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A »
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar -
2021 Poster: Outcome-Driven Reinforcement Learning via Variational Inference »
Tim G. J. Rudner · Vitchyr Pong · Rowan McAllister · Yarin Gal · Sergey Levine -
2021 Poster: Improving black-box optimization in VAE latent space using decoder uncertainty »
Pascal Notin · José Miguel Hernández-Lobato · Yarin Gal -
2021 Poster: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations »
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh -
2021 : Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift + Q&A »
Andrey Malinin · Neil Band · German Chesnokov · Yarin Gal · Alexander Ganshin · Mark Gales · Alexey Noskov · Liudmila Prokhorenkova · Mariya Shmatova · Vyas Raina · Vatsal Raina · Panagiotis Tigas · Boris Yangel -
2021 Poster: Adversarial Attacks on Graph Classifiers via Bayesian Optimisation »
Xingchen Wan · Henry Kenlay · Robin Ru · Arno Blaas · Michael A Osborne · Xiaowen Dong -
2021 Poster: Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data »
Andrew Jesson · Panagiotis Tigas · Joost van Amersfoort · Andreas Kirsch · Uri Shalit · Yarin Gal -
2021 Poster: Domain Invariant Representation Learning with Domain Density Transformations »
A. Tuan Nguyen · Toan Tran · Yarin Gal · Atilim Gunes Baydin -
2021 Poster: Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning »
Jannik Kossen · Neil Band · Clare Lyle · Aidan Gomez · Thomas Rainforth · Yarin Gal -
2020 Poster: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations »
Sebastian Farquhar · Lewis Smith · Yarin Gal -
2020 Poster: Neural Architecture Generator Optimization »
Robin Ru · Pedro Esperança · Fabio Maria Carlucci -
2020 Poster: Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models »
Andrew Jesson · Sören Mindermann · Uri Shalit · Yarin Gal -
2020 Poster: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2020 Spotlight: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? »
Mrinank Sharma · Sören Mindermann · Jan Brauner · Gavin Leech · Anna Stephenson · Tomáš Gavenčiak · Jan Kulveit · Yee Whye Teh · Leonid Chindelevitch · Yarin Gal -
2020 Poster: Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty »
Miguel Monteiro · Loic Le Folgoc · Daniel Coelho de Castro · Nick Pawlowski · Bernardo Marques · Konstantinos Kamnitsas · Mark van der Wilk · Ben Glocker -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Eric Nalisnick · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2019 Poster: BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning »
Andreas Kirsch · Joost van Amersfoort · Yarin Gal -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2018 : TBC 15 »
Yarin Gal -
2018 : Invited Speaker #5 Yarin Gal »
Yarin Gal -
2018 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2018 : Opening Remarks »
Yarin Gal -
2018 Poster: BRUNO: A Deep Recurrent Model for Exchangeable Data »
Iryna Korshunova · Jonas Degrave · Ferenc Huszar · Yarin Gal · Arthur Gretton · Joni Dambre -
2017 : Fast Information-theoretic Bayesian Optimisation »
Robin Ru -
2017 Workshop: Bayesian Deep Learning »
Yarin Gal · José Miguel Hernández-Lobato · Christos Louizos · Andrew Wilson · Diederik Kingma · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2017 Poster: Concrete Dropout »
Yarin Gal · Jiri Hron · Alex Kendall -
2017 Poster: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Spotlight: What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? »
Alex Kendall · Yarin Gal -
2017 Poster: Real Time Image Saliency for Black Box Classifiers »
Piotr Dabkowski · Yarin Gal -
2016 : Panel Discussion »
Shakir Mohamed · David Blei · Ryan Adams · José Miguel Hernández-Lobato · Ian Goodfellow · Yarin Gal -
2016 Workshop: Bayesian Deep Learning »
Yarin Gal · Christos Louizos · Zoubin Ghahramani · Kevin Murphy · Max Welling -
2016 Poster: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks »
Yarin Gal · Zoubin Ghahramani -
2014 Poster: Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models »
Yarin Gal · Mark van der Wilk · Carl Edward Rasmussen