Timezone: »
Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, ResNet 20 and Wide ResNet 28-10 architectures, we improve upon both deep and batch ensembles.
Author Information
Florian Wenzel (---)
Jasper Snoek (Google Research, Brain team)
Jasper Snoek is a research scientist at Google Brain. His research has touched a variety of topics at the intersection of Bayesian methods and deep learning. He completed his PhD in machine learning at the University of Toronto. He subsequently held postdoctoral fellowships at the University of Toronto, under Geoffrey Hinton and Ruslan Salakhutdinov, and at the Harvard Center for Research on Computation and Society, under Ryan Adams. Jasper co-founded a Bayesian optimization focused startup, Whetlab, which was acquired by Twitter. He has served as an Area Chair for NeurIPS, ICML, AISTATS and ICLR, and organized a variety of workshops at ICML and NeurIPS.
Dustin Tran (Google Brain)
Rodolphe Jenatton (Google Brain)
More from the Same Authors
-
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 : Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning »
Zachary Nado · Neil Band · Mark Collier · Josip Djolonga · Mike Dusenberry · Sebastian Farquhar · Qixuan Feng · Angelos Filos · Marton Havasi · Rodolphe Jenatton · Ghassen Jerfel · Jeremiah Liu · Zelda Mariet · Jeremy Nixon · Shreyas Padhy · Jie Ren · Tim G. J. Rudner · Yeming Wen · Florian Wenzel · Kevin Murphy · D. Sculley · Balaji Lakshminarayanan · Jasper Snoek · Yarin Gal · Dustin Tran -
2021 : Deep Classifiers with Label Noise Modeling and Distance Awareness »
Vincent Fortuin · Mark Collier · Florian Wenzel · James Allingham · Jeremiah Liu · Dustin Tran · Balaji Lakshminarayanan · Jesse Berent · Rodolphe Jenatton · Effrosyni Kokiopoulou -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2022 : Reliability benchmarks for image segmentation »
Estefany Kelly Buchanan · Michael Dusenberry · Jie Ren · Kevin Murphy · Balaji Lakshminarayanan · Dustin Tran -
2022 : Panel Discussion »
Jacob Gardner · Marta Blangiardo · Viacheslav Borovitskiy · Jasper Snoek · Paula Moraga · Carolina Osorio -
2022 : Invited Talk: Jasper Snoek »
Jasper Snoek -
2022 Poster: On the Adversarial Robustness of Mixture of Experts »
Joan Puigcerver · Rodolphe Jenatton · Carlos Riquelme · Pranjal Awasthi · Srinadh Bhojanapalli -
2022 Poster: Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts »
Basil Mustafa · Carlos Riquelme · Joan Puigcerver · Rodolphe Jenatton · Neil Houlsby -
2021 : Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks »
Neil Band · Tim G. J. Rudner · Qixuan Feng · Angelos Filos · Zachary Nado · Mike Dusenberry · Ghassen Jerfel · Dustin Tran · Yarin Gal -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2021 Poster: Scaling Vision with Sparse Mixture of Experts »
Carlos Riquelme · Joan Puigcerver · Basil Mustafa · Maxim Neumann · Rodolphe Jenatton · AndrĂ© Susano Pinto · Daniel Keysers · Neil Houlsby -
2021 Poster: Revisiting the Calibration of Modern Neural Networks »
Matthias Minderer · Josip Djolonga · Rob Romijnders · Frances Hubis · Xiaohua Zhai · Neil Houlsby · Dustin Tran · Mario Lucic -
2020 Session: Orals & Spotlights Track 33: Health/AutoML/(Soft|Hard)ware »
Dustin Tran · Artur Dubrawski -
2020 Poster: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness »
Jeremiah Liu · Zi Lin · Shreyas Padhy · Dustin Tran · Tania Bedrax Weiss · Balaji Lakshminarayanan -
2020 Tutorial: (Track2) Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning Q&A »
Dustin Tran · Balaji Lakshminarayanan · Jasper Snoek -
2020 Poster: A Spectral Energy Distance for Parallel Speech Synthesis »
Alexey Gritsenko · Tim Salimans · Rianne van den Berg · Jasper Snoek · Nal Kalchbrenner -
2020 Session: Orals & Spotlights Track 19: Probabilistic/Causality »
Julie Josse · Jasper Snoek -
2020 Tutorial: (Track2) Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning »
Dustin Tran · Balaji Lakshminarayanan · Jasper Snoek -
2019 Poster: Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning »
Valerio Perrone · Huibin Shen · Matthias Seeger · Cedric Archambeau · Rodolphe Jenatton -
2019 Poster: Bayesian Layers: A Module for Neural Network Uncertainty »
Dustin Tran · Mike Dusenberry · Mark van der Wilk · Danijar Hafner -
2019 Poster: Likelihood Ratios for Out-of-Distribution Detection »
Jie Ren · Peter Liu · Emily Fertig · Jasper Snoek · Ryan Poplin · Mark Depristo · Joshua Dillon · Balaji Lakshminarayanan -
2019 Poster: Discrete Flows: Invertible Generative Models of Discrete Data »
Dustin Tran · Keyon Vafa · Kumar Agrawal · Laurent Dinh · Ben Poole -
2019 Poster: DppNet: Approximating Determinantal Point Processes with Deep Networks »
Zelda Mariet · Yaniv Ovadia · Jasper Snoek -
2018 : Software Panel »
Ben Letham · David Duvenaud · Dustin Tran · Aki Vehtari -
2018 Poster: Scalable Hyperparameter Transfer Learning »
Valerio Perrone · Rodolphe Jenatton · Matthias W Seeger · Cedric Archambeau -
2018 Poster: Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language »
Matthew D. Hoffman · Matthew Johnson · Dustin Tran -
2018 Poster: Simple, Distributed, and Accelerated Probabilistic Programming »
Dustin Tran · Matthew Hoffman · Dave Moore · Christopher Suter · Srinivas Vasudevan · Alexey Radul · Matthew Johnson · Rif A. Saurous -
2018 Poster: Mesh-TensorFlow: Deep Learning for Supercomputers »
Noam Shazeer · Youlong Cheng · Niki Parmar · Dustin Tran · Ashish Vaswani · Penporn Koanantakool · Peter Hawkins · HyoukJoong Lee · Mingsheng Hong · Cliff Young · Ryan Sepassi · Blake Hechtman -
2017 : Lessons learned from designing Edward »
Dustin Tran -
2017 : Deep Probabilistic Programming »
Dustin Tran -
2017 : Contributed talk: Scalable Logit Gaussian Process Classification »
Florian Wenzel -
2017 : Contributed talk 3: Implicit Causal Models for Genome-wide Association Studies »
Dustin Tran -
2017 : Introduction »
Cheng Zhang · Francisco Ruiz · Dustin Tran · James McInerney · Stephan Mandt -
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · James McInerney · Dustin Tran · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias -
2017 Poster: Hierarchical Implicit Models and Likelihood-Free Variational Inference »
Dustin Tran · Rajesh Ranganath · David Blei -
2017 Poster: Variational Inference via $\chi$ Upper Bound Minimization »
Adji Bousso Dieng · Dustin Tran · Rajesh Ranganath · John Paisley · David Blei -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Poster: Operator Variational Inference »
Rajesh Ranganath · Dustin Tran · Jaan Altosaar · David Blei -
2015 : Variational Gaussian Process »
Dustin Tran -
2015 : Finding Sparse Features in Strongly Confounded Medial Data »
Stephan Mandt · Florian Wenzel -
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei -
2015 Poster: Copula variational inference »
Dustin Tran · David Blei · Edo M Airoldi -
2013 Poster: Convex Relaxations for Permutation Problems »
Fajwel Fogel · Rodolphe Jenatton · Francis Bach · Alexandre d'Aspremont -
2012 Poster: A latent factor model for highly multi-relational data »
Rodolphe Jenatton · Nicolas Le Roux · Antoine Bordes · Guillaume R Obozinski -
2010 Poster: Network Flow Algorithms for Structured Sparsity »
Julien Mairal · Rodolphe Jenatton · Guillaume R Obozinski · Francis Bach