Practical Deep Learning with Bayesian Principles
Kazuki Osawa · Siddharth Swaroop · Mohammad Emtiyaz Khan · Anirudh Jain · Runa Eschenhagen · Richard Turner · Rio Yokota

Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #168

Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. This work enables practical deep learning while preserving benefits of Bayesian principles. A PyTorch implementation is available as a plug-and-play optimiser.
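To make the idea concrete, here is a minimal, self-contained sketch of a VOGN-style natural-gradient variational inference update, shown on a toy Bayesian logistic-regression problem. This is an illustrative simplification in NumPy, not the authors' PyTorch optimiser: a diagonal Gaussian posterior over the weights is updated using gradients evaluated at weights sampled from it, with the posterior precision tracked via an online average of squared per-example gradients. All variable names and hyperparameter values below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (sizes and names are arbitrary).
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5])
y = (X @ w_true + 0.3 * rng.normal(size=N) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Variational parameters of q(w) = N(mu, diag(sigma^2)).
mu = np.zeros(D)
s = np.ones(D)           # running estimate of the squared-gradient (Fisher) diagonal
delta = 0.1              # prior precision term (illustrative value)
alpha, beta = 0.1, 0.1   # step sizes for the mean and the precision estimate

for step in range(500):
    # Sample weights from the current posterior; variance is 1 / (N * (s + delta)).
    sigma = 1.0 / np.sqrt(N * (s + delta))
    w = mu + sigma * rng.normal(size=D)

    # Per-example gradients of the logistic negative log-likelihood at the sample.
    p = sigmoid(X @ w)
    per_ex_grad = (p - y)[:, None] * X       # shape (N, D)
    g = per_ex_grad.mean(axis=0)
    h = (per_ex_grad ** 2).mean(axis=0)      # squared-gradient curvature proxy

    # Natural-gradient updates: moving average for the precision state,
    # then a preconditioned step on the mean that includes the prior term.
    s = (1 - beta) * s + beta * h
    mu = mu - alpha * (g + delta * mu) / (s + delta)

# Predictions from the posterior mean.
acc = np.mean((sigmoid(X @ mu) > 0.5) == (y > 0.5))
print(f"training accuracy with posterior mean: {acc:.2f}")
```

The structure mirrors why the method can run as a "plug-and-play optimiser": the update consumes ordinary minibatch gradients (plus their squares), so it slots into a standard training loop the same way Adam does, while the tracked precision `s` yields the posterior variance used for calibrated, uncertainty-aware predictions.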

Author Information

Kazuki Osawa (Tokyo Institute of Technology)
Siddharth Swaroop (University of Cambridge)
Mohammad Emtiyaz Khan (RIKEN)

Emtiyaz Khan (also known as Emti) is a team leader at the RIKEN Center for Advanced Intelligence Project (AIP) in Tokyo, where he leads the Approximate Bayesian Inference Team. He is also a visiting professor at the Tokyo University of Agriculture and Technology (TUAT). Previously, he was a postdoc and then a scientist at École Polytechnique Fédérale de Lausanne (EPFL), where he also taught two large machine learning courses and received a teaching award. He finished his PhD in machine learning at the University of British Columbia in 2012. The main goal of Emti's research is to understand the principles of learning from data and use them to develop algorithms that can learn like living beings. For the past 10 years, his work has focused on developing Bayesian methods that could lead to such fundamental principles. The Approximate Bayesian Inference Team now continues to use these principles, as well as derive new ones, to solve real-world problems.

Anirudh Jain (Indian Institute of Technology (ISM), Dhanbad)
Runa Eschenhagen (University of Osnabrueck)
Richard Turner (University of Cambridge)
Rio Yokota (Tokyo Institute of Technology, AIST- Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC- OIL), National Institute of Advanced Industrial Science and Technology (AIST))

Rio Yokota received his BS, MS, and PhD from Keio University in 2003, 2005, and 2009, respectively. He is currently an Associate Professor at GSIC, Tokyo Institute of Technology. His research interests span high performance computing, hierarchical low-rank approximation methods, and scalable deep learning. He was part of the team that won the ACM Gordon Bell Prize for price/performance in 2009.
