Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
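As a rough illustration of what a translation invariance in the likelihood can look like, the sketch below uses a two-weight linear model with a single data point. The model form (prediction `w · x` with Gaussian noise), the values of `x`, `y`, `w`, and the helper names are assumptions made for this example only and are not taken from the paper. Because the likelihood depends on the weights only through `w · x`, shifting `w` along any direction orthogonal to `x` leaves it unchanged.

```python
import math


def log_likelihood(w, x, y, noise_std=1.0):
    """Gaussian log-likelihood of one data point under y ~ N(w . x, noise_std^2)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (-0.5 * math.log(2 * math.pi * noise_std ** 2)
            - 0.5 * ((y - pred) / noise_std) ** 2)


def translate(w, direction, t):
    """Shift the weight vector w by t times the given direction."""
    return [wi + t * di for wi, di in zip(w, direction)]


# Hypothetical single data point and weights (illustrative values only).
x = [1.0, 1.0]
y = 0.5
w = [0.3, -0.2]

# Direction orthogonal to x: moving along it does not change w . x.
v = [1.0, -1.0]

base = log_likelihood(w, x, y)
shifted = [log_likelihood(translate(w, v, t), x, y) for t in (-2.0, 0.5, 3.0)]

# Moving along x itself does change the prediction, and hence the likelihood.
along_x = log_likelihood(translate(w, x, 1.0), x, y)
```

In this toy setting the likelihood is constant along a whole line in weight space, so with a factorised Gaussian prior the posterior is correlated across the two weights, which is the kind of structure a mean-field Gaussian with independent weights cannot represent.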
Author Information
Richard Kurle (Technical University of Munich)
Ralf Herbrich (Hasso Plattner Institute)

I am Professor and Chair of the Artificial Intelligence and Sustainability research group at the Hasso Plattner Institute.
Tim Januschowski (Zalando SE)
- Director Pricing Platform, Zalando SE
- Head of Time Series ML at AWS AI
Yuyang (Bernie) Wang (AWS AI Labs)
Jan Gasthaus
More from the Same Authors
- 2021: On Symmetries in Variational Bayesian Neural Nets
  Richard Kurle · Tim Januschowski · Jan Gasthaus · Bernie Wang
- 2022: First De-Trend then Attend: Rethinking Attention for Time-Series Forecasting
  Xiyuan Zhang · Xiaoyong Jin · Karthick Gopalswamy · Gaurav Gupta · Youngsuk Park · Xingjian Shi · Hao Wang · Danielle Maddix · Yuyang (Bernie) Wang
- 2022: Towards Reverse Causal Inference on Panel Data: Precise Formulation and Challenges
  Jiayao Zhang · Youngsuk Park · Danielle Maddix · Dan Roth · Yuyang (Bernie) Wang
- 2022: But Are You Sure? Quantifying Uncertainty in Model Explanations
  Charles Marx · Youngsuk Park · Hilaf Hasson · Yuyang (Bernie) Wang · Stefano Ermon · Chaitanya Baru
- 2022 Spotlight: Lightning Talks 4A-3
  Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao
- 2022 Spotlight: Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
  Zhihan Gao · Xingjian Shi · Hao Wang · Yi Zhu · Yuyang (Bernie) Wang · Mu Li · Dit-Yan Yeung
- 2022 Workshop: A causal view on dynamical systems
  Sören Becker · Alexis Bellot · Cecilia Casolo · Niki Kilbertus · Sara Magliacane · Yuyang (Bernie) Wang
- 2022 Poster: Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
  Zhihan Gao · Xingjian Shi · Hao Wang · Yi Zhu · Yuyang (Bernie) Wang · Mu Li · Dit-Yan Yeung
- 2021 Poster: Neural Flows: Efficient Alternative to Neural ODEs
  Marin Biloš · Johanna Sommer · Syama Sundar Rangapuram · Tim Januschowski · Stephan Günnemann
- 2021 Poster: Detecting Anomalous Event Sequences with Temporal Point Processes
  Oleksandr Shchur · Ali Caner Turkmen · Tim Januschowski · Jan Gasthaus · Stephan Günnemann
- 2021 Poster: Probabilistic Forecasting: A Level-Set Approach
  Hilaf Hasson · Bernie Wang · Tim Januschowski · Jan Gasthaus
- 2021 Poster: Online false discovery rate control for anomaly detection in time series
  Quentin Rebjock · Baris Kurt · Tim Januschowski · Laurent Callot
- 2021 Poster: Deep Explicit Duration Switching Models for Time Series
  Abdul Fatir Ansari · Konstantinos Benidis · Richard Kurle · Ali Caner Turkmen · Harold Soh · Alexander Smola · Bernie Wang · Tim Januschowski
- 2021 Poster: Latent Matters: Learning Deep State-Space Models
  Alexej Klushyn · Richard Kurle · Maximilian Soelch · Botond Cseke · Patrick van der Smagt
- 2020 Poster: Deep Rao-Blackwellised Particle Filters for Time Series Forecasting
  Richard Kurle · Syama Sundar Rangapuram · Emmanuel de Bézenac · Stephan Günnemann · Jan Gasthaus
- 2020 Poster: Normalizing Kalman Filters for Multivariate Time Series Analysis
  Emmanuel de Bézenac · Syama Sundar Rangapuram · Konstantinos Benidis · Michael Bohlke-Schneider · Richard Kurle · Lorenzo Stella · Hilaf Hasson · Patrick Gallinari · Tim Januschowski
- 2019 Poster: Learning Hierarchical Priors in VAEs
  Alexej Klushyn · Nutan Chen · Richard Kurle · Botond Cseke · Patrick van der Smagt
- 2019 Spotlight: Learning Hierarchical Priors in VAEs
  Alexej Klushyn · Nutan Chen · Richard Kurle · Botond Cseke · Patrick van der Smagt