Averaging the predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance and calibration in various benchmarks and Kaggle competitions. However, the runtime and training cost of deep ensembles grow linearly with the size of the ensemble, making them unsuitable for many applications. Averaging ensemble weights instead of predictions circumvents this disadvantage during inference and is typically applied to intermediate checkpoints of a model to reduce training cost. Although weight averaging is effective, only a few works have improved its understanding and performance. Here, we revisit this approach and show that a simple weight fusion (WF) strategy can lead to significantly improved predictive performance and calibration. We describe what prerequisites the weights must meet in terms of weight space, functional space and loss. Furthermore, we present a new test method (called oracle test) to measure the functional space between weights. We demonstrate the versatility of our WF strategy across state-of-the-art segmentation CNNs and Transformers as well as real-world datasets such as BDD100K and Cityscapes. We compare WF with similar approaches and show that it outperforms them on in- and out-of-distribution data in terms of predictive performance and calibration.
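The difference between averaging predictions and averaging weights can be illustrated with a minimal sketch. This is not the WF strategy from the paper, only plain uniform averaging over hypothetical checkpoints of a toy linear model; the names `average_weights` and `predict` and the parameter keys `W` and `b` are assumptions made for illustration.

```python
import numpy as np

def average_weights(checkpoints):
    """Uniformly average a list of {name: array} parameter dicts."""
    names = checkpoints[0].keys()
    return {n: np.mean([c[n] for c in checkpoints], axis=0) for n in names}

def predict(params, x):
    """Toy linear model (an assumption): logits = x @ W + b."""
    return x @ params["W"] + params["b"]

rng = np.random.default_rng(0)

# Three hypothetical checkpoints with identical parameter shapes.
checkpoints = [
    {"W": rng.normal(size=(4, 3)), "b": rng.normal(size=3)}
    for _ in range(3)
]
x = rng.normal(size=(5, 4))

# Prediction averaging: one forward pass per ensemble member.
ensemble_logits = np.mean([predict(c, x) for c in checkpoints], axis=0)

# Weight averaging: fuse once, then a single forward pass at inference.
fused = average_weights(checkpoints)
fused_logits = predict(fused, x)
```

For this linear toy model the two results coincide, because the model is linear in its parameters. For deep nonlinear networks they generally differ, which is why the weights to be averaged have to satisfy prerequisites such as those the abstract mentions.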
Author Information
Timo Saemann (Valeo)
Ahmed Hammam (Opel Automobile GmbH)
Andrei Bursuc (Valeo)
Christoph Stiller (Institute of Measurement and Control Systems, Karlsruhe Institute of Technology (KIT))
Horst-Michael Gross (Ilmenau University of Technology)
More from the Same Authors
-
2021 : Spherical Perspective on Learning with Normalization Layers »
Simon Roburin · Yann de Mont-Marin · Andrei Bursuc · Renaud Marlet · Patrick Pérez · Mathieu Aubry -
2022 : Multi-Modal 3D GAN for Urban Scenes »
Loïck Chambon · Mickael Chen · Tuan-Hung VU · Alexandre Boulch · Andrei Bursuc · Matthieu Cord · Patrick Pérez -
2022 : Instance-Aware Observer Network for Out-of-Distribution Object Segmentation »
Victor Besnier · Andrei Bursuc · Alexandre Briot · David Picard -
2023 Poster: POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images »
Antonín Vobecký · Oriane Siméoni · David Hurych · Spyridon Gidaris · Andrei Bursuc · Patrick Pérez · Josef Sivic