The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes
Peter Kocsis · Peter Súkeník · Guillem Braso · Matthias Niessner · Laura Leal-Taixé · Ismail Elezi

Tue Nov 29 09:00 AM -- 11:00 AM (PST) @ Hall J #125
Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at train time but avoid them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. Our models significantly outperform the networks without fully-connected layers, reaching a relative improvement of up to $16\%$ in validation accuracy in the supervised setting without adding any extra parameters during inference.
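
The abstract does not spell out the distillation objective, so the following is only a minimal sketch of what a generic online joint knowledge-distillation loss could look like, not the authors' formulation. It assumes two heads trained together on the same backbone: a head with extra FC layers (the train-time "teacher") and a plain head (the "student" kept at test time). The function names, the temperature, and the weighting `alpha` are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_distillation_loss(student_logits, teacher_logits, label,
                            temperature=2.0, alpha=0.5):
    """Assumed joint objective: both heads fit the label, and the plain
    (student) head is additionally pulled toward the FC (teacher) head.

    This is a generic online-distillation form chosen for illustration;
    the actual loss used in the paper may differ.
    """
    ce_student = cross_entropy(student_logits, label)
    ce_teacher = cross_entropy(teacher_logits, label)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    distill = kl_divergence(teacher_probs, student_probs)
    return ce_student + ce_teacher + alpha * distill


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem with true class 2.
    print(joint_distillation_loss([0.2, 0.1, 1.5], [0.0, -0.5, 3.0], label=2))
```

In this sketch only the student head would be kept for inference, which is how the extra FC parameters could be dropped at test time; the distillation term vanishes when both heads already agree.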

Author Information

Peter Kocsis (Department of Informatics, Technische Universität München)
Peter Súkeník (Institute of Science and Technology Austria)
Guillem Braso (Technical University Munich)
Matthias Niessner (Technical University of Munich)
Laura Leal-Taixé (TUM)
Ismail Elezi (Technical University of Munich)

More from the Same Authors