Meta-learning for few-shot classification has been challenged both on its effectiveness relative to simpler pretraining methods and on the validity of its claim to "learn to learn". Recent work has suggested that MAML-based models do not perform "rapid learning" in the inner loop but instead reuse features, adapting only the final linear layer. Separately, BatchNorm, a near-ubiquitous component of modern architectures, has been shown to exert an implicit learning rate decay on the layers preceding it. We study the impact of BatchNorm's implicit learning rate decay on feature reuse in meta-learning methods and find that counteracting it increases the change in intermediate layers during adaptation. We also find that counteracting this learning rate decay sometimes improves performance on few-shot classification tasks.
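The implicit learning rate decay mentioned above comes from BatchNorm making the preceding layer scale-invariant: scaling the weights by a factor leaves the normalized output unchanged while shrinking the gradient by the same factor, so growing weight norms act like a decaying learning rate. The following is a minimal numerical sketch of this effect (not code from the paper; the toy loss and finite-difference gradient are illustrative assumptions):

```python
import numpy as np

# Toy setup (hypothetical, for illustration): a linear layer followed by
# batch normalization (no affine parameters) and an arbitrary scalar loss.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))   # batch of inputs
g = rng.normal(size=64)        # fixed downstream weights for a toy loss

def loss(w):
    z = x @ w                         # pre-BN activations, one per sample
    y = (z - z.mean()) / z.std()      # batch normalization over the batch
    return float(y @ g)               # arbitrary scalar loss

def grad(w, eps=1e-6):
    # central finite-difference gradient of the loss w.r.t. the weights
    out = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        out[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return out

w = rng.normal(size=8)
g1, g2 = grad(w), grad(2.0 * w)

# BN makes the layer scale-invariant: doubling the weights leaves the loss
# unchanged but halves the gradient -- an implicit learning rate decay.
print(np.allclose(loss(w), loss(2.0 * w)))    # scale invariance of the loss
print(np.allclose(g2, g1 / 2.0, atol=1e-4))   # gradient shrinks as ||w|| grows
```

Both checks print `True`: with a fixed optimizer step size, the effective step taken by the pre-BN layer shrinks as its weight norm grows.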
Alexander Wang (University of Toronto)
Sasha (Alexandre) Doubov (University of Toronto)
Gary Leung (University of Toronto)