Timezone: »
Sparsely-activated neural networks with conditional computation learn to route their inputs through different subnetworks, providing a strong structural prior and reducing computational costs.Despite their possible benefits, models with learned routing often underperform their parameter-matched densely-activated counterparts as well as models that use non-learned heuristic routing strategies.In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely-activated models with non-differentiable discrete routing decisions.To test this hypothesis, we evaluate the performance of sparsely-activated models trained with various gradient estimation techniques in three settings where a high-quality heuristic routing strategy can be designed.Our experiments reveal that learned routing reaches substantially worse solutions than heuristic routing in various settings.As a first step towards remedying this gap, we demonstrate that supervising the routing decision on a small fraction of the examples is sufficient to help the model to learn better routing strategies. Our results shed light on the difficulties of learning effective routing and set the stage for future work on conditional computation mechanisms and training techniques.
Author Information
Mohammed Muqeeth (University of North Carolina at Chapel Hill)
I am interested in applying machine learning to solve NLP tasks efficiently
Haokun Liu (Department of Computer Science, University of North Carolina, Chapel Hill)
Colin Raffel (UNC Chapel Hill and Hugging Face)
More from the Same Authors
-
2021 : The impact of domain shift on the calibration of fine-tuned models »
Jay Mohta · Colin Raffel -
2023 Poster: Resolving Interference When Merging Models »
Prateek Yadav · Derek Tam · Leshem Choshen · Colin Raffel · Mohit Bansal -
2023 Poster: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2023 Poster: Distributed Inference and Fine-tuning of Large Language Models Over The Internet »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2023 Poster: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data »
Alon Albalak · Colin Raffel · William Yang Wang -
2023 Oral: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 Workshop: Transfer Learning for Natural Language Processing »
Alon Albalak · Colin Raffel · Chunting Zhou · Deepak Ramachandran · Xuezhe Ma · Sebastian Ruder -
2022 Poster: Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language »
Zhenlin Xu · Marc Niethammer · Colin Raffel -
2022 Poster: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks »
Michael S Matena · Colin Raffel -
2022 Poster: Merging Models with Fisher-Weighted Averaging »
Michael S Matena · Colin Raffel -
2022 Poster: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning »
Haokun Liu · Derek Tam · Mohammed Muqeeth · Jay Mohta · Tenghao Huang · Mohit Bansal · Colin Raffel -
2021 Poster: Training Neural Networks with Fixed Sparse Masks »
Yi-Lin Sung · Varun Nair · Colin Raffel -
2020 : Responsible publication: NLP case study »
Miles Brundage · Bryan McCann · Colin Raffel · Natalie Schulter · Zeerak Waseem · Rosie Campbell