Timezone: »
Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this "merging" operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our "Fisher merging" technique provides a performance boost in settings where simple parameter averaging is currently used -- specifically, robust fine-tuning and model ensembling. Then, we compare merging to standard gradient-based transfer learning and demonstrate that merging enables a fundamentally different method for transferring capabilities across models. Specifically, we show that Fisher merging is competitive with gradient-based transfer learning approaches (while being significantly cheaper) in intermediate-task training and domain-adaptive pre-training. We also show that our merging procedure makes it possible to combine models in previously unexplored ways. We release our code to facilitate future research into methods for merging models.
Author Information
Michael S Matena (University of North Carolina at Chapel Hill)
Colin Raffel (UNC Chapel Hill and Hugging Face)
More from the Same Authors
-
2021 : The impact of domain shift on the calibration of fine-tuned models »
Jay Mohta · Colin Raffel -
2022 : Models with Conditional Computation Learn Suboptimal Solutions »
Mohammed Muqeeth · Haokun Liu · Colin Raffel -
2023 Poster: Resolving Interference When Merging Models »
Prateek Yadav · Derek Tam · Leshem Choshen · Colin Raffel · Mohit Bansal -
2023 Poster: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2023 Poster: Distributed Inference and Fine-tuning of Large Language Models Over The Internet »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2023 Poster: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data »
Alon Albalak · Colin Raffel · William Yang Wang -
2023 Oral: Scaling Data-Constrained Language Models »
Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Thomas Wolf · Colin Raffel · Sampo Pyysalo -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 : Petals: Collaborative Inference and Fine-tuning of Large Models »
Alexander Borzunov · Dmitry Baranchuk · Tim Dettmers · Max Ryabinin · Younes Belkada · Artem Chumachenko · Pavel Samygin · Colin Raffel -
2022 Workshop: Transfer Learning for Natural Language Processing »
Alon Albalak · Colin Raffel · Chunting Zhou · Deepak Ramachandran · Xuezhe Ma · Sebastian Ruder -
2022 Poster: Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language »
Zhenlin Xu · Marc Niethammer · Colin Raffel -
2022 Poster: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks »
Michael S Matena · Colin Raffel -
2022 Poster: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning »
Haokun Liu · Derek Tam · Mohammed Muqeeth · Jay Mohta · Tenghao Huang · Mohit Bansal · Colin Raffel -
2021 Poster: Training Neural Networks with Fixed Sparse Masks »
Yi-Lin Sung · Varun Nair · Colin Raffel -
2020 : Responsible publication: NLP case study »
Miles Brundage · Bryan McCann · Colin Raffel · Natalie Schulter · Zeerak Waseem · Rosie Campbell