Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared "slow" weights and "fast" rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and in low-resource settings. Our code is publicly available at https://github.com/rabeehk/compacter.
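To make the parameterization concrete, the sketch below builds an adapter's down- and up-projection weights as a sum of Kronecker products between small "slow" matrices and rank-one "fast" factors. This is a minimal PyTorch approximation under stated assumptions: the module name and hyperparameters are illustrative, and the "slow" matrices are kept local to one layer for brevity (Compacter shares them across all adapter layers). It is not the authors' implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kron_sum(A, S, T):
    """Build a weight matrix as a sum of Kronecker products.

    A: (n, n, n)  "slow" n x n matrices A_i
    S: (n, p, r), T: (n, r, q)  rank-r factors of the "fast" matrices B_i = S_i @ T_i
    Returns the (n*p, n*q) matrix  sum_i A_i kron B_i.
    """
    B = S @ T  # (n, p, q)
    return sum(torch.kron(A[i], B[i]) for i in range(A.shape[0]))


class CompacterStyleAdapter(nn.Module):
    """Illustrative Compacter-style bottleneck adapter (not the authors' code).

    The down- and up-projection weights are never stored densely; they are
    recomputed on the fly from the Kronecker factors, so only the small
    "slow" and rank-one "fast" parameters are trained.
    """

    def __init__(self, d_model=768, bottleneck=24, n=4, rank=1):
        super().__init__()
        assert d_model % n == 0 and bottleneck % n == 0
        # "Slow" n x n matrices (shared across layers in Compacter; local here).
        self.A_down = nn.Parameter(torch.randn(n, n, n) * 1e-2)
        self.A_up = nn.Parameter(torch.randn(n, n, n) * 1e-2)
        # "Fast" rank-one factors, defined per adapter layer.
        self.S_down = nn.Parameter(torch.randn(n, d_model // n, rank) * 1e-2)
        self.T_down = nn.Parameter(torch.randn(n, rank, bottleneck // n) * 1e-2)
        self.S_up = nn.Parameter(torch.randn(n, bottleneck // n, rank) * 1e-2)
        self.T_up = nn.Parameter(torch.randn(n, rank, d_model // n) * 1e-2)

    def forward(self, h):
        W_down = kron_sum(self.A_down, self.S_down, self.T_down)  # (d_model, bottleneck)
        W_up = kron_sum(self.A_up, self.S_up, self.T_up)          # (bottleneck, d_model)
        return h + F.gelu(h @ W_down) @ W_up                      # residual adapter


# Quick shape check on a dummy batch of hidden states.
x = torch.randn(2, 10, 768)
print(CompacterStyleAdapter()(x).shape)  # torch.Size([2, 10, 768])
```

With d_model=768, bottleneck=24, n=4, and rank 1, each projection needs only n*n*n + n*(d/n + b/n) Kronecker-factor parameters instead of d*b, which is the source of the roughly 0.047% trainable-parameter budget reported in the abstract.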
Author Information
Rabeeh Karimi Mahabadi (EPFL / Idiap Research Institute)
James Henderson (Idiap Research Institute)
Sebastian Ruder (DeepMind)
More from the Same Authors
- 2021: LiRo: Benchmark and leaderboard for Romanian language tasks
  Stefan Dumitrescu · Petru Rebeja · Beata Lorincz · Mihaela Gaman · Andrei Avram · Mihai Ilie · Andrei Pruteanu · Adriana Stan · Lorena Rosia · Cristina Iacobescu · Luciana Morogan · George Dima · Gabriel Marchidan · Traian Rebedea · Madalina Chitez · Dani Yogatama · Sebastian Ruder · Radu Tudor Ionescu · Razvan Pascanu · Viorica Patraucean
- 2021 Spotlight: Mind the Gap: Assessing Temporal Generalization in Neural Language Models
  Angeliki Lazaridou · Adhi Kuncoro · Elena Gribovskaya · Devang Agrawal · Adam Liska · Tayfun Terzi · Mai Gimenez · Cyprien de Masson d'Autume · Tomas Kocisky · Sebastian Ruder · Dani Yogatama · Kris Cao · Susannah Barlow · Phil Blunsom
- 2022 Workshop: Transfer Learning for Natural Language Processing
  Alon Albalak · Colin Raffel · Chunting Zhou · Deepak Ramachandran · Xuezhe Ma · Sebastian Ruder
- 2021 Poster: Mind the Gap: Assessing Temporal Generalization in Neural Language Models
  Angeliki Lazaridou · Adhi Kuncoro · Elena Gribovskaya · Devang Agrawal · Adam Liska · Tayfun Terzi · Mai Gimenez · Cyprien de Masson d'Autume · Tomas Kocisky · Sebastian Ruder · Dani Yogatama · Kris Cao · Susannah Barlow · Phil Blunsom
- 2019 Poster: Episodic Memory in Lifelong Language Learning
  Cyprien de Masson d'Autume · Sebastian Ruder · Lingpeng Kong · Dani Yogatama