In many real-world scenarios, the data used to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what they have learnt in the past. This phenomenon, known as catastrophic forgetting, is difficult to prevent due to practical constraints: for instance, the amount of data that can be stored or the computational resources that can be used are often limited. Moreover, applications increasingly rely on large pre-trained neural networks, such as Transformers, since practitioners rarely have enough compute or data to train them from scratch. In this paper, we devise a method to incrementally train a model on a sequence of tasks by taking a pre-trained Transformer and extending it with Adapters. Unlike existing approaches, our method scales to a large number of tasks without significant overhead and allows sharing information across tasks. On both image and text classification tasks, we empirically demonstrate that our method maintains good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time than Adapter-based state-of-the-art methods.
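The abstract does not spell out the adapter architecture, so below is a minimal sketch of the standard bottleneck Adapter (Houlsby et al., 2019) that methods in this family attach to a frozen pre-trained Transformer; it is an illustration of the general technique, not the paper's exact method, and names such as `hidden_dim` and `bottleneck_dim` are assumptions for the example.

```python
# Minimal sketch of a bottleneck Adapter module, assuming PyTorch.
# In adapter-based continual learning, only modules like this are
# trained per task while the pre-trained Transformer stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck MLP inserted after a Transformer sub-layer."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns a small correction
        # on top of the frozen pre-trained representation.
        return x + self.up(self.act(self.down(x)))
```

Because the bottleneck dimension is much smaller than the hidden dimension, each adapter adds only a tiny fraction of the backbone's parameters, which is what makes it plausible to keep per-task overhead low as the number of tasks grows.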
Author Information
Beyza Ermis (Amazon)
Giovanni Zappella (Amazon Development Center Germany)
Martin Wistuba (Amazon)
Aditya Rawal (Amazon AWS AI Labs)
Cedric Archambeau (Amazon Web Services)
More from the Same Authors
- 2020: Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search
  Aditya Rawal
- 2021: HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML
  Sebastian Pineda Arango · Hadi Jomaa · Martin Wistuba · Josif Grabocka
- 2022: Differentially Private Gradient Boosting on Linear Learners for Tabular Data
  Saeyoung Rho · Shuai Tang · Sergul Aydore · Michael Kearns · Aaron Roth · Yu-Xiang Wang · Steven Wu · Cedric Archambeau
- 2022 Spotlight: Supervising the Multi-Fidelity Race of Hyperparameter Configurations
  Martin Wistuba · Arlind Kadra · Josif Grabocka
- 2022 Spotlight: Lightning Talks 3B-1
  Tianying Ji · Tongda Xu · Giulia Denevi · Aibek Alanov · Martin Wistuba · Wei Zhang · Yuesong Shen · Massimiliano Pontil · Vadim Titov · Yan Wang · Yu Luo · Daniel Cremers · Yanjun Han · Arlind Kadra · Dailan He · Josif Grabocka · Zhengyuan Zhou · Fuchun Sun · Carlo Ciliberto · Dmitry Vetrov · Mingxuan Jing · Chenjian Gao · Aaron Flores · Tsachy Weissman · Han Gao · Fengxiang He · Kunzan Liu · Wenbing Huang · Hongwei Qin
- 2022 Poster: Supervising the Multi-Fidelity Race of Hyperparameter Configurations
  Martin Wistuba · Arlind Kadra · Josif Grabocka
- 2022 Poster: Private Synthetic Data for Multitask Learning and Marginal Queries
  Giuseppe Vietri · Cedric Archambeau · Sergul Aydore · William Brown · Michael Kearns · Aaron Roth · Ankit Siva · Shuai Tang · Steven Wu
- 2020 Session: Orals & Spotlights Track 16: Continual/Meta/Misc Learning
  Laurent Charlin · Cedric Archambeau
- 2019: Poster Session
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2018: From Nodes to Networks: Evolving Recurrent Neural Networks
  Aditya Rawal
- 2017: Industry talk: Cedric Archambeau (TBA)
  Cedric Archambeau
- 2014 Workshop: Learning Semantics
  Cedric Archambeau · Antoine Bordes · Leon Bottou · Chris J Burges · David Grangier
- 2011 Workshop: Choice Models and Preference Learning
  Jean-Marc Andreoli · Cedric Archambeau · Guillaume Bouchard · Shengbo Guo · Kristian Kersting · Scott Sanner · Martin Szummer · Paolo Viappiani · Onno Zoeter
- 2011 Session: Spotlight Session 7
  Cedric Archambeau
- 2011 Session: Oral Session 9
  Cedric Archambeau
- 2011 Poster: Sparse Bayesian Multi-Task Learning
  Cedric Archambeau · Shengbo Guo · Onno Zoeter
- 2008 Poster: Sparse probabilistic projections
  Cedric Archambeau · Francis Bach
- 2008 Spotlight: Sparse probabilistic projections
  Cedric Archambeau · Francis Bach
- 2007 Poster: Variational Inference for Diffusion Processes
  Cedric Archambeau · Manfred Opper · Yuan Shen · Dan Cornford · John Shawe-Taylor
- 2006 Workshop: Dynamical Systems, Stochastic Processes and Bayesian Inference
  Manfred Opper · Cedric Archambeau · John Shawe-Taylor