In contrast to humans' natural ability to learn new tasks sequentially, neural networks are known to suffer from catastrophic forgetting: the model's performance on previously learned tasks drops dramatically once it is optimized for a new one. To mitigate this, the continual learning community has proposed a number of methods that aim to equip a network with the ability to learn the current task (plasticity) while still achieving high accuracy on old tasks (stability). Despite remarkable progress, the stability-plasticity trade-off is still far from being solved, and its underlying mechanism is poorly understood. In this work, we propose Auxiliary Network Continual Learning (ANCL), a new method that combines the continually learned model with an auxiliary network that is optimized solely on the new task. Concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability, surpassing strong baselines on CIFAR-100. By analyzing the solutions of several continual learning methods under the so-called mode connectivity assumption, we further propose a hyperparameter search technique that dynamically adjusts the regularization strength to achieve a better stability-plasticity trade-off.
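To make the interpolation concrete, here is a minimal sketch of what such a regularizer can look like when instantiated with an EWC-style quadratic penalty. This is an illustration under stated assumptions, not the paper's reference implementation: the names (`old_params`, `aux_params`, the Fisher weights, `lam_stab`, `lam_plast`) are hypothetical, and the general framework can attach analogous penalties to other base methods.

```python
import torch

def ancl_ewc_loss(model, task_loss, old_params, old_fisher,
                  aux_params, aux_fisher, lam_stab, lam_plast):
    """Illustrative EWC-style auxiliary-network objective (a sketch,
    not the official ANCL code).

    The current weights are pulled toward the old model (stability) and
    toward an auxiliary network trained only on the new task (plasticity).
    """
    stability = 0.0   # quadratic penalty toward the previously learned model
    plasticity = 0.0  # quadratic penalty toward the new-task-only auxiliary net
    for name, p in model.named_parameters():
        stability += (old_fisher[name] * (p - old_params[name]).pow(2)).sum()
        plasticity += (aux_fisher[name] * (p - aux_params[name]).pow(2)).sum()
    # The two strengths interpolate between remembering old tasks and
    # fitting the new one; setting lam_plast = 0 recovers a plain EWC penalty.
    return task_loss + lam_stab * stability + lam_plast * plasticity
```

Under this reading, the proposed hyperparameter search amounts to tuning the relative strength of the two penalties dynamically, moving the solution along the path between the old model and the new-task optimum, rather than fixing a single regularization weight a priori.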
Author Information
Sanghwan Kim (ETH Zurich)
Lorenzo Noci (ETH Zürich)
Antonio Orvieto (ETH Zurich)
PhD student at ETH Zurich. I’m interested in the design and analysis of optimization algorithms for deep learning. Interned at DeepMind, MILA, and Meta. All publications are listed at http://orvi.altervista.org/ Looking for postdoc positions! :) antonio.orvieto@inf.ethz.ch
Thomas Hofmann (ETH Zurich)
More from the Same Authors
- 2021 Spotlight: Precise characterization of the prior predictive distribution of deep ReLU networks »
  Lorenzo Noci · Gregor Bachmann · Kevin Roth · Sebastian Nowozin · Thomas Hofmann
- 2022: Batch size selection by stochastic optimal control »
  Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting
- 2022: Cosmology from Galaxy Redshift Surveys with PointNet »
  Sotiris Anagnostidis · Arne Thomsen · Alexandre Refregier · Tomasz Kacprzak · Luca Biggio · Thomas Hofmann · Tilman Tröster
- 2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization »
  Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting
- 2022 Poster: OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters »
  Piera Riccio · Bill Psomas · Francesco Galati · Francisco Escolano · Thomas Hofmann · Nuria Oliver
- 2022 Poster: Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse »
  Lorenzo Noci · Sotiris Anagnostidis · Luca Biggio · Antonio Orvieto · Sidak Pal Singh · Aurelien Lucchi
- 2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution »
  Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou
- 2021: Empirics on the expressiveness of Randomized Signature »
  Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto
- 2021 Poster: Analytic Insights into Structure and Rank of Neural Network Hessian Maps »
  Sidak Pal Singh · Gregor Bachmann · Thomas Hofmann
- 2021 Poster: Precise characterization of the prior predictive distribution of deep ReLU networks »
  Lorenzo Noci · Gregor Bachmann · Kevin Roth · Sebastian Nowozin · Thomas Hofmann
- 2021 Poster: Rethinking the Variational Interpretation of Accelerated Optimization Methods »
  Peiyuan Zhang · Antonio Orvieto · Hadi Daneshmand
- 2021 Poster: On the Second-order Convergence Properties of Random Search Methods »
  Aurelien Lucchi · Antonio Orvieto · Adamos Solomou
- 2021 Poster: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect »
  Lorenzo Noci · Kevin Roth · Gregor Bachmann · Sebastian Nowozin · Thomas Hofmann
- 2020 Poster: Batch normalization provably avoids ranks collapse for randomly initialised deep networks »
  Hadi Daneshmand · Jonas Kohler · Francis Bach · Thomas Hofmann · Aurelien Lucchi
- 2020 Poster: Adversarial Training is a Form of Data-dependent Operator Norm Regularization »
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2020 Spotlight: Adversarial Training is a Form of Data-dependent Operator Norm Regularization »
  Kevin Roth · Yannic Kilcher · Thomas Hofmann
- 2020 Poster: Convolutional Generation of Textured 3D Meshes »
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2020 Oral: Convolutional Generation of Textured 3D Meshes »
  Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi
- 2019 Poster: Shadowing Properties of Optimization Algorithms »
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: Continuous-time Models for Stochastic Optimization Algorithms »
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: A Domain Agnostic Measure for Monitoring and Evaluating GANs »
  Paulina Grnarova · Kfir Y. Levy · Aurelien Lucchi · Nathanael Perraudin · Ian Goodfellow · Thomas Hofmann · Andreas Krause
- 2018 Poster: Hyperbolic Neural Networks »
  Octavian Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Spotlight: Hyperbolic Neural Networks »
  Octavian Ganea · Gary Becigneul · Thomas Hofmann
- 2018 Poster: Deep State Space Models for Unconditional Word Generation »
  Florian Schmidt · Thomas Hofmann
- 2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization »
  Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann
- 2016 Poster: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy »
  Aryan Mokhtari · Hadi Daneshmand · Aurelien Lucchi · Thomas Hofmann · Alejandro Ribeiro
- 2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors »
  Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams
- 2014 Poster: Communication-Efficient Distributed Dual Coordinate Ascent »
  Martin Jaggi · Virginia Smith · Martin Takac · Jonathan Terhorst · Sanjay Krishnan · Thomas Hofmann · Michael Jordan