Skip to yearly menu bar Skip to main content

Keynote Talk
Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)

Continual Learning in Large-Scale Pre-Training

Xu Sun


Large-scale pre-training has enabled break-throughs in natural language processing. However, the underlying large-scale models and data make the studies in the field hard to sustain. In this talk, I will introduce our recent work focusing on continual learning in large-scale pre-training to improve the efficiency of pre-trained language models (from ICML 2021, AAAI 2021, etc.). For data-efficient continual learning for PLMs, this talk includes our work on addressing long-tailed data distribution with definitional data and accurate behavioral modifications with low instance-wise side effects by limiting the changed parameters. For cost-effective searching of PLM architecture, I will introduce our training-free neural architecture search method based on the gram matrix of instance gradients that can find better fine-tuning architecture of PLMs. Continual Learning has vast opportunities in efficient PLMs learning and applications and new challenges are there to be resolved.