

Keynote Talk in Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)

Compression and Acceleration of Pre-trained Language Models

Lu Hou


Abstract:

Recently, Transformer-based pre-trained models like BERT and GPT have achieved remarkable results on various natural language understanding tasks, as well as some computer vision and multi-modal tasks. However, these models have a large number of parameters, which hinders their deployment on edge devices and even in the cloud. In this talk, I will introduce recent progress on alleviating these concerns in various deployment scenarios, covering both inference and training. Specifically, I will discuss compression and acceleration methods based on knowledge distillation, dynamic networks, and network quantization.
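
To make the first of these techniques concrete, below is a minimal, hedged sketch of a knowledge-distillation objective in PyTorch: a smaller student model is trained to match the temperature-softened output distribution of a large teacher while also fitting the ground-truth labels. The function name, temperature, and weighting factor are illustrative assumptions and are not taken from the talk.

```python
# Minimal knowledge-distillation sketch (illustrative assumptions, not the method from the talk):
# the student is trained on a mix of hard labels and the teacher's softened logits.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with a soft-label KL term."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In practice, the teacher's logits would come from the large pre-trained model run in evaluation mode, and only the student's parameters would be updated; dynamic networks and quantization address the same deployment concerns through adaptive computation and lower-precision weights, respectively.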