Knowledge distillation (KD) is an effective framework for transferring knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models transfer knowledge by aligning instance-wise outputs between the teacher and the student, while neglecting an important knowledge source: the gradient of the teacher. The gradient characterizes how the teacher responds to changes in its inputs, which we assume helps the student better approximate the teacher's underlying mapping function. We therefore propose Gradient Knowledge Distillation (GKD), which incorporates a gradient alignment objective into the distillation process. Experimental results show that students trained with GKD outperform those trained with previous KD methods. Further analysis shows that incorporating gradient knowledge makes the student behave more consistently with the teacher, which greatly improves interpretability.
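As a rough illustration of what a gradient alignment objective can look like, the sketch below combines an output-alignment term with a term that penalizes differences between the teacher's and the student's gradients with respect to the input. It is not the authors' implementation: the toy logistic models, the squared-error form of both terms, the `grad_weight` parameter, and all function names are assumptions made for this example, since the abstract does not specify the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Positive-class probability of a toy logistic model (assumed for illustration)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def input_gradient(weights, x):
    """Analytic gradient of the output probability with respect to the input x."""
    p = predict(weights, x)
    return [p * (1.0 - p) * w for w in weights]

def output_term(teacher_w, student_w, x):
    """Instance-wise output alignment: squared difference of the two predictions."""
    return (predict(teacher_w, x) - predict(student_w, x)) ** 2

def gradient_term(teacher_w, student_w, x):
    """Gradient alignment: mean squared difference of the two input gradients."""
    g_t = input_gradient(teacher_w, x)
    g_s = input_gradient(student_w, x)
    return sum((a - b) ** 2 for a, b in zip(g_t, g_s)) / len(x)

def gkd_style_loss(teacher_w, student_w, x, grad_weight=1.0):
    """Hypothetical combined loss: output term plus weighted gradient term."""
    return (output_term(teacher_w, student_w, x)
            + grad_weight * gradient_term(teacher_w, student_w, x))

# Hypothetical weights and input, chosen so both models give the same output on x.
teacher = [0.5, -0.5, 0.0]
student = [-0.5, 0.0, 0.0]
x = [1.0, 2.0, -1.0]

out_only = output_term(teacher, student, x)     # outputs agree on this input
combined = gkd_style_loss(teacher, student, x)  # gradients still disagree
```

On this input the two toy models produce the same prediction, so the output term alone cannot tell them apart, while the gradient term is positive because they respond differently to changes in `x`. For a real pre-trained language model the gradients would come from automatic differentiation rather than a closed form.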
Author Information
Lean Wang (Peking University)
Lei Li (Peking University)
Xu Sun (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022: Gradient Knowledge Distillation for Pre-trained Language Models
More from the Same Authors
- 2022 Poster: Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions
  Fenglin Liu · Bang Yang · Chenyu You · Xian Wu · Shen Ge · Zhangdaihong Liu · Xu Sun · Yang Yang · David Clifton
- 2021: Continual Learning in Large-Scale Pre-Training
  Xu Sun
- 2021 Poster: Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
  Fenglin Liu · Chenyu You · Xian Wu · Shen Ge · Sheng Wang · Xu Sun
- 2021 Poster: Topology-Imbalance Learning for Semi-Supervised Node Classification
  Deli Chen · Yankai Lin · Guangxiang Zhao · Xuancheng Ren · Peng Li · Jie Zhou · Xu Sun
- 2020 Poster: Prophet Attention: Predicting Attention with Future Attention
  Fenglin Liu · Xuancheng Ren · Xian Wu · Shen Ge · Wei Fan · Yuexian Zou · Xu Sun
- 2019 Poster: Understanding and Improving Layer Normalization
  Jingjing Xu · Xu Sun · Zhiyuan Zhang · Guangxiang Zhao · Junyang Lin
- 2019 Poster: Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
  Fenglin Liu · Yuanxin Liu · Xuancheng Ren · Xiaodong He · Xu Sun
- 2014 Poster: Structure Regularization for Structured Prediction
  Xu Sun