Distilling knowledge from an ensemble of teacher models is generally expected to yield better performance than distilling from a single teacher. Current methods mainly adopt a vanilla averaging rule, i.e., they simply average all teacher losses to train the student network. However, this treats all teachers equally and ignores their diversity; when teachers conflict or compete, which is common, the resulting compromise can hurt distillation performance. In this paper, we examine the diversity of teacher models in the gradient space and cast ensemble knowledge distillation as a multi-objective optimization problem, so that a better optimization direction can be determined for training the student network. We further introduce a tolerance parameter to accommodate disagreement among teachers. In this way, our method can be viewed as dynamically weighting each teacher in the ensemble. Extensive experiments validate the effectiveness of our method for both logits-based and feature-based distillation.
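The sketch below is a minimal PyTorch illustration of this idea, not the authors' released implementation. It assumes a logits-based KD loss per teacher, a two-teacher MGDA-style min-norm weighting in gradient space, and interprets the tolerance parameter as a clamp `tol` on the resulting weights; names such as `two_teacher_weights` and `distill_step` are illustrative only.

```python
# Minimal sketch of ensemble KD as multi-objective optimization (two teachers),
# under the assumptions stated above; not the paper's exact algorithm.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Logits-based distillation loss: temperature-scaled KL divergence."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def two_teacher_weights(g1, g2, tol=0.1):
    """Min-norm convex combination of two flattened teacher gradients
    (closed-form MGDA solution), clamped into [tol, 1 - tol] so that a
    disagreeing teacher is down-weighted rather than discarded entirely."""
    diff = g1 - g2
    alpha = torch.dot(g2 - g1, g2) / (diff.norm().pow(2) + 1e-12)
    alpha = alpha.clamp(tol, 1.0 - tol)
    return alpha, 1.0 - alpha


def distill_step(student, teachers, x, optimizer, tol=0.1):
    """One training step that dynamically weights two teachers in gradient space."""
    params = [p for p in student.parameters() if p.requires_grad]
    s_logits = student(x)
    losses = [kd_loss(s_logits, t(x).detach()) for t in teachers]

    # Per-teacher gradients w.r.t. the student parameters, flattened into vectors.
    grads = []
    for loss in losses:
        g = torch.autograd.grad(loss, params, retain_graph=True)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))

    # Dynamic teacher weights from the gradient geometry.
    w1, w2 = two_teacher_weights(grads[0], grads[1], tol=tol)

    optimizer.zero_grad()
    (w1 * losses[0] + w2 * losses[1]).backward()
    optimizer.step()
    return w1.item(), w2.item()
```

For more than two teachers, the min-norm point over the gradient simplex would require a Frank-Wolfe-style solver, and the clamp-based tolerance here is only one plausible reading of the paper's tolerance parameter.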
Author Information
Shangchen Du (SenseTime)
Shan You (SenseTime)
Xiaojie Li (SenseTime)
Jianlong Wu (Shandong University)
Fei Wang (SenseTime)
Chen Qian (SenseTime)
Changshui Zhang (Tsinghua University)
More from the Same Authors
- 2022 Poster: Weak-shot Semantic Segmentation via Dual Similarity Transfer »
  Junjie Chen · Li Niu · Siyuan Zhou · Jianlou Si · Chen Qian · Liqing Zhang
- 2022 Spotlight: Lightning Talks 6B-4 »
  Junjie Chen · Chuanxia Zheng · JINLONG LI · Yu Shi · Shichao Kan · Yu Wang · Fermín Travi · Ninh Pham · Lei Chai · Guobing Gan · Tung-Long Vuong · Gonzalo Ruarte · Tao Liu · Li Niu · Jingjing Zou · Zequn Jie · Peng Zhang · Ming LI · Yixiong Liang · Guolin Ke · Jianfei Cai · Gaston Bujia · Sunzhu Li · Siyuan Zhou · Jingyang Lin · Xu Wang · Min Li · Zhuoming Chen · Qing Ling · Xiaolin Wei · Xiuqing Lu · Shuxin Zheng · Dinh Phung · Yigang Cen · Jianlou Si · Juan Esteban Kamienkowski · Jianxin Wang · Chen Qian · Lin Ma · Benyou Wang · Yingwei Pan · Tie-Yan Liu · Liqing Zhang · Zhihai He · Ting Yao · Tao Mei
- 2022 Spotlight: Weak-shot Semantic Segmentation via Dual Similarity Transfer »
  Junjie Chen · Li Niu · Siyuan Zhou · Jianlou Si · Chen Qian · Liqing Zhang
- 2022 Poster: Knowledge Distillation from A Stronger Teacher »
  Tao Huang · Shan You · Fei Wang · Chen Qian · Chang Xu
- 2022 Poster: Green Hierarchical Vision Transformer for Masked Image Modeling »
  Lang Huang · Shan You · Mingkai Zheng · Fei Wang · Chen Qian · Toshihiko Yamasaki
- 2022 Poster: Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition »
  Yichao Cao · Xiu Su · Qingfei Tang · Shan You · Xiaobo Lu · Chang Xu
- 2022 Poster: Synergy-of-Experts: Collaborate to Improve Adversarial Robustness »
  Sen Cui · Jingfeng ZHANG · Jian Liang · Bo Han · Masashi Sugiyama · Changshui Zhang
- 2021 Poster: Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning »
  Sen Cui · Weishen Pan · Jian Liang · Changshui Zhang · Fei Wang
- 2021 Poster: ReSSL: Relational Self-Supervised Learning with Weak Augmentation »
  Mingkai Zheng · Shan You · Fei Wang · Chen Qian · Changshui Zhang · Xiaogang Wang · Chang Xu
- 2020 Poster: AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection »
  Hao Zhu · Chaoyou Fu · Qianyi Wu · Wayne Wu · Chen Qian · Ran He
- 2020 Poster: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding »
  Yibo Yang · Hongyang Li · Shan You · Fei Wang · Chen Qian · Zhouchen Lin
- 2020 Poster: When Counterpoint Meets Chinese Folk Melodies »
  Nan Jiang · Sheng Jin · Zhiyao Duan · Changshui Zhang
- 2019 Poster: Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks »
  Yiwen Guo · Ziang Yan · Changshui Zhang
- 2018 Poster: Sparse DNNs with Improved Adversarial Robustness »
  Yiwen Guo · Chao Zhang · Changshui Zhang · Yurong Chen
- 2018 Poster: Connectionist Temporal Classification with Maximum Entropy Regularization »
  Hu Liu · Sheng Jin · Changshui Zhang
- 2018 Spotlight: Connectionist Temporal Classification with Maximum Entropy Regularization »
  Hu Liu · Sheng Jin · Changshui Zhang
- 2018 Poster: Deep Defense: Training DNNs with Improved Adversarial Robustness »
  Ziang Yan · Yiwen Guo · Changshui Zhang
- 2012 Poster: Multi-Stage Multi-Task Feature Learning »
  Pinghua Gong · Jieping Ye · Changshui Zhang
- 2012 Spotlight: Multi-Stage Multi-Task Feature Learning »
  Pinghua Gong · Jieping Ye · Changshui Zhang
- 2010 Poster: Learning Kernels with Radiuses of Minimum Enclosing Balls »
  Kun Gai · Guangyun Chen · Changshui Zhang