Knowledge distillation (KD) is essentially a process of transferring a teacher model's behavior, e.g., its network response, to a student model. The network response serves as additional supervision to formulate the machine domain, which uses the data collected from the human domain as a transfer set. Traditional KD methods hold the underlying assumption that the data collected in the human domain and the machine domain are independent and identically distributed (IID). We point out that this naive assumption is unrealistic and that there is indeed a transfer gap between the two domains. Although the gap offers the student model external knowledge from the machine domain, the imbalanced teacher knowledge would make us incorrectly estimate how much to transfer from teacher to student per sample on the non-IID transfer set. To tackle this challenge, we propose Inverse Probability Weighting Distillation (IPWD), which estimates the propensity of a training sample belonging to the machine domain and assigns its inverse amount to compensate for under-represented samples. Experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of IPWD for both two-stage distillation and one-stage self-distillation.
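For concreteness, below is a minimal sketch of an inverse-probability-weighted distillation loss in PyTorch. It is not the paper's implementation: the propensity estimator is not reproduced, the `propensity` tensor is assumed to be supplied by some external estimate of how well each training sample is represented in the machine domain, and the function name and temperature default are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ipw_distillation_loss(student_logits, teacher_logits, propensity, T=4.0, eps=1e-6):
    """Sketch of an inverse-propensity-weighted KD loss.

    Args:
        student_logits: (B, C) raw outputs of the student.
        teacher_logits: (B, C) raw outputs of the teacher.
        propensity:     (B,) estimated probability that each sample is
                        well represented in the machine (teacher) domain.
        T:              softmax temperature, as in standard KD.
    """
    # Per-sample KL divergence between softened teacher and student outputs.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kd_per_sample = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * (T * T)

    # Inverse-propensity weights: under-represented samples (low propensity)
    # receive larger weights; normalize so the overall loss scale stays comparable.
    weights = 1.0 / (propensity + eps)
    weights = weights / weights.mean()

    return (weights * kd_per_sample).mean()
```

In a typical setup this term would be combined with the usual cross-entropy loss on ground-truth labels, with the weighting applied only to the distillation term.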
Author Information
Yulei Niu (Columbia University)
Long Chen (Columbia University)
Chang Zhou (Zhejiang University)
Hanwang Zhang (NTU)
More from the Same Authors
- 2021 Spotlight: Self-Supervised Learning Disentangled Group Representation as Feature »
  Tan Wang · Zhongqi Yue · Jianqiang Huang · Qianru Sun · Hanwang Zhang
- 2021 Poster: Self-Supervised Learning Disentangled Group Representation as Feature »
  Tan Wang · Zhongqi Yue · Jianqiang Huang · Qianru Sun · Hanwang Zhang
- 2021 Poster: How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? »
  Xinshuai Dong · Anh Tuan Luu · Min Lin · Shuicheng Yan · Hanwang Zhang
- 2021 Poster: Introspective Distillation for Robust Question Answering »
  Yulei Niu · Hanwang Zhang
- 2020 Poster: Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect »
  Kaihua Tang · Jianqiang Huang · Hanwang Zhang
- 2020 Poster: Causal Intervention for Weakly-Supervised Semantic Segmentation »
  Dong Zhang · Hanwang Zhang · Jinhui Tang · Xian-Sheng Hua · Qianru Sun
- 2020 Oral: Causal Intervention for Weakly-Supervised Semantic Segmentation »
  Dong Zhang · Hanwang Zhang · Jinhui Tang · Xian-Sheng Hua · Qianru Sun
- 2020 Poster: Interventional Few-Shot Learning »
  Zhongqi Yue · Hanwang Zhang · Qianru Sun · Xian-Sheng Hua
- 2018 Poster: Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks »
  Hang Gao · Zheng Shou · Alireza Zareian · Hanwang Zhang · Shih-Fu Chang