
Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1644
By transferring both features and gradients between different layers, the shortcut connections explored by ResNets allow us to effectively train very deep neural networks with up to hundreds of layers. However, the additional computation costs induced by those shortcuts are often overlooked. For example, during online inference, the shortcuts in ResNet-50 account for about 40 percent of the entire memory usage on feature maps, because the features in the preceding layers cannot be released until the subsequent calculation is completed. In this work, for the first time, we consider training CNN models with shortcuts and deploying them without. In particular, we propose a novel joint-training framework to train a plain CNN by leveraging the gradients of its ResNet counterpart. During the forward pass, the feature maps of the early stages of the plain CNN are passed through the later stages of both itself and the ResNet counterpart to calculate the loss. During backpropagation, gradients calculated from a mixture of these two parts are used to update the plain CNN, which addresses the vanishing gradient problem. Extensive experiments on ImageNet/CIFAR10/CIFAR100 demonstrate that the plain CNN without shortcuts generated by our approach can achieve the same level of accuracy as the ResNet baseline while achieving about $1.4\times$ speed-up and $1.25\times$ memory reduction. We also verified the feature transferability of our ImageNet-pretrained plain CNN by fine-tuning it on MIT 67 and Caltech 101. Our results show that the performance of the plain CNN is slightly higher than that of its baseline ResNet-50 on these two datasets. The code will be available at https://github.com/leoozy/JointRD_Neurips2020 and the MindSpore code will be available at https://www.mindspore.cn/resources/hub.
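
The gradient mixture described in the abstract can be illustrated with a toy scalar model. This is only a sketch under stated assumptions, not the authors' implementation: each "stage" is a single scalar weight, the loss is a squared error, and the names `w_early`, `w_plain_tail`, `v_res_tail`, and `mix` are hypothetical. The early stage feeds both a plain tail (no shortcut) and a residual tail (with a shortcut), and the gradient reaching the early stage is the weighted sum of the two paths.

```python
# Toy illustration of mixing gradients from a plain tail and a residual tail.
# All names and the scalar setup are assumptions made for this sketch.

def forward(w_early, w_plain_tail, v_res_tail, x):
    """Run the shared early stage, then both tails."""
    h = w_early * x                  # early stage of the plain network
    y_plain = w_plain_tail * h       # later stage without a shortcut
    y_res = h + v_res_tail * h       # later stage with a shortcut (identity + branch)
    return h, y_plain, y_res

def joint_loss(w_early, w_plain_tail, v_res_tail, x, target, mix):
    """Squared error of the plain path plus a weighted residual-path error."""
    _, y_plain, y_res = forward(w_early, w_plain_tail, v_res_tail, x)
    return (y_plain - target) ** 2 + mix * (y_res - target) ** 2

def early_stage_grads(w_early, w_plain_tail, v_res_tail, x, target, mix):
    """Gradients of the joint loss with respect to w_early, split by path."""
    _, y_plain, y_res = forward(w_early, w_plain_tail, v_res_tail, x)
    g_plain = 2.0 * (y_plain - target) * w_plain_tail * x
    g_res = 2.0 * (y_res - target) * (1.0 + v_res_tail) * x
    return g_plain, g_res, g_plain + mix * g_res

if __name__ == "__main__":
    # A tiny plain-tail weight mimics a vanishing gradient on the plain path.
    g_plain, g_res, g_mixed = early_stage_grads(
        w_early=0.5, w_plain_tail=1e-3, v_res_tail=0.1,
        x=1.0, target=1.0, mix=1.0,
    )
    print("plain-path gradient:   ", g_plain)
    print("residual-path gradient:", g_res)
    print("mixed gradient:        ", g_mixed)
```

In this toy setting the plain-path gradient is scaled by the small tail weight, while the shortcut term `1.0 + v_res_tail` keeps the residual-path gradient from shrinking, so the mixed gradient still carries a usable training signal to the early stage. At deployment only the plain path would be kept.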

Author Information

Guilin Li (Huawei Noah's Ark Lab)
Junlei Zhang (Huawei Noah’s Ark Lab)
Yunhe Wang (Huawei Noah's Ark Lab)
Chuanjian Liu (Huawei Noah's Ark Lab)
Matthias Tan (CityU)
Yunfeng Lin (Shanghai Jiao Tong University)
Wei Zhang (Noah's Ark Lab, Huawei Inc.)
Jiashi Feng (National University of Singapore)
Tong Zhang (The Hong Kong University of Science and Technology)

More from the Same Authors