Recently, a new trend of exploring sparsity during training has emerged, in which parameters are removed during training, improving both training and inference efficiency. This line of work primarily aims to obtain a single sparse model under a pre-defined, large sparsity ratio. It leads to a static/fixed sparse inference model that cannot adjust or re-configure its computation complexity (i.e., inference structure, latency) after training to match the varying and dynamic hardware resource availability of real-world deployment. To enable such run-time or post-training network morphing, the concept of 'dynamic inference' or 'training-once-for-all' has been proposed: a single network consisting of multiple sub-nets is trained once, and each sub-net performs the same inference function with a different computing complexity. However, traditional dynamic inference training methods require a joint training scheme with multi-objective optimization, which suffers from very large training overhead. In this work, for the first time, we propose a novel alternating sparse training (AST) scheme to train multiple sparse sub-nets for dynamic inference without extra training cost compared to training a single sparse model from scratch. Furthermore, to mitigate weight-update interference among sub-nets, we propose gradient correction within the inner-group iterations. We validate the proposed AST on multiple datasets against state-of-the-art sparse training methods, showing that AST achieves similar or better accuracy while training only once to obtain multiple sparse sub-nets with different sparsity ratios. More importantly, compared with the traditional joint-training-based dynamic inference methodology, the large training overhead is completely eliminated without affecting the accuracy of each sub-net.
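The sketch below illustrates the alternating idea described in the abstract: within one inner group of iterations, each sub-net (one per sparsity ratio) takes its own masked update step. The abstract does not give implementation details, so the magnitude-based mask construction, the toy model and data, the sparsity ratios, and all helper names (make_masks, apply_masks, ast_inner_group) are illustrative assumptions, not the authors' implementation; the gradient-correction step is only indicated by a comment.

```python
# Minimal PyTorch sketch of alternating sparse training (AST), under the
# assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_masks(model, sparsity):
    """Magnitude-based binary masks keeping the largest (1 - sparsity) fraction of weights."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:  # skip biases / norm parameters
            continue
        n_keep = max(1, int(p.numel() * (1.0 - sparsity)))
        threshold = p.detach().abs().flatten().topk(n_keep).values.min()
        masks[name] = (p.detach().abs() >= threshold).float()
    return masks


def apply_masks(model, masks):
    """Zero out pruned weights so the current sub-net is active for this iteration."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])


def ast_inner_group(model, optimizer, batches, sparsity_ratios):
    """One inner group: each sub-net (sparsity ratio) takes one alternating update step."""
    for sparsity, (x, y) in zip(sparsity_ratios, batches):
        masks = make_masks(model, sparsity)
        apply_masks(model, masks)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Gradient correction would adjust these gradients here to reduce
        # weight-update interference between sub-nets (details in the paper).
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks and p.grad is not None:
                    p.grad.mul_(masks[name])  # update only the sub-net's active weights
        optimizer.step()


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    ratios = [0.5, 0.75, 0.9]  # assumed sparsity ratios for the sub-nets
    batches = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in ratios]
    ast_inner_group(model, optimizer, batches, ratios)
```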
Author Information
Li Yang (Arizona State University)
Jian Meng (Arizona State University)
Jae-sun Seo (Arizona State University)
Deliang Fan (Arizona State University)
More from the Same Authors
- 2022: CL-LSG: Continual Learning via Learnable Sparse Growth
  Li Yang · Sen Lin · Junshan Zhang · Deliang Fan
- 2023 Poster: Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
  Jian Meng · Li Yang · Kyungmin Lee · Jinwoo Shin · Deliang Fan · Jae-sun Seo
- 2022 Spotlight: Lightning Talks 3B-4
  Guanghu Yuan · Yijing Liu · Li Yang · Yongri Piao · Zekang Zhang · Yaxin Xiao · Lin Chen · Hong Chang · Fajie Yuan · Guangyu Gao · Hong Chang · Qinxian Liu · Zhixiang Wei · Qingqing Ye · Chenyang Lu · Jian Meng · Haibo Hu · Xin Jin · Yudong Li · Miao Zhang · Zhiyuan Fang · Jae-sun Seo · Bingpeng MA · Jian-Wei Zhang · Shiguang Shan · Haozhe Feng · Huaian Chen · Deliang Fan · Huadi Zheng · Jianbo Jiao · Huchuan Lu · Beibei Kong · Miao Zheng · Chengfang Fang · Shujie Li · Zhongwei Wang · Yunchao Wei · Xilin Chen · Jie Shi · Kai Chen · Zihan Zhou · Lei Chen · Yi Jin · Wei Chen · Min Yang · Chenyun YU · Bo Hu · Zang Li · Yu Xu · Xiaohu Qie
- 2022 Spotlight: Get More at Once: Alternating Sparse Training with Gradient Correction
  Li Yang · Jian Meng · Jae-sun Seo · Deliang Fan
- 2022 Poster: Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer
  Sen Lin · Li Yang · Deliang Fan · Junshan Zhang