In this paper, we empirically study how to make the most of low-resolution frames for efficient video recognition. Existing methods mainly focus on developing compact networks or reducing the temporal redundancy of video inputs to increase efficiency, whereas compressing frame resolution has rarely been considered a promising solution. A major concern is the poor recognition accuracy on low-resolution frames. We therefore start by analyzing the underlying causes of performance degradation on low-resolution frames. Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale. Motivated by the success of knowledge distillation (KD), we propose to bridge the gap between network and input size via cross-resolution KD (ResKD). Our work shows that ResKD is a simple but effective method to boost recognition accuracy on low-resolution frames. Without bells and whistles, ResKD considerably surpasses all competitive methods in terms of efficiency and accuracy on four large-scale benchmark datasets: ActivityNet, FCVID, Mini-Kinetics, and Something-Something V2. In addition, we extensively demonstrate its effectiveness on state-of-the-art architectures, namely 3D-CNNs and Video Transformers, and its scalability to extremely low-resolution frames. The results suggest that ResKD can serve as a general inference acceleration method for state-of-the-art video recognition. Our code will be available at https://github.com/CVMI-Lab/ResKD.
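The abstract does not spell out the ResKD objective, so the following is only a minimal sketch of what a cross-resolution distillation term might look like. It assumes a standard temperature-softened KD loss, with a teacher scoring the full-resolution frame and a student scoring a down-sampled copy. The names `downsample`, `log_softmax`, and `kd_loss`, the average-pooling choice, and the temperature value are all illustrative assumptions, not the authors' implementation.

```python
import math

def downsample(frame, factor):
    """Average-pool a 2D frame (list of rows) by an integer factor.
    Assumption: simple average pooling stands in for the real resizing step."""
    h, w = len(frame), len(frame[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [frame[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.
    Assumption: the usual KD formulation, not necessarily the paper's loss."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2

# Hypothetical usage: the logits below are made-up numbers standing in for
# a teacher run on the full frame and a student run on the low-res frame.
frame = [[1.0, 2.0, 3.0, 4.0]] * 4
low_res = downsample(frame, 2)          # 4x4 frame -> 2x2 frame
loss = kd_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
```

In a real training loop the two logit vectors would come from actual networks, and this term would typically be combined with an ordinary classification loss on the student.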
Author Information
Chuofan Ma (The University of Hong Kong)
Qiushan Guo (The University of Hong Kong)
Yi Jiang (ByteDance)
Ping Luo (The University of Hong Kong)
Zehuan Yuan (Nanjing University)
Xiaojuan Qi (The University of Hong Kong)
More from the Same Authors
- 2021: An Empirical Investigation of Representation Learning for Imitation
  Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
- 2022 Poster: Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
  Yang Jin · yongzhi li · Zehuan Yuan · Yadong Mu
- 2022 Poster: Unifying Voxel-based Representation with Transformer for 3D Object Detection
  Yanwei Li · Yilun Chen · Xiaojuan Qi · Zeming Li · Jian Sun · Jiaya Jia
- 2022 Poster: QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query
  Yabo Xiao · Kai Su · Xiaojuan Wang · Dongdong Yu · Lei Jin · Mingshu He · Zehuan Yuan
- 2022 Poster: Towards Efficient 3D Object Detection with Knowledge Distillation
  Jihan Yang · Shaoshuai Shi · Runyu Ding · Zhe Wang · Xiaojuan Qi
- 2022: SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model
  Zeyu Gao · Yao Mu · Ruoyan Shen · Chen Chen · Yangang Ren · Jianyu Chen · Shengbo Li · Ping Luo · Yanfeng Lu
- 2023 Poster: Data Pruning via Moving-one-Sample-out
  Haoru Tan · Sitong Wu · Fei Du · Yukang Chen · Zhibin Wang · Fan Wang · Xiaojuan Qi
- 2023 Poster: CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation
  Xiuzhe Wu · Peng Dai · Weipeng DENG · Handi Chen · Yang Wu · Yan-Pei Cao · Ying Shan · Xiaojuan Qi
- 2023 Poster: RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
  Zeyue Xue · Guanglu Song · Qiushan Guo · Boxiao Liu · Zhuofan Zong · Yu Liu · Ping Luo
- 2023 Poster: Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
  Haibao Yu · Yingjuan Tang · Enze Xie · Jilei Mao · Ping Luo · Zaiqing Nie
- 2023 Poster: VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
  Wenhai Wang · Zhe Chen · Xiaokang Chen · Jiannan Wu · Xizhou Zhu · Gang Zeng · Ping Luo · Tong Lu · Jie Zhou · Yu Qiao · Jifeng Dai
- 2023 Poster: CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
  Chuofan Ma · Yi Jiang · Xin Wen · Zehuan Yuan · Xiaojuan Qi
- 2023 Poster: EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
  Yao Mu · Qinglong Zhang · Mengkang Hu · Wenhai Wang · Mingyu Ding · Jun Jin · Bin Wang · Jifeng Dai · Yu Qiao · Ping Luo
- 2023 Poster: Foundation Model is Efficient Multimodal Multitask Model Selector
  Fanqing Meng · Wenqi Shao · zhanglin peng · Chonghe Jiang · Kaipeng Zhang · Yu Qiao · Ping Luo
- 2023 Poster: OpenLane-V2: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving
  Huijie Wang · Tianyu Li · Yang Li · Li Chen · Chonghao Sima · Zhenbo Liu · Bangjun Wang · Peijin Jia · Yuting Wang · Shengyin Jiang · Feng Wen · Hang Xu · Ping Luo · Junchi Yan · Wei Zhang · Hongyang Li
- 2022 Workshop: Vision Transformers: Theory and applications
  Fahad Shahbaz Khan · Gul Varol · Salman Khan · Ping Luo · Rao Anwer · Ashish Vaswani · Hisham Cholakkal · Niki Parmar · Joost van de Weijer · Mubarak Shah
- 2022 Spotlight: Lightning Talks 6B-3
  Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu
- 2022 Spotlight: MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
  Yao Lai · Yao Mu · Ping Luo
- 2022 Spotlight: DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
  Yao Mu · Yuzheng Zhuang · Fei Ni · Bin Wang · Jianyu Chen · Jianye Hao · Ping Luo
- 2022 Spotlight: Lightning Talks 5A-1
  Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo
- 2022 Spotlight: Lightning Talks 3A-4
  Jinzhi Zhang · Hao Jiang · Hongrui Cai · Qi Yi · Yang Jin · Zhi Tian · Rui Zhang · Wanquan Feng · Xiangxiang Chu · Ruofan Tang · yongzhi li · Yadong Mu · Zehuan Yuan · shaohui peng · Zheng Cao · Xiaoming Wang · Xuetao Feng · Xiaolin Wei · Jiaming Guo · Yadong Mu · Yan Wang · Jing Xiao · Xing Hu · Chunhua Shen · Ruqi Huang · Juyong Zhang · Zidong Du · LU FANG · xishan zhang · Qi Guo · Yunji Chen
- 2022 Spotlight: Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
  Yang Jin · yongzhi li · Zehuan Yuan · Yadong Mu
- 2022 Poster: Spatial Pruned Sparse Convolution for Efficient 3D Object Detection
  Jianhui Liu · Yukang Chen · Xiaoqing Ye · Zhuotao Tian · Xiao Tan · Xiaojuan Qi
- 2022 Poster: Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection
  Shizhen Zhao · Xiaojuan Qi
- 2022 Poster: DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
  Yao Mu · Yuzheng Zhuang · Fei Ni · Bin Wang · Jianyu Chen · Jianye Hao · Ping Luo
- 2022 Poster: AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
  Shoufa Chen · Chongjian GE · Zhan Tong · Jiangliu Wang · Yibing Song · Jue Wang · Ping Luo
- 2022 Poster: MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
  Yao Lai · Yao Mu · Ping Luo
- 2022 Poster: Self-Supervised Visual Representation Learning with Semantic Grouping
  Xin Wen · Bingchen Zhao · Anlin Zheng · Xiangyu Zhang · Xiaojuan Qi
- 2022 Poster: AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
  Yuanfeng Ji · Haotian Bai · Chongjian GE · Jie Yang · Ye Zhu · Ruimao Zhang · Zhen Li · Lingyan Zhanng · Wanling Ma · Xiang Wan · Ping Luo
- 2022 Poster: Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes
  Zeyue Xue · Jianming Liang · Guanglu Song · Zhuofan Zong · Liang Chen · Yu Liu · Ping Luo
- 2021 Poster: Rethinking the Pruning Criteria for Convolutional Neural Network
  Zhongzhan Huang · Wenqi Shao · Xinjiang Wang · Liang Lin · Ping Luo
- 2021 Poster: Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
  Mingyu Ding · Zhenfang Chen · Tao Du · Ping Luo · Josh Tenenbaum · Chuang Gan
- 2021 Poster: Model-Based Reinforcement Learning via Imagination with Derived Memory
  Yao Mu · Yuzheng Zhuang · Bin Wang · Guangxiang Zhu · Wulong Liu · Jianyu Chen · Ping Luo · Shengbo Li · Chongjie Zhang · Jianye Hao
- 2021 Poster: Disentangled Contrastive Learning on Graphs
  Haoyang Li · Xin Wang · Ziwei Zhang · Zehuan Yuan · Hang Li · Wenwu Zhu
- 2021 Poster: Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
  Chongjian GE · Youwei Liang · YIBING SONG · Jianbo Jiao · Jue Wang · Ping Luo
- 2021 Poster: Compressed Video Contrastive Learning
  Yuqi Huo · Mingyu Ding · Haoyu Lu · Nanyi Fei · Zhiwu Lu · Ji-Rong Wen · Ping Luo
- 2021 Poster: SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
  Enze Xie · Wenhai Wang · Zhiding Yu · Anima Anandkumar · Jose M. Alvarez · Ping Luo
- 2020 Poster: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
  Bowen Li · Xiaojuan Qi · Philip Torr · Thomas Lukasiewicz
- 2014 Poster: Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations
  Zhenyao Zhu · Ping Luo · Xiaogang Wang · Xiaoou Tang