Affinity Workshop: LatinX in AI

Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation

Fernando Camarena · MIGUEL GONZALEZ-MENDOZA · Leonardo Chang · Neil Hernandez-Gress


Deep learning architectures lead the state-of-the-art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. The model's performance is affected by the amount of labeled data used in training. Hence, novel approaches like self-supervised learning (SSL) extract the supervisory signal using unlabeled data.Although SSL reduces the dependency on human annotations, there are still two main drawbacks. First, high-computational resources are required to train a large-scale model from scratch. Second, knowledge from an SSL model is commonly finetuning to a target model, which forces them to share the same parameters and architecture and make it task-dependent.This paper explores how SSL benefits from knowledge distillation in constructing an efficient and non-task-dependent training framework. The experimental design compared the training process of an SSL algorithm trained from scratch and boosted by knowledge distillation in a teacher-student paradigm using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.

Chat is not available.