Affinity Workshop: Women in Machine Learning

Attention-Augmented ST-GCN for Efficient Skeleton-based Human Action Recognition

Negar Heidari · Alexandros Iosifidis


Graph convolutional networks (GCNs) have achieved promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a spatio-temporal graph. Each human body skeleton is modeled as a graph that encodes the natural physical structure of the body joints and their spatial connections, while the temporal dynamics of each action are represented by a sequence of temporally connected skeletons. Most recently proposed GCN-based deep neural networks process all the body skeletons in a sequence depicting the performed action, which is inefficient in terms of memory consumption and computation time. Since not all body skeletons in a temporal sequence are equally important for recognizing the performed action, processing only a subset of the most informative skeletons is a large step towards increasing the computational efficiency of both training and inference. Our goal is to increase computational efficiency while performing on par with, or even better than, state-of-the-art models that use all the body skeletons in a sequence for action recognition. To this end, we propose an attention-augmented ST-GCN method, called TA-GCN, for skeleton-based human action recognition. The proposed method measures the importance of each skeleton in a sequence using a trainable temporal attention module (TAM) placed early in the network architecture. It thereby increases computational efficiency in both the training and testing phases by automatically selecting a subset of the most informative skeletons to be processed for feature extraction and classification.
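
The frame-selection idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' TAM: the abstract does not specify how the attention scores are computed, so the scoring used here (average pooling over joints, a linear projection, and a sigmoid) is an assumption, as are the function name `select_informative_frames`, the parameters `w` and `b`, and the `(C, T, V)` input layout (channels, frames, joints).

```python
import numpy as np

def select_informative_frames(x, w, b, k):
    """Score every frame of a skeleton sequence and keep the k best.

    x: array of shape (C, T, V) = (channels, frames, joints)  [assumed layout]
    w: array of shape (C,), hypothetical attention weights (learned in practice)
    b: scalar bias (hypothetical)
    k: number of frames to keep
    """
    # Average over joints: one C-dimensional descriptor per frame, shape (C, T)
    frame_desc = x.mean(axis=2)
    # Linear projection plus sigmoid gives one score in (0, 1) per frame
    logits = w @ frame_desc + b               # shape (T,)
    scores = 1.0 / (1.0 + np.exp(-logits))
    # Take the k highest-scoring frames, then re-sort to keep temporal order
    top_k = np.sort(np.argsort(scores)[-k:])
    return x[:, top_k, :], top_k, scores

# Usage: keep half of a 300-frame sequence with 25 joints and 3 coordinates
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 300, 25))
w = rng.normal(size=3)
selected, idx, scores = select_informative_frames(x, w, 0.0, k=150)
print(selected.shape)
```

Only the selected frames would then be passed to the later, more expensive layers, which is where the computational saving comes from. Hard top-k selection is not differentiable on its own, so a trainable variant would typically also use the scores to weight the features; that detail is left out of this sketch.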
