Affinity Workshop: LatinX in AI

Impact of Pose Estimation Models for Landmark-based Sign Language Recognition

Cristian Lazo Quispe · Joe Huamani Malca · Gissella Bejarano Nicho · Manuel Huaman Ramos · Pablo Rivas · Tomas Cerny


Sign Language Recognition (SLR) models rely heavily on advances in Human Action Recognition (HAR). One of the simplest and most compact modalities is the skeleton: joints and limbs represented as keypoint landmarks and the edges connecting them. These skeletons can be obtained from pose estimation, depth maps, or motion capture. HAR models usually need coarser pose estimation than SLR models, for which it is essential to estimate landmarks not only for the body pose but also for facial gestures, hands, and fingers. In this work, we compare three whole-body estimation libraries/models that are gaining traction in the SLR task. We first relate them by identifying the keypoints common to their landmark structures and analyzing their quality. Then, we complement this analysis by comparing their annotations on three sign language datasets with videos of different quality, background, and region (Peru and USA). We also test a sign language recognition model to compare the quality of the annotations these libraries/models provide.
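
The step of identifying common keypoints across landmark structures can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the libraries or their layouts, so `LAYOUT_A`, `LAYOUT_B`, the keypoint names, and both helper functions are hypothetical. The sketch assumes each library returns, per frame, a list of (x, y) landmarks in the same normalized coordinate system.

```python
import math

# Hypothetical layouts: keypoint name -> index in each library's per-frame output.
# Real libraries use different names and many more landmarks.
LAYOUT_A = {"nose": 0, "left_index_tip": 1, "left_wrist": 2, "right_wrist": 3}
LAYOUT_B = {"nose": 0, "left_wrist": 1, "right_wrist": 2, "left_ear": 3}


def common_keypoints(layout_a, layout_b):
    """Return the sorted keypoint names present in both layouts."""
    return sorted(set(layout_a) & set(layout_b))


def mean_keypoint_distance(frames_a, frames_b, layout_a, layout_b):
    """Mean Euclidean distance per shared keypoint over paired frames.

    frames_a / frames_b: one list of (x, y) landmarks per frame, indexed
    according to the corresponding layout (an assumption of this sketch).
    """
    num_frames = min(len(frames_a), len(frames_b))
    if num_frames == 0:
        raise ValueError("at least one pair of frames is required")

    names = common_keypoints(layout_a, layout_b)
    totals = {name: 0.0 for name in names}
    # zip stops at the shorter sequence, matching num_frames above.
    for frame_a, frame_b in zip(frames_a, frames_b):
        for name in names:
            point_a = frame_a[layout_a[name]]
            point_b = frame_b[layout_b[name]]
            totals[name] += math.dist(point_a, point_b)
    return {name: total / num_frames for name, total in totals.items()}


if __name__ == "__main__":
    # One toy frame per library; the coordinates are made up for illustration.
    frames_a = [[(0.0, 0.0), (9.0, 9.0), (1.0, 0.0), (0.0, 1.0)]]
    frames_b = [[(0.0, 0.0), (1.0, 0.0), (3.0, 5.0), (7.0, 7.0)]]
    print(common_keypoints(LAYOUT_A, LAYOUT_B))
    print(mean_keypoint_distance(frames_a, frames_b, LAYOUT_A, LAYOUT_B))
```

A per-keypoint disagreement of this kind is only one possible proxy for annotation quality; the downstream recognition accuracy mentioned in the abstract is a separate measure.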
