Skip to yearly menu bar Skip to main content


Poster

Toward Semantic Gaze Target Detection

Samy Tafasca · Anshul Gupta · Victor Bros · Jean-marc Odobez

[ ]
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

From the onset of infanthood, humans naturally develop the ability to closely observe and interpret the visual gaze of others. This skill, known as gaze following, holds significance in developmental theory as it enables us to grasp another person’s mental state, emotions, intentions, and more. In computer vision, gaze following is defined as the prediction of the pixel coordinates where a person in the image is focusing their attention. Existing methods in this research area have predominantly centered on pinpointing the gaze target by predicting a gaze heatmap or gaze point. However, a notable drawback of this approach is its limited practical value in gaze applications, as mere localization may not fully capture our primary interest — understanding the underlying semantics, such as the nature of the gaze target, rather than just its 2D pixel location. To address this gap, we introduce a novel architecture that simultaneously predicts the localization and semantic label of the gaze target. We devise a pseudo-annotation pipeline for an existing dataset, propose a new benchmark, develop an experimental protocol and design a suitable baseline for comparison. Our method sets a new state-of-the-art on the GazeFollow and GazeHOI datasets for localization and achieves competitive results in the recognition task.

Live content is unavailable. Log in and register to view live content