

Oral in Workshop: Gaze Meets ML

Interaction-aware Dynamic 3D Gaze Estimation in Videos

Chenyi Kuang · Jeffrey O Kephart · Qiang Ji

[ Project Page ]
Sat 16 Dec 8:30 a.m. PST — 8:45 a.m. PST
 
Presentation: Gaze Meets ML
Sat 16 Dec 6:15 a.m. PST — 3 p.m. PST

Abstract:

Human gaze in in-the-wild and outdoor activities is a continuous and dynamic process driven by anatomical eye movements such as fixations, saccades, and smooth pursuit. However, learning gaze dynamics in videos remains challenging because annotating human gaze in videos is labor-intensive. In this paper, we propose a novel method for dynamic 3D gaze estimation in videos that utilizes human interaction labels. Our model contains a temporal gaze estimator built upon an autoregressive Transformer structure. In addition, our model learns the spatial relationships of gaze among multiple subjects by constructing a Human Interaction Graph from predicted gaze and updating the gaze features with a structure-aware Transformer. The model predicts future gaze conditioned on historical gaze and gaze interactions in an autoregressive manner. We further propose a multi-state training algorithm that alternately updates the interaction module and the dynamic gaze estimation module when training on a mixture of labeled and unlabeled sequences. We show significant improvements in both within-domain gaze estimation accuracy and cross-domain generalization on the physically unconstrained, in-the-wild Gaze360 gaze estimation benchmark.
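To make the architecture in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of its two ingredients: a causal (autoregressive) Transformer over each subject's feature history, and cross-subject attention standing in for the Human Interaction Graph with its structure-aware Transformer. All module names, dimensions, and the residual fusion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InteractionAwareGazeEstimator(nn.Module):
    """Sketch of an interaction-aware dynamic gaze model (assumed design):
    a causal Transformer models each subject's gaze dynamics over time,
    and per-frame attention across subjects approximates message passing
    on a Human Interaction Graph."""

    def __init__(self, feat_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Causal Transformer over each subject's per-frame feature history.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Attention across subjects within a frame: a stand-in for the
        # structure-aware Transformer over the interaction graph.
        self.interaction = nn.MultiheadAttention(
            feat_dim, n_heads, batch_first=True)
        self.head = nn.Linear(feat_dim, 3)  # 3D gaze direction

    def forward(self, feats):
        # feats: (subjects S, time T, feat_dim) head-crop features.
        S, T, D = feats.shape
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(feats.device)
        h = self.temporal(feats, mask=causal)          # (S, T, D), causal in T
        # Each subject attends to all subjects at the same frame.
        h_t = h.permute(1, 0, 2)                       # (T, S, D)
        mixed, _ = self.interaction(h_t, h_t, h_t)     # (T, S, D)
        h = h + mixed.permute(1, 0, 2)                 # residual fusion
        return nn.functional.normalize(self.head(h), dim=-1)  # unit gaze vectors
```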
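The multi-state training idea can be sketched in the same hedged spirit. Assuming the model above and two optimizers (one over the temporal estimator and head, one over the interaction module), the loop alternates between a supervised update on labeled sequences and an interaction-module update on unlabeled ones. The temporal-smoothness term is a stand-in self-supervised signal, since the abstract does not specify the unlabeled objective.

```python
import itertools

def angular_loss(pred, target):
    # 1 - cosine similarity between unit 3D gaze vectors.
    return (1.0 - (pred * target).sum(dim=-1)).mean()

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

def train_epoch(model, labeled_loader, unlabeled_loader, opt_gaze, opt_int):
    # labeled_loader yields (feats, gaze) pairs; unlabeled_loader yields feats.
    for (feats_l, gaze_l), feats_u in zip(labeled_loader,
                                          itertools.cycle(unlabeled_loader)):
        # State 1: supervised update of the temporal estimator on a
        # labeled sequence; interaction module frozen.
        set_trainable(model.interaction, False)
        set_trainable(model.temporal, True)
        loss = angular_loss(model(feats_l), gaze_l)
        opt_gaze.zero_grad(); loss.backward(); opt_gaze.step()

        # State 2: update the interaction module on an unlabeled sequence.
        # Temporal smoothness of predicted gaze is an assumed stand-in
        # objective, not the paper's actual loss.
        set_trainable(model.interaction, True)
        set_trainable(model.temporal, False)
        pred_u = model(feats_u)                          # (S, T, 3)
        smooth = (pred_u[:, 1:] - pred_u[:, :-1]).pow(2).mean()
        opt_int.zero_grad(); smooth.backward(); opt_int.step()
```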
