Poster
Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
Nataliya Shapovalova · Michalis Raptis · Leonid Sigal · Greg Mori

Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor

We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of this approach, we develop a generalization of the Max-Path search algorithm that efficiently searches over a structured space of multiple spatio-temporal paths while also allowing context information to be incorporated into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we use human gaze data as a weak supervisory signal. This is achieved by incorporating gaze, along with classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate in classification and achieves state-of-the-art results in localization. In addition, we show how our model can produce top-down saliency maps conditioned on the classification label and the localized latent paths.
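The abstract builds on Max-Path search. As a rough illustration of the basic idea only, here is a minimal sketch of a simplified single-path, max-sum dynamic program over per-frame score grids. It assumes each frame is a 2D grid of (possibly negative) local action scores, that a path occupies one cell per frame, and that consecutive cells must lie in each other's 8-neighborhood. The function name `max_path`, the grid layout, and the neighborhood rule are hypothetical choices for this sketch; it is not the authors' generalization, which searches over multiple paths and incorporates context.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]          # hypothetical layout: scores[t][i][j]
Cell = Tuple[int, int, int]       # (frame t, row i, column j)


def max_path(scores: List[Grid]) -> Tuple[float, List[Cell]]:
    """Simplified max-sum path search (illustrative sketch, not the paper's method).

    Returns the best total score and the path as a list of (t, i, j),
    where the path may start and end at any frame.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    # acc[t][i][j]: best score of any path that ends at cell (t, i, j)
    acc = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    # back[t][i][j]: predecessor (i, j) in frame t-1, or None if the path starts here
    back: List[List[List[Optional[Tuple[int, int]]]]] = [
        [[None] * W for _ in range(H)] for _ in range(T)
    ]
    best, best_end = float("-inf"), (0, 0, 0)

    for t in range(T):
        for i in range(H):
            for j in range(W):
                # Starting a fresh path contributes 0; extend only if a neighbor is positive.
                prev_val, prev_loc = 0.0, None
                if t > 0:
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            pi, pj = i + di, j + dj
                            if 0 <= pi < H and 0 <= pj < W and acc[t - 1][pi][pj] > prev_val:
                                prev_val, prev_loc = acc[t - 1][pi][pj], (pi, pj)
                acc[t][i][j] = scores[t][i][j] + prev_val
                back[t][i][j] = prev_loc
                if acc[t][i][j] > best:
                    best, best_end = acc[t][i][j], (t, i, j)

    # Follow back-pointers from the best end cell to recover the path.
    path: List[Cell] = []
    t, i, j = best_end
    while True:
        path.append((t, i, j))
        prev = back[t][i][j]
        if prev is None:
            break
        t -= 1
        i, j = prev
    path.reverse()
    return best, path


if __name__ == "__main__":
    demo = [
        [[1, -5], [-5, -5]],
        [[-5, 2], [-5, -5]],
        [[-5, -5], [-5, 3]],
    ]
    best, path = max_path(demo)
    print(best, path)  # 6.0 [(0, 0, 0), (1, 0, 1), (2, 1, 1)]
```

Letting a path restart whenever no neighboring predecessor has a positive accumulated score is what allows the search to choose its own temporal extent. Each cell does a constant amount of work (at most nine neighbor checks), so the cost is linear in the number of space-time cells.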

Author Information

Nataliya Shapovalova (Simon Fraser University)
Michalis Raptis (Comcast Labs)
Leonid Sigal (University of British Columbia)
Greg Mori (Borealis AI)