Joint-task Self-supervised Learning for Temporal Correspondence
Xueting Li · Sifei Liu · Shalini De Mello · Xiaolong Wang · Jan Kautz · Ming-Hsuan Yang

Wed Dec 11 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #65

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization helps reduce ambiguities in fine-grained matching by narrowing down search regions, while fine-grained matching provides bottom-up features to facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully supervised affinity feature representation obtained from a ResNet-18 pre-trained on ImageNet.
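The shared inter-frame affinity matrix mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax-over-dot-products form, the `temperature` parameter, and the function names `affinity_matrix`, `propagate`, and `region_box` are assumptions made for illustration. The sketch shows one matrix serving both levels: it propagates per-pixel values (pixel level) and maps pixel coordinates to estimate a bounding box in the reference frame (region level).

```python
import numpy as np

def affinity_matrix(feat_ref, feat_tgt, temperature=1.0):
    """Affinity between a reference and a target frame (illustrative form).

    feat_ref: (C, N1) features, one column per reference pixel.
    feat_tgt: (C, N2) features, one column per target pixel.
    Returns an (N1, N2) matrix whose columns each sum to 1.
    """
    logits = (feat_ref.T @ feat_tgt) / temperature
    # Subtract the column-wise maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)

def propagate(values_ref, affinity):
    """Pixel level: carry per-pixel values (colors, labels) to the target frame.

    values_ref: (K, N1). Returns (K, N2).
    """
    return values_ref @ affinity

def region_box(affinity, height, width):
    """Region level: bounding box, in the reference frame, of the target pixels.

    Each target pixel is mapped to its expected reference coordinate using
    the same affinity matrix; the box encloses those coordinates.
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords_ref = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # (2, N1)
    expected = propagate(coords_ref, affinity)                     # (2, N2)
    x_min, y_min = expected.min(axis=1)
    x_max, y_max = expected.max(axis=1)
    return float(x_min), float(y_min), float(x_max), float(y_max)
```

In this sketch, a lower `temperature` makes each target pixel commit more strongly to a single reference pixel, while a higher value spreads its weight over several candidates.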

Author Information

Xueting Li (University of California, Merced)
Sifei Liu (NVIDIA)
Shalini De Mello (NVIDIA)

Shalini De Mello is a Principal Research Scientist and Research Lead in the Learning and Perception Research group at NVIDIA, which she joined in 2013. Her research interests are in human-centric vision (face and gaze analysis) and in data-efficient (synth2real, low-shot, self-supervised and multimodal) machine learning. She has co-authored 48 peer-reviewed publications and holds 38 patents. Her inventions have contributed to several NVIDIA products, including DriveIX and Maxine. Previously, she worked at Texas Instruments and AT&T Laboratories. She received her doctoral degree in Electrical and Computer Engineering from the University of Texas at Austin.

Xiaolong Wang (CMU)
Jan Kautz (NVIDIA)
Ming-Hsuan Yang (Google / UC Merced)
