Poster
COHESIV: Contrastive Object and Hand Embedding Segmentation In Video
Dandan Shan · Richard Higgins · David Fouhey
In this paper, we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and a hand location as input and segments the hand and the object it holds. For learning, we generate responsibility maps that show how well a hand's motion explains other pixels' motion in video. We use these responsibility maps as pseudo-labels to train a weakly supervised neural network with an attention-based similarity loss and a contrastive loss. Our system outperforms alternative methods and performs well on the 100DOH, EPIC-KITCHENS, and HO3D datasets.
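To make the idea of a responsibility map concrete, here is a minimal sketch, not the paper's formulation. It assumes a dense optical flow field is already available, summarizes the hand's motion as the mean flow inside a given hand mask, and scores every pixel with a Gaussian of the distance between its flow and that hand motion. The function name `responsibility_map`, the Gaussian scoring, and the `sigma` parameter are all illustrative assumptions.

```python
import numpy as np


def responsibility_map(flow, hand_mask, sigma=1.0):
    """Score how well the hand's motion explains each pixel's motion.

    flow:      (H, W, 2) array of per-pixel optical flow (assumed given).
    hand_mask: (H, W) boolean array marking hand pixels (assumed given).
    sigma:     hypothetical bandwidth controlling how strict the match is.
    Returns an (H, W) array in (0, 1]; 1 means identical motion to the hand.
    """
    if not hand_mask.any():
        raise ValueError("hand_mask must contain at least one pixel")

    # Summarize the hand's motion as the mean flow over the hand pixels.
    hand_motion = flow[hand_mask].mean(axis=0)  # shape (2,)

    # Distance between each pixel's flow and the hand's motion.
    residual = np.linalg.norm(flow - hand_motion, axis=-1)  # shape (H, W)

    # Gaussian scoring (an assumption): small residual -> high responsibility.
    return np.exp(-(residual ** 2) / (2.0 * sigma ** 2))


# Toy example: a 4x4 frame where the top-left 2x2 block (the hand and the
# object it holds) moves 3 pixels to the right and the background is static.
flow = np.zeros((4, 4, 2))
flow[:2, :2] = [3.0, 0.0]

hand_mask = np.zeros((4, 4), dtype=bool)
hand_mask[0, 0] = True  # only one pixel is labeled as hand

resp = responsibility_map(flow, hand_mask)
```

In this toy frame, pixels that move with the hand receive a responsibility of 1, while static background pixels receive a value near 0. Thresholding such a map is one plausible way to obtain pseudo-labels for the held object; the thresholding step is likewise an assumption.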
Author Information
Dandan Shan (University of Michigan)
Richard Higgins (University of Michigan)
David Fouhey (University of Michigan)
More from the Same Authors
- 2023 Poster: Towards A Richer 2D Understanding of Hands at Scale
  Tianyi Cheng · Ayda Hassen · Dandan Shan · Richard Higgins · David Fouhey
- 2022 Poster: EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
  Ahmad Darkhalil · Dandan Shan · Bin Zhu · Jian Ma · Amlan Kar · Richard Higgins · Sanja Fidler · David Fouhey · Dima Damen