EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Ahmad Darkhalil · Dandan Shan · Bin Zhu · Jian Ma · Amlan Kar · Richard Higgins · Sanja Fidler · David Fouhey · Dima Damen

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #1026

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR

Author Information

Ahmad Darkhalil (University of Bristol)
Dandan Shan (University of Michigan)
Bin Zhu (University of Bristol)
Jian Ma (University of Bristol)
Amlan Kar (University of Toronto / Vector Institute / NVIDIA)
Richard Higgins (University of Michigan)
Sanja Fidler (TTI at Chicago)
David Fouhey (University of Michigan)
Dima Damen (University of Bristol)
Dima Damen

Professor of Computer Vision at the University of Bristol.
