

MaskRNN: Instance Level Video Object Segmentation

Yuan-Ting Hu · Jia-Bin Huang · Alex Schwing

Pacific Ballroom #84

Keywords: [ Computer Vision ] [ Video, Motion and Tracking ] [ Image Segmentation ]


Instance-level video object segmentation is an important technique for video editing and compression. To capture temporal coherence, in this paper we develop MaskRNN, a recurrent neural net approach that fuses, in each frame and for each object instance, the outputs of two deep nets: a binary segmentation net that provides a mask and a localization net that provides a bounding box. Thanks to the recurrent component and the localization component, our method can exploit long-term temporal structure in the video data and reject outliers. We validate the proposed algorithm on three challenging benchmark datasets, DAVIS-2016, DAVIS-2017, and SegTrack v2, achieving state-of-the-art performance on all of them.
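The idea of fusing a segmentation mask with a bounding box to reject outliers can be illustrated with a minimal sketch. It assumes a simple fusion rule chosen for illustration, not necessarily the exact rule used in MaskRNN: the segmentation net's per-pixel foreground probability map is thresholded, and any foreground pixel outside the localization net's bounding box (optionally enlarged by a margin) is discarded as an outlier. The function `fuse_mask_with_box` and its `threshold` and `margin` parameters are hypothetical, and the recurrent (across-frame) part of the method is not modeled here.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def fuse_mask_with_box(
    prob: List[List[float]],
    box: Box,
    threshold: float = 0.5,
    margin: int = 0,
) -> List[List[int]]:
    """Illustrative fusion: binarize a foreground probability map, then keep
    only pixels inside the (optionally enlarged) bounding box."""
    height = len(prob)
    width = len(prob[0]) if height else 0
    x0, y0, x1, y1 = box
    # Enlarge the box by `margin` pixels on every side, clipped to the image.
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(width - 1, x1 + margin), min(height - 1, y1 + margin)

    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = x0 <= x <= x1 and y0 <= y <= y1
            # Foreground only if confident AND supported by the localization box.
            row.append(1 if inside and prob[y][x] >= threshold else 0)
        fused.append(row)
    return fused


if __name__ == "__main__":
    # Toy 4x4 probability map with a spurious confident pixel at the top-left.
    prob = [
        [0.9, 0.1, 0.0, 0.0],
        [0.1, 0.8, 0.9, 0.0],
        [0.0, 0.7, 0.6, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    for row in fuse_mask_with_box(prob, (1, 1, 2, 2)):
        print(row)
    # [0, 0, 0, 0]
    # [0, 1, 1, 0]
    # [0, 1, 1, 0]
    # [0, 0, 0, 0]
```

In the toy example, the confident pixel at the top-left corner lies outside the predicted box and is rejected, while the object pixels inside the box survive. A larger `margin` trades outlier rejection for robustness to an imprecise box: with `margin=1` the box covers the whole 4x4 image and the corner pixel is kept.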
