Real-time Multi-class Segmentation using Depth Cues
Abstract
We demonstrate a real-time multi-class segmentation system. While significant progress has been made in multi-class segmentation over the last few years, per-pixel label prediction for a single image typically takes on the order of minutes. This renders such systems impractical for real-time applications such as robotics, navigation and human-computer interaction. Concurrently, the release of the Microsoft Kinect has renewed interest in using depth sensors to aid various tasks in computer vision. This work demonstrates a real-time system that provides dense label predictions for a scene given both intensity and depth images. A convolutional network is trained on a newly released depth dataset* of aligned RGB and depth frames that have been annotated with dense pixel-wise labels. Once trained, the convolutional network can be computed efficiently on the NeuFlow processor**, reducing its computation time from a few seconds in software to about 100 ms.
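The abstract does not specify the network architecture, so the following is only a minimal, dependency-free sketch of the general idea of dense labelling from intensity and depth. It assumes the depth map is simply stacked with the RGB channels as a fourth input channel, runs one convolution layer producing one score map per class, and takes a per-pixel argmax. The function names (`stack_rgbd`, `conv2d_same`, `dense_labels`), the two classes and the hand-set weights are hypothetical illustrations. A real system would stack several learned layers trained on the annotated dataset.

```python
from typing import List

Grid = List[List[float]]  # one image channel: rows of pixel values


def stack_rgbd(rgb: List[Grid], depth: Grid) -> List[Grid]:
    """Append the depth map to the 3 colour channels (assumed 4-channel input)."""
    return list(rgb) + [depth]


def conv2d_same(inp: List[Grid], kernels: List[List[Grid]], bias: List[float]) -> List[Grid]:
    """Naive multi-channel 2-D convolution (cross-correlation), zero padding, stride 1.

    kernels[o][c] is a k x k weight grid (k odd) from input channel c to output
    channel o; each output map has the same height and width as the input.
    """
    h, w = len(inp[0]), len(inp[0][0])
    k = len(kernels[0][0])
    r = k // 2
    out = []
    for o, per_channel in enumerate(kernels):
        plane = [[bias[o]] * w for _ in range(h)]
        for c, kern in enumerate(per_channel):
            for y in range(h):
                for x in range(w):
                    for dy in range(k):
                        for dx in range(k):
                            yy, xx = y + dy - r, x + dx - r
                            if 0 <= yy < h and 0 <= xx < w:  # outside = zero padding
                                plane[y][x] += kern[dy][dx] * inp[c][yy][xx]
        out.append(plane)
    return out


def dense_labels(scores: List[Grid]) -> List[List[int]]:
    """Per-pixel argmax over the class score maps: one class index per pixel."""
    n, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(n), key=lambda c: scores[c][y][x]) for x in range(w)]
            for y in range(h)]


def center_only(v: float) -> Grid:
    """3x3 kernel with weight v at the centre and zeros elsewhere."""
    return [[0.0, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, 0.0]]


# Toy 4x4 scene: colour is uninformative (all zeros); the left half is near
# (depth 1.0) and the right half is far (depth 5.0).
H, W = 4, 4
rgb = [[[0.0] * W for _ in range(H)] for _ in range(3)]
depth = [[1.0, 1.0, 5.0, 5.0] for _ in range(H)]
rgbd = stack_rgbd(rgb, depth)

# Hand-set (not learned) weights for 2 hypothetical classes that only read depth:
#   class 0 "near": score = 3 - depth;  class 1 "far": score = depth - 3.
zero = center_only(0.0)
kernels = [[zero, zero, zero, center_only(-1.0)],
           [zero, zero, zero, center_only(1.0)]]
bias = [3.0, -3.0]

scores = conv2d_same(rgbd, kernels, bias)
labels = dense_labels(scores)
print(labels)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Treating depth as an extra input channel is just one simple way to let a convolutional network combine intensity and depth cues; the weights here only look at depth so that the toy output is easy to check by hand. The nested Python loops also make plain why per-pixel inference is expensive in software and benefits from dedicated hardware.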
* N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In Proceedings of the International Conference on Computer Vision - Workshop on 3D Representation and Recognition, 2011.
** C. Farabet, B. Martini, P. Akselrod, S. Talay, Y. LeCun, and E. Culurciello. Hardware accelerated convolutional neural networks for synthetic vision systems. In International Symposium on Circuits and Systems (ISCAS'10), Paris, May 2010. IEEE.