
A Hierarchical Image Model for Polynomial-Time 2D Parsing
Long Zhu · Yuanhao Chen · Yuan Lin · Alan Yuille

Wed Dec 10 11:51 AM -- 11:52 AM (PST)

Language and image understanding are two major goals of artificial intelligence, and both can be conceptually formulated as parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the one-dimensional structure of language to design efficient polynomial-time parsing algorithms. By contrast, the two-dimensional nature of images makes efficient image parsers much harder to design, and the appropriate form of the hierarchical representation is also unclear. Attempts to adapt representations and algorithms from natural language have been only partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing that outputs both image segmentation and object recognition. The HIM has multiple layers (five in this paper) and offers advantages for representation, inference, and learning. First, the HIM has a coarse-to-fine representation that can capture long-range dependencies and exploit contextual information at different levels. Second, the structure of the HIM allows us to design a fast inference algorithm, based on dynamic programming, that parses the image in polynomial time. Third, the HIM can be learned efficiently and discriminatively from a labeled dataset. Evaluation on the challenging public MSRC image dataset shows that the HIM outperforms other state-of-the-art methods. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena.
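The abstract attributes polynomial-time parsing to dynamic programming over the HIM's hierarchical structure. The sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. It assumes a tree-structured hierarchy (every node has exactly one parent), with each node taking one of K discrete states and the total score being a sum of per-node terms and parent-child terms. Under those assumptions, a bottom-up pass followed by a top-down backtrack finds the best joint assignment in O(N * K^2) time, compared with K^N for exhaustive search. The function `max_sum_tree`, the toy one-level split into four child regions, and all scores are hypothetical illustrations, not the HIM's actual states or learned parameters.

```python
from typing import Dict, List, Tuple


def max_sum_tree(
    children: Dict[int, List[int]],
    unary: Dict[int, List[float]],
    pairwise: List[List[float]],
    root: int = 0,
) -> Tuple[float, Dict[int, int]]:
    """Exact MAP labeling of a tree-structured model by dynamic programming.

    Hypothetical illustration, not the HIM's own inference code.
    children[n]     -- ids of the child nodes of node n (leaves may be omitted)
    unary[n][s]     -- score for node n taking state s
    pairwise[p][c]  -- score for a parent in state p with a child in state c
    Cost is O(N * K^2) for N nodes and K states per node.
    """
    K = len(pairwise)
    best: Dict[int, List[float]] = {}              # best subtree score per node state
    choice: Dict[Tuple[int, int], List[int]] = {}  # (node, state) -> best child states

    def upward(n: int) -> None:
        # Bottom-up pass: solve every child subtree before its parent.
        kids = children.get(n, [])
        for c in kids:
            upward(c)
        scores = list(unary[n])
        for s in range(K):
            picks = []
            for c in kids:
                # Best child state given that the parent is in state s.
                cand = [pairwise[s][t] + best[c][t] for t in range(K)]
                t_best = max(range(K), key=cand.__getitem__)
                scores[s] += cand[t_best]
                picks.append(t_best)
            choice[(n, s)] = picks
        best[n] = scores

    upward(root)
    # Top-down pass: fix the root's best state, then read off stored child choices.
    s_root = max(range(K), key=best[root].__getitem__)
    labels = {root: s_root}
    stack = [root]
    while stack:
        n = stack.pop()
        for c, t in zip(children.get(n, []), choice[(n, labels[n])]):
            labels[c] = t
            stack.append(c)
    return best[root][s_root], labels


# Hypothetical toy example: a root region split into four child regions, K = 2 states.
children = {0: [1, 2, 3, 4]}
unary = {0: [0.0, 0.5], 1: [1.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0], 4: [1.0, 0.0]}
pairwise = [[0.5, 0.0], [0.0, 0.5]]  # rewards parent-child agreement

score, labels = max_sum_tree(children, unary, pairwise)
print(score, labels)  # 5.5 {0: 0, 1: 0, 2: 0, 3: 1, 4: 0}
```

Because each child's best response to every parent state is computed once and reused, the cost grows linearly with the number of nodes. How the HIM itself defines node states and scores is not described in this listing.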

Author Information

Long Zhu (Massachusetts Institute of Technology)
Yuanhao Chen (University of California, Los Angeles)
Yuan Lin (Shanghai Jiao Tong University)
Alan Yuille (Johns Hopkins University)
