
Neural Multisensory Scene Inference
Jae Hyun Lim · Pedro O. Pinheiro · Negar Rostamzadeh · Chris Pal · Sungjin Ahn

Wed Dec 10:45 AM -- 12:45 PM PST @ East Exhibition Hall B + C #116

For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine multisensory cues from numerous trials, e.g., by looking at and touching objects. Despite its importance, multisensory 3D scene representation learning has received less attention compared to the unimodal setting. In this paper, we propose the Generative Multisensory Network (GMN) for learning latent representations of 3D scenes which are partially observable through multiple sensory modalities. We also introduce a novel method, called the Amortized Product-of-Experts, to improve the computational efficiency and the robustness to unseen combinations of modalities at test time. Experimental results demonstrate that the proposed model can efficiently infer robust modality-invariant 3D-scene representations from arbitrary combinations of modalities and perform accurate cross-modal generation. To perform this exploration we have also developed a novel multisensory simulation environment for embodied agents.
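The Amortized Product-of-Experts mentioned in the abstract builds on the classic product-of-experts rule for fusing per-modality Gaussian posteriors: the fused precision is the sum of the expert precisions, and the fused mean is the precision-weighted average of the expert means. As a minimal sketch of that standard Gaussian PoE fusion (not the authors' amortized variant; the function name and array shapes here are illustrative assumptions):

```python
import numpy as np

def poe_gaussian(mus, logvars):
    """Fuse diagonal-Gaussian experts via a product of experts.

    mus, logvars: arrays of shape (num_experts, dim), one row per
    sensory modality. The product of Gaussians is itself Gaussian:
    its precision is the sum of expert precisions, and its mean is
    the precision-weighted average of the expert means.
    """
    precisions = np.exp(-np.asarray(logvars))        # 1 / sigma_i^2
    prod_var = 1.0 / precisions.sum(axis=0)          # fused variance
    prod_mu = prod_var * (precisions * np.asarray(mus)).sum(axis=0)
    return prod_mu, np.log(prod_var)

# Two equally confident experts: the fused mean is their average,
# and the fused variance is halved (confidence adds up).
mu, logvar = poe_gaussian([[0.0], [2.0]], [[0.0], [0.0]])
```

A useful property for this setting is that the rule applies to any subset of experts, which is what lets a PoE-style model handle arbitrary combinations of modalities at test time.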

Author Information

Jae Hyun Lim (Mila, University of Montreal)
Pedro O. Pinheiro (Element AI)
Negar Rostamzadeh (Element AI)
Chris Pal (MILA, Polytechnique Montréal, Element AI)
Sungjin Ahn (Rutgers University)
