
 
Oral
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Vincent Sitzmann · Michael Zollhoefer · Gordon Wetzstein

Wed Dec 11 10:05 AM -- 10:20 AM (PST) @ West Exhibition Hall C + B3

Unsupervised learning with generative models has the potential to discover rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained with only posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
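The two core ideas in the abstract — a scene as a continuous function from world coordinates to features, and rendering via a ray marcher that repeatedly queries that function — can be sketched in a few lines. The sketch below is illustrative only: the MLP uses random weights (the paper's networks are trained end-to-end from posed images), and the choice to read the step length from the last feature channel via a softplus is an assumption made here for brevity, not the paper's exact parameterization.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random-weight MLP parameters (stand-in for a trained scene network)."""
    return [(rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def phi(params, x):
    """Scene representation: maps a 3D world coordinate to a feature vector."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers
    return h

def ray_march(params, origin, direction, n_steps=10):
    """Sketch of differentiable ray marching: at each step, query phi at the
    current point along the ray and advance by a step length derived from
    the feature vector (here: softplus of its last channel, an assumption)."""
    d = direction / np.linalg.norm(direction)
    depth = 0.05  # small initial offset from the ray origin
    for _ in range(n_steps):
        feat = phi(params, origin + depth * d)
        step = np.log1p(np.exp(feat[-1]))  # softplus keeps steps positive
        depth = depth + step
    surface_point = origin + depth * d
    return surface_point, phi(params, surface_point)[:-1]

rng = np.random.default_rng(0)
params = init_mlp(rng, [3, 16, 16, 9])  # 3D input -> 8 features + 1 step channel
point, features = ray_march(params, np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

Because every operation (MLP forward pass, step update, final feature query) is differentiable, gradients of a per-pixel image loss can flow back through the marcher into the scene function, which is what lets SRNs train from 2D images and camera poses alone.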

Author Information

Vincent Sitzmann (Stanford University)

Vincent is an incoming Assistant Professor at MIT EECS, where he will lead the Scene Representation Group (scenerepresentations.org). Currently, he is a Postdoc at MIT's CSAIL with Josh Tenenbaum, Bill Freeman, and Fredo Durand. He finished his Ph.D. at Stanford University. His research interest lies in neural scene representations, i.e., the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene with information on geometry, material, lighting, etc. from only a few observations, a task that is simple for humans but currently impossible for AI.

Michael Zollhoefer (Facebook Reality Labs)
Gordon Wetzstein (Stanford University)
