

Spotlight in Workshop: Physical Reasoning and Inductive Biases for the Real World

3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

Hsiao-Yu Tung · Zhou Xian · Mihir Prabhudesai · Katerina Fragkiadaki


Abstract:

We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not interfere with one another and their appearance persists over time and across viewpoints. This permits our model to predict scenes far into the future simply by “moving” 3D object features according to cumulative object motion predictions. Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation. Our model generalizes well across varying numbers and appearances of interacting objects as well as across camera viewpoints, outperforms existing 2D and 3D dynamics models, and enables successful sim-to-real transfer.
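To make the rollout idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: each object keeps a fixed latent 3D feature, a toy stand-in for the graph neural network predicts a per-object motion from pairwise messages and the agent action, and future scenes are obtained by accumulating those motions while the object features themselves persist. All names, shapes, and weights (ObjectNode, predict_motions, rollout, w_node, w_edge) are assumptions for illustration, and motion is simplified to a 3D translation.

import numpy as np
from dataclasses import dataclass


@dataclass
class ObjectNode:
    """One object slot: a latent feature and its pose in the 3D scene (illustrative)."""
    feature: np.ndarray   # appearance/shape code; persists over time
    position: np.ndarray  # 3D centroid; updated by predicted motion


def edge_message(src: ObjectNode, dst: ObjectNode) -> np.ndarray:
    """Toy pairwise message: relative displacement plus the sender's feature."""
    return np.concatenate([dst.position - src.position, src.feature])


def predict_motions(nodes, action, w_node, w_edge):
    """Stand-in for the graph network: aggregate incoming messages per object
    and map them, together with the agent action, to a 3D displacement."""
    motions = []
    for i, node in enumerate(nodes):
        msgs = [edge_message(other, node) for j, other in enumerate(nodes) if j != i]
        agg = np.sum(msgs, axis=0) if msgs else np.zeros(w_edge.shape[1])
        hidden = np.tanh(agg @ w_edge.T)
        inp = np.concatenate([node.feature, node.position, action, hidden])
        motions.append(inp @ w_node.T)  # predicted per-object translation
    return motions


def rollout(nodes, actions, w_node, w_edge):
    """Predict future scenes by 'moving' object poses with cumulative motion;
    object features are never re-predicted, only their poses change."""
    trajectory = []
    for action in actions:
        for node, delta in zip(nodes, predict_motions(nodes, action, w_node, w_edge)):
            node.position = node.position + delta  # accumulate object motion
        trajectory.append([node.position.copy() for node in nodes])
    return trajectory


# Example with random weights, two objects, and a 5-step action sequence.
rng = np.random.default_rng(0)
F, A, H = 8, 4, 16
nodes = [ObjectNode(rng.normal(size=F), rng.normal(size=3)) for _ in range(2)]
w_edge = 0.1 * rng.normal(size=(H, 3 + F))
w_node = 0.1 * rng.normal(size=(3, F + 3 + A + H))
actions = [rng.normal(size=A) for _ in range(5)]
print(rollout(nodes, actions, w_node, w_edge)[-1])  # final predicted object positions

In the paper, the predicted motion would transform object feature maps in the 3D neural scene representation rather than bare centroids; the sketch only conveys the object-factorized, accumulate-and-move structure of the rollout.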