The visual world can be parsimoniously characterized in terms of distinct entities with sparse interactions. Discovering this compositional structure in dynamic visual scenes has proven challenging for end-to-end computer vision approaches unless explicit instance-level supervision is provided. Slot-based models leveraging motion cues have recently shown great promise in learning to represent, segment, and track objects without direct supervision, but they still fail to scale to complex real-world multi-object videos. In an effort to bridge this gap, we take inspiration from human development and hypothesize that information about scene geometry in the form of depth signals can facilitate object-centric learning. We introduce SAVi++, an object-centric video model which is trained to predict depth signals from a slot-based video representation. By further leveraging best practices for model scaling, we are able to train SAVi++ to segment complex dynamic scenes recorded with moving cameras, containing both static and moving objects of diverse appearance on naturalistic backgrounds, without the need for segmentation supervision. Finally, we demonstrate that by using sparse depth signals obtained from LiDAR, SAVi++ is able to learn emergent object segmentation and tracking from videos in the real-world Waymo Open dataset.
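As a rough illustration of how sparse depth signals such as LiDAR returns could serve as a training target, the sketch below computes a squared-error loss only over pixels that carry a depth measurement. The abstract does not specify the loss, so everything here is an assumption made for illustration: the function name `sparse_depth_loss`, the validity mask, the optional `log(1 + d)` transform of the target, and the use of plain nested lists in place of tensors.

```python
import math


def sparse_depth_loss(pred, target, valid, log_transform=True):
    """Mean squared error over pixels that have a depth measurement.

    pred, target, valid are 2D nested lists of equal shape (hypothetical layout).
    valid[i][j] is True where a depth reading exists (e.g. a LiDAR return).
    """
    total, count = 0.0, 0
    for p_row, t_row, v_row in zip(pred, target, valid):
        for p, t, v in zip(p_row, t_row, v_row):
            if not v:
                continue  # pixels without a depth reading contribute nothing
            # Assumption: optionally compress the depth range with log(1 + d).
            t_val = math.log1p(t) if log_transform else t
            total += (p - t_val) ** 2
            count += 1
    # Return 0.0 for a frame with no valid depth pixels to avoid dividing by zero.
    return total / count if count else 0.0


if __name__ == "__main__":
    pred = [[0.0, 5.0], [1.0, 2.0]]
    target = [[0.0, 99.0], [3.0, 2.0]]
    valid = [[True, False], [True, True]]
    print(sparse_depth_loss(pred, target, valid, log_transform=False))
```

Masking matters here because the invalid pixel (prediction 5.0, target 99.0) would otherwise dominate the loss; with the mask, only the three measured pixels are averaged.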
Author Information
Gamaleldin Elsayed (Google Research, Brain Team)
Aravindh Mahendran (Google)
Sjoerd van Steenkiste (Google Research)
Klaus Greff (Google Brain)
Michael Mozer (Google Research, Brain Team)
Thomas Kipf (Google Research)
More from the Same Authors
-
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2021 : Exploring through Random Curiosity with General Value Functions »
Aditya Ramesh · Louis Kirsch · Sjoerd van Steenkiste · Jürgen Schmidhuber -
2021 : Unsupervised Learning of Temporal Abstractions using Slot-based Transformers »
Anand Gopalakrishnan · Kazuki Irie · Jürgen Schmidhuber · Sjoerd van Steenkiste -
2021 : Learning Neural Causal Models with Active Interventions »
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke -
2022 : Neural Network Online Training with Sensitivity to Multiscale Temporal Structure »
Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer -
2022 : Test-time adaptation with slot-centric models »
Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf -
2022 : Spatial Symmetry in Slot Attention »
Ondrej Biza · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gamaleldin Elsayed · Aravindh Mahendran · Thomas Kipf -
2022 : Teacher-generated pseudo human spatial-attention labels boost contrastive learning models »
Yushi Yao · Chang Ye · Junfeng He · Gamaleldin Elsayed -
2022 : An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better? »
Tyler Scott · Ting Liu · Michael Mozer · Andrew Gallagher -
2022 Workshop: Workshop on neuro Causal and Symbolic AI (nCSI) »
Matej Zečević · Devendra Dhami · Christina Winkler · Thomas Kipf · Robert Peharz · Petar Veličković -
2022 Poster: Exploring through Random Curiosity with General Value Functions »
Aditya Ramesh · Louis Kirsch · Sjoerd van Steenkiste · Jürgen Schmidhuber -
2022 Poster: Object Scene Representation Transformer »
Mehdi S. M. Sajjadi · Daniel Duckworth · Aravindh Mahendran · Sjoerd van Steenkiste · Filip Pavetic · Mario Lucic · Leonidas Guibas · Klaus Greff · Thomas Kipf -
2021 Poster: Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss »
Michael Iuzzolino · Michael Mozer · Samy Bengio -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio -
2021 Poster: Discrete-Valued Neural Communication »
Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio -
2020 Workshop: Object Representations for Learning and Reasoning »
William Agnew · Rim Assouel · Michael Chang · Antonia Creswell · Eliza Kosoy · Aravind Rajeswaran · Sjoerd van Steenkiste -
2020 Poster: Object-Centric Learning with Slot Attention »
Francesco Locatello · Dirk Weissenborn · Thomas Unterthiner · Aravindh Mahendran · Georg Heigold · Jakob Uszkoreit · Alexey Dosovitskiy · Thomas Kipf -
2020 Spotlight: Object-Centric Learning with Slot Attention »
Francesco Locatello · Dirk Weissenborn · Thomas Unterthiner · Aravindh Mahendran · Georg Heigold · Jakob Uszkoreit · Alexey Dosovitskiy · Thomas Kipf -
2019 Poster: Are Disentangled Representations Helpful for Abstract Visual Reasoning? »
Sjoerd van Steenkiste · Francesco Locatello · Jürgen Schmidhuber · Olivier Bachem -
2019 Poster: Saccader: Improving Accuracy of Hard Attention Models for Vision »
Gamaleldin Elsayed · Simon Kornblith · Quoc V Le -
2018 : Panel »
Paroma Varma · Aditya Grover · Will Hamilton · Jessica Hamrick · Thomas Kipf · Marinka Zitnik -
2018 : Compositional Imitation Learning: Explaining and executing one task at a time »
Thomas Kipf -
2018 Poster: Large Margin Deep Networks for Classification »
Gamaleldin Elsayed · Dilip Krishnan · Hossein Mobahi · Kevin Regan · Samy Bengio -
2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein -
2017 : Relational neural expectation maximization »
Sjoerd van Steenkiste -
2017 Poster: Neural Expectation Maximization »
Klaus Greff · Sjoerd van Steenkiste · Jürgen Schmidhuber