NeurIPS Inverted-Attention Transformers can Learn Object Representations: Insights from Slot Attention

Spotlight in Workshop: UniReps: Unifying Representations in Neural Models

Inverted-Attention Transformers can Learn Object Representations: Insights from Slot Attention

Yi-Fu Wu · Klaus Greff · Gamaleldin Elsayed · Michael Mozer · Thomas Kipf · Sjoerd van Steenkiste

Abstract:

Visual reasoning is supported by a causal understanding of the physical world, and theories of human cognition suppose that a necessary step to causal understanding is the discovery and representation of high-level entities like objects. Slot Attention is a popular method aimed at object-centric learning, and its popularity has resulted in dozens of variants and extensions. To help understand the core assumptions that lead to successful object-centric learning, we take a step back and identify the minimal set of changes to a standard Transformer architecture to obtain the same performance as the specialized Slot Attention models. We systematically evaluate the performance and scaling behaviour of several "intermediate" architectures on seven image and video datasets from prior work. Our analysis reveals that by simply inverting the attention mechanism of Transformers, we obtain performance competitive with state-of-the-art Slot Attention in several domains.
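
As a rough illustration of what "inverting the attention mechanism" could mean, the NumPy sketch below contrasts standard attention (softmax over the key axis) with a Slot Attention style variant (softmax over the query axis, followed by renormalization over keys). This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the exact formulation, and the function names, shapes, and the `eps` constant are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def standard_attention(q, k, v):
    """Standard scaled dot-product attention.

    q: (num_queries, d), k: (num_keys, d), v: (num_keys, d_v).
    The softmax runs over keys, so each query's weights sum to 1.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


def inverted_attention(q, k, v, eps=1e-8):
    """Assumed 'inverted' variant, following Slot Attention.

    The softmax runs over queries, so each key (input feature)
    distributes one unit of attention across the queries, which makes
    the queries compete for inputs. The weights are then renormalized
    over keys so each query takes a weighted mean of the values.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=0)
    mean_weights = weights / (weights.sum(axis=-1, keepdims=True) + eps)
    return mean_weights @ v, weights


# Toy example: 4 queries (e.g. slots) attending over 10 input features.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = rng.normal(size=(10, 16))
v = rng.normal(size=(10, 16))

out_std, w_std = standard_attention(q, k, v)
out_inv, w_inv = inverted_attention(q, k, v)
```

In this sketch, the rows of `w_std` sum to 1 (each query spreads its attention over the inputs), whereas the columns of `w_inv` sum to 1 (each input is divided among the queries). The only difference between the two functions is the softmax axis and the subsequent renormalization.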
