DALL-E has shown an impressive ability to generate novel images that are significantly and systematically different from the training distribution, yet realistic. This is possible because it utilizes a dataset of text-image pairs in which the text provides the source of compositionality. Following this result, an important follow-up question is whether such compositionality can still be achieved without conditioning on text. In this paper, we propose a simple but novel slot-based autoencoding architecture, called SLATE, that achieves a text-free DALL-E by learning compositional slot-based representations purely from images, an ability that DALL-E lacks. Unlike existing object-centric representation models, which decode pixels independently for each slot and each pixel location and compose them via mixture-based alpha composition, we propose to condition an Image GPT decoder on the slots, enabling more flexible generation by capturing complex interactions among the pixels and the slots. In experiments, we show that this simple architecture achieves zero-shot generation of novel images without text and produces higher-quality generations than models based on mixture decoders.
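The core idea, slots extracted from an image conditioning an autoregressive transformer over discrete image tokens instead of a per-slot mixture decoder, can be summarized in a short sketch. The sketch below is illustrative only and is not the authors' implementation: the tokenizer is faked with a random embedding standing in for the paper's discrete VAE, the class names (SlotAttention, SlotConditionedDecoder) and all hyperparameters are assumptions, and many SLATE details (the dVAE training, positional encodings for the token grid, the exact transformer configuration) are omitted.

```python
# A minimal sketch of the SLATE idea, assuming PyTorch: slots extracted by
# Slot Attention condition an autoregressive transformer over discrete image
# tokens. The random token embedding below merely stands in for the paper's
# discrete VAE tokenizer; class names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotAttention(nn.Module):
    """Simplified Slot Attention: slots compete for input tokens via attention
    normalized over slots, then are refined with a GRU for a few iterations."""

    def __init__(self, num_slots, dim, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_log_sigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                               # x: (B, N, dim)
        B, _, D = x.shape
        x = self.norm_in(x)
        k, v = self.to_k(x), self.to_v(x)
        slots = self.slots_mu + self.slots_log_sigma.exp() * torch.randn(
            B, self.num_slots, D, device=x.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = torch.softmax(                        # competition over slots
                torch.einsum('bsd,bnd->bsn', q, k) * self.scale, dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True) # weighted mean over inputs
            updates = torch.einsum('bsn,bnd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, D),
                             slots.reshape(-1, D)).reshape(B, self.num_slots, D)
        return slots                                     # (B, num_slots, dim)


class SlotConditionedDecoder(nn.Module):
    """Autoregressive transformer decoder that predicts the next image token
    while cross-attending to the slots (instead of per-slot alpha mixing)."""

    def __init__(self, vocab_size, dim, num_tokens, depth=4, heads=4):
        super().__init__()
        self.bos = vocab_size                            # extra id used as BOS
        self.tok_emb = nn.Embedding(vocab_size + 1, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerDecoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, token_ids, slots):                 # token_ids: (B, num_tokens)
        B, N = token_ids.shape
        bos = torch.full((B, 1), self.bos, dtype=torch.long, device=token_ids.device)
        inp = torch.cat([bos, token_ids[:, :-1]], dim=1)  # shift right (teacher forcing)
        h = self.tok_emb(inp) + self.pos_emb[:, :N]
        causal = torch.triu(torch.full((N, N), float('-inf'),
                                       device=token_ids.device), diagonal=1)
        h = self.decoder(tgt=h, memory=slots, tgt_mask=causal)
        return self.head(h)                              # (B, num_tokens, vocab_size)


# Illustrative forward/loss pass with random tokens in place of dVAE codes.
B, N, V, D, S = 2, 16 * 16, 512, 128, 6
tokens = torch.randint(0, V, (B, N))
token_feats = nn.Embedding(V, D)(tokens)                 # stand-in for dVAE embeddings
slots = SlotAttention(num_slots=S, dim=D)(token_feats)
logits = SlotConditionedDecoder(V, D, num_tokens=N)(tokens, slots)
loss = F.cross_entropy(logits.reshape(-1, V), tokens.reshape(-1))
```

The contrast with mixture decoders is visible in the decoder's forward pass: every predicted token attends to all slots and to all previously generated tokens, so pixel-slot and pixel-pixel interactions are modeled jointly rather than composed per slot with alpha masks.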
Author Information
Gautam Singh (Rutgers University)
I am starting my second year as a Ph.D. student in the Department of Computer Science at Rutgers University. My focus area is probabilistic generative models. Prior to this, I worked at IBM Research India for three years after completing my undergraduate degree at IIT Guwahati.
Fei Deng (Rutgers University)
Sungjin Ahn (KAIST)
More from the Same Authors
- 2021 : DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations »
  Fei Deng · Ingook Jang · Sungjin Ahn
- 2021 : TransDreamer: Reinforcement Learning with Transformer World Models »
  · Jaesik Yoon · Yi-Fu Wu · Sungjin Ahn
- 2022 Poster: Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos »
  Gautam Singh · Yi-Fu Wu · Sungjin Ahn
- 2020 : Invited Talk: Sungjin Ahn »
  Sungjin Ahn
- 2020 Poster: Generative Neurosymbolic Machines »
  Jindong Jiang · Sungjin Ahn
- 2020 Spotlight: Generative Neurosymbolic Machines »
  Jindong Jiang · Sungjin Ahn
- 2019 Poster: Variational Temporal Abstraction »
  Taesup Kim · Sungjin Ahn · Yoshua Bengio
- 2019 Poster: Neural Multisensory Scene Inference »
  Jae Hyun Lim · Pedro O. Pinheiro · Negar Rostamzadeh · Chris Pal · Sungjin Ahn
- 2019 Poster: Sequential Neural Processes »
  Gautam Singh · Jaesik Yoon · Youngsung Son · Sungjin Ahn
- 2019 Spotlight: Sequential Neural Processes »
  Gautam Singh · Jaesik Yoon · Youngsung Son · Sungjin Ahn
- 2018 Poster: Bayesian Model-Agnostic Meta-Learning »
  Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn
- 2018 Spotlight: Bayesian Model-Agnostic Meta-Learning »
  Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn