Attention-based Neural Cellular Automata
Mattie Tesfaldet · Derek Nowrouzezahrai · Chris Pal

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #914

Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of attention-based NCAs formed using a spatially localized, yet globally organized, self-attention scheme. We introduce an instance of this class named Vision Transformer Cellular Automata (ViTCA). We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation of its inductive biases. Finally, we examine its learned representations via linear probes on its converged cell state hidden representations, yielding, on average, superior results when compared to our U-Net, ViT, and UNetCA baselines.
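
To illustrate what a spatially localized self-attention update on a cell grid might look like, here is a minimal NumPy sketch. It is not the authors' ViTCA implementation. It assumes a single attention head, a 3x3 neighbourhood, a toroidal (wrap-around) grid, and a plain residual update, and it omits positional encodings, normalization, and MLP layers. The names `neighbourhood`, `local_attention_step`, and `run` are hypothetical.

```python
import numpy as np

def neighbourhood(x):
    """Gather each cell's 3x3 neighbourhood: (H, W, C) -> (H, W, 9, C)."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    # np.roll wraps at the borders (toroidal grid); an assumption of this sketch.
    return np.stack(
        [np.roll(x, (dy, dx), axis=(0, 1)) for dy, dx in shifts], axis=2
    )

def local_attention_step(cells, w_q, w_k, w_v):
    """One CA update: every cell attends only to its own 3x3 neighbourhood."""
    d = w_q.shape[1]
    q = cells @ w_q                     # queries, (H, W, d)
    k = neighbourhood(cells @ w_k)      # neighbour keys, (H, W, 9, d)
    v = neighbourhood(cells @ w_v)      # neighbour values, (H, W, 9, C)
    scores = np.einsum("hwd,hwnd->hwn", q, k) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    update = np.einsum("hwn,hwnc->hwc", weights, v)
    return cells + update               # residual cell-state update

def run(cells, w_q, w_k, w_v, steps):
    """Apply the same local rule repeatedly, as a cellular automaton would."""
    for _ in range(steps):
        cells = local_attention_step(cells, w_q, w_k, w_v)
    return cells
```

In this sketch, each step lets a cell see only its immediate neighbours, so a change in one cell can influence cells at most `t` positions away after `t` steps. Repeating the local rule is what lets information spread across the whole grid, which is one way to read "spatially localized, yet globally organized".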

Author Information

Mattie Tesfaldet (McGill University & MILA)

Mattie Tesfaldet (they/them) is a computer vision and machine learning researcher, artist, and DJ based in Montréal, Canada. They are pursuing their PhD at McGill University and Mila, researching generative models for visual content creation and, specifically, novel and interesting ways to represent images and videos with neural networks. Outside of academia, they like to apply their research to explore the intersection of human creativity and artificial intelligence, particularly by developing new AI-based mediums for communication, expression, and sharing of visual imagery.

Derek Nowrouzezahrai (McGill University)
Chris Pal (Montreal Institute for Learning Algorithms, École Polytechnique, Université de Montréal)
