Timezone: »

Neural Attentive Circuits
Martin Weiss · Nasim Rahaman · Francesco Locatello · Chris Pal · Yoshua Bengio · Bernhard Schölkopf · Li Erran Li · Nicolas Ballas

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #410

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the Natural Language and Visual Reasoning for Real (NLVR2) dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and Caltech-UCSD Birds dataset (CUB) by about 10 percent, and OOD robustness on Tiny ImageNet-R by about 2.5 percent. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3 percent performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming its general purpose nature.

Author Information

Martin Weiss (Mila)
Nasim Rahaman (Max Planck Institute for Intelligent Systems, Max-Planck Institute)
Francesco Locatello (Amazon)
Chris Pal (Montreal Institute for Learning Algorithms, École Polytechnique, Université de Montréal)
Yoshua Bengio (Mila / U. Montreal)

Yoshua Bengio is Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, Turing Award 2018 recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada AI CIFAR Chair. He pioneered deep learning and has been getting the most citations per day in 2018 among all computer scientists, worldwide. He is an officer of the Order of Canada, member of the Royal Society of Canada, was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the year in 2017, and he is a member of the NeurIPS advisory board and co-founder of the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncover the principles giving rise to intelligence through learning, as well as favour the development of AI for the benefit of all.

Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Bernhard Scholkopf received degrees in mathematics (London) and physics (Tubingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.

Li Erran Li (AWS AI, Amazon)

Li Erran Li is the head of machine learning at Scale and an adjunct professor at Columbia University. Previously, he was chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and machine learning platform team at Uber where he worked on deep learning for autonomous driving, led the machine learning platform team technically, and drove strategy for company-wide artificial intelligence initiatives. He started his career at Bell Labs. Li’s current research interests are machine learning, computer vision, learning-based robotics, and their application to autonomous driving. He has a PhD from the computer science department at Cornell University. He’s an ACM Fellow and IEEE Fellow.

Nicolas Ballas (Facebook AI Research)

More from the Same Authors