
Pay Attention to What You Need: Do Structural Priors Still Matter in the Age of Billion Parameter Models?
Irina Higgins · Antonia Creswell · Sébastien Racanière

Mon Dec 06 01:00 AM -- 04:30 AM (PST)

The last few years have seen the emergence of billion parameter models trained on 'infinite' data that achieve impressive performance on many tasks, suggesting that big data and big models may be all we need. But how far can this approach take us, in particular in domains where data is more limited? In many situations, adding structured architectural priors to models may be key to achieving faster learning, better generalisation, and learning from less data. Structure can be added at the level of perception and at the level of reasoning, the long-standing goal of GOFAI research. In this tutorial we will use the ideas of symmetries and symbolic reasoning as an overarching theoretical framework to describe many of the common structural priors that have proved successful for building more data-efficient and generalisable perceptual models, and models that support better reasoning in neuro-symbolic approaches.

Mon 1:00 a.m. - 1:45 a.m.

Ever since the 1950s, AI scientists have been experimenting with using computer technology to emulate human intelligence. Universal function approximation theorems promised success in this pursuit, provided the tasks human intelligence solves could be formulated as continuous function approximation problems, and provided enough scale was available to train MLPs of arbitrary width or depth. Yet we now find ourselves in the age of billion parameter models and are still far from replicating all aspects of human intelligence. Moreover, our models are not MLPs, but convolutional, recurrent, or otherwise structured neural networks. In this talk we will discuss why that is, and consider the general principles that can guide us towards building a new generation of neural networks with the kinds of structure that can solve the full spectrum of tasks that human intelligence can solve.

Irina Higgins
Mon 1:45 a.m. - 1:55 a.m.
Mon 1:55 a.m. - 2:00 a.m.
Mon 2:00 a.m. - 2:45 a.m.

Modern neural networks are powerful function approximators that manipulate complex input data to extract information useful for one or more tasks. Extracting information is a process that progressively simplifies the data. For example, binary classification might start from high-dimensional images and shrink them down to a single number between 0 and 1: the probability that the original image contained a dog. Many transformations of the input image should lead to the same output; changing the background colour, rotating the image, and so on should not affect the answer the network provides. When the transformations applied to the input are invertible, we are dealing with symmetries of the inputs, and a natural question is whether knowledge of such symmetries can help researchers devise better neural networks. In this section, we will visit the subject of symmetries and their benefits, and see examples of their use.
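The invariance described above can be made concrete with a toy sketch (our own illustration, not code from the tutorial): a classifier head whose first step is global average pooling is, by construction, invariant to any spatial permutation of its input, translations included.

```python
import numpy as np

# Toy "classifier" head: global average pooling followed by a logistic
# readout. Pooling over all spatial positions makes the output invariant
# to spatial permutations of the input, including translations.
def predict(image, w):
    features = image.mean(axis=(0, 1))   # pool over height and width
    logit = features @ w                 # linear readout over channels
    return 1.0 / (1.0 + np.exp(-logit))  # probability "image contains a dog"

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))       # height x width x channels
w = rng.normal(size=3)

p_original = predict(image, w)
p_shifted = predict(np.roll(image, shift=2, axis=1), w)  # translated input

# The prediction is unchanged: translation is a symmetry of this model.
assert np.isclose(p_original, p_shifted)
```

Real architectures bake in weaker but more useful versions of this idea: a convolution, for instance, is equivariant rather than invariant to translation, so spatial information is preserved until a pooling stage discards it.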

Sébastien Racanière
Mon 2:45 a.m. - 2:55 a.m.
Mon 2:55 a.m. - 3:00 a.m.
Mon 3:00 a.m. - 3:45 a.m.

Deep learning has led to some incredible successes across a very broad range of applications within AI. However, deep learning models remain black boxes: they often cannot explain how they reach their final answers, and they give no clear signal as to "what went wrong?" when they fail. Further, they typically require huge amounts of data during training and often do not generalise well beyond the data they were trained on. But AI has not always been this way. In the days of "good old-fashioned AI" (GOFAI), models required no data at all and their solutions were interpretable, but they were not grounded in the real world. Moreover, unlike deep learning, where a single general algorithm can learn to solve many different problems, a GOFAI algorithm can typically be applied only to a single task. So can we have our cake and eat it too? Is there an approach to AI that requires a limited amount of data, is interpretable, generalises well to new problems, and can be applied to a wide variety of tasks? One interesting, developing area that could answer this question is neuro-symbolic AI, which combines deep learning and logical reasoning in a single model. In this tutorial we will explore these models through the lens of "structure", identifying how varying degrees of structure in a model affect its interpretability, how well it generalises to new data, the generality of the algorithm, and the variety of tasks it can be applied to.
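The split between learned perception and interpretable reasoning can be illustrated with a deliberately minimal sketch (our own illustration; the prototypes, inputs, and rule are all invented for this example): a "neural" perception step maps noisy vectors to discrete symbols, and a symbolic rule then reasons over those symbols, so the final answer can be explained in terms of the symbols and the rule rather than raw activations.

```python
import numpy as np

# Stand-in for learned class templates for symbols 0, 1, 2.
PROTOTYPES = np.eye(3)

def perceive(x):
    """Neural-style perception: score x against each prototype, pick the best."""
    scores = PROTOTYPES @ x
    return int(np.argmax(scores))

def rule_sum_is_even(symbols):
    """Symbolic reasoning: an interpretable rule applied to extracted symbols."""
    return sum(symbols) % 2 == 0

# Noisy observations of the symbols 1 and 2.
x1 = np.array([0.1, 0.9, 0.0])
x2 = np.array([0.0, 0.2, 0.8])

symbols = [perceive(x1), perceive(x2)]
answer = rule_sum_is_even(symbols)
# The answer is explainable: the model saw symbols 1 and 2, and 1 + 2 = 3 is odd.
print(symbols, answer)
```

The appeal, as the abstract above suggests, is that the perception step can be trained on data while the rule remains inspectable; the open research questions concern how to learn the symbols and rules jointly rather than hand-coding them as done here.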

Antonia Creswell
Mon 3:45 a.m. - 3:50 a.m.
Mon 3:50 a.m. - 4:30 a.m.

Author Information

Irina Higgins (DeepMind)

Irina Higgins is a Staff Research Scientist at DeepMind, where she works in the Frontiers team. Her work aims to bring together insights from the fields of neuroscience and physics to advance general artificial intelligence through improved representation learning. Before joining DeepMind, Irina was a British Psychological Society Undergraduate Award winner for her achievements as an undergraduate student in Experimental Psychology at Westminster University. She then completed a DPhil at the Oxford Centre for Computational Neuroscience and Artificial Intelligence, where she focused on understanding the computational principles underlying speech processing in the auditory brain. During her DPhil, Irina also worked on developing poker AI, applied machine learning in the finance sector, and worked on speech recognition at Google Research.

Antonia Creswell (Imperial College London)

Antonia Creswell is a Senior Research Scientist at DeepMind in the Cognition team. Her work focuses on the learning and integration of object representations in dynamic models. She completed her PhD on representation learning at Imperial College London in the department of Bioengineering.

Seb Racanière (DeepMind)

Sébastien Racanière is a Staff Research Engineer at DeepMind. His current interests in ML revolve around the interaction between physics and machine learning, with an emphasis on the use of symmetries. He got his PhD in pure mathematics from the Université Louis Pasteur, Strasbourg, in 2002, with co-supervisors Michèle Audin (Strasbourg) and Frances Kirwan (Oxford). This was followed by a two-year Marie Curie Individual Fellowship at Imperial College London and another postdoc in Cambridge (UK). His first job in industry was at the Samsung European Research Institute, investigating the use of learning algorithms in mobile phones, followed by UGS, a Cambridge-based company, working on a 3D search engine. He afterwards worked for Maxeler, in London, programming FPGAs. He then moved to Google, and finally to DeepMind.
