Timezone: »

Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks
Kazuki Irie · Jürgen Schmidhuber
Event URL: https://openreview.net/forum?id=0v8F7-los8i »

Short-term memory in standard, general-purpose, sequence-processing recurrent neural networks (RNNs) is stored as activations of nodes or ''neurons.'' Generalizing feedforward NNs (FNNs) to such RNNs is mathematically straightforward and natural, and even historical: already in 1943, McCulloch and Pitts proposed this as a surrogate to ''synaptic modifications,'' generalizing the Lenz-Ising model, the first RNN architecture of 1925. A lesser known alternative approach to storing short-term memory in ''synaptic connections''---by parameterising and controlling the dynamics of a context-sensitive time-varying weight matrix through another NN---yields another ''natural'' type of short-term memory in sequence processing NNs: the Fast Weight Programmers (FWPs) of the early 1990s. FWPs have seen a recent revival as generic sequence processors, achieving competitive performance across various tasks. They are formally closely related to the now popular Transformers. Here we present them in the context of artificial NNs as an abstraction of biological NNs---a perspective that has not been stressed enough in previous FWP work. We first review aspects of FWPs for pedagogical purposes, then discuss connections to related works motivated by insights from neuroscience.

Author Information

Kazuki Irie (Swiss AI Lab IDSIA, University of Lugano)
Jürgen Schmidhuber (Swiss AI Lab, IDSIA (USI & SUPSI); NNAISENSE; KAUST)

Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. His lab's Deep Learning Neural Networks based on ideas published in the "Annus Mirabilis" 1990-1991 have revolutionised machine learning and AI. By the mid 2010s, they were on 3 billion devices, and used billions of times per day through users of the world's most valuable public companies, e.g., for greatly improved (CTC-LSTM-based) speech recognition on all Android phones, greatly improved machine translation through Google Translate and Facebook (over 4 billion LSTM-based translations per day), Apple's Siri and Quicktype on all iPhones, the answers of Amazon's Alexa, and numerous other applications. In 2011, his team was the first to win official computer vision contests through deep neural nets, with superhuman performance. In 2012, they had the first deep NN to win a medical imaging contest (on cancer detection). All of this attracted enormous interest from industry. His research group also established the fields of mathematically rigorous universal AI and recursive self-improvement in metalearning machines that learn to learn (since 1987). In 1990, he introduced unsupervised adversarial neural networks that fight each other in a minimax game to achieve artificial curiosity (GANs are a special case). In 1991, he introduced very deep learning through unsupervised pre-training, and neural fast weight programmers formally equivalent to what's now called linear Transformers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. He is recipient of numerous awards, author of over 350 peer-reviewed papers, and Chief Scientist of the company NNAISENSE, which aims at building the first practical general purpose AI. He is a frequent keynote speaker, and advising various governments on AI strategies.

More from the Same Authors