One of the key challenges for AI is to understand, predict, and model data over time. Pretrained networks should be able to temporally generalize, or adapt to shifts in data distributions that occur over time. Our current state-of-the-art (SOTA) still struggles to model and understand data over long temporal durations – for example, SOTA models are limited to processing several seconds of video, and powerful transformer models are still fundamentally limited by their attention spans. On the other hand, humans and other biological systems are able to flexibly store and update information in memory to comprehend and manipulate multimodal streams of input. Cognitive neuroscientists propose that they do so via the interaction of multiple memory systems with different neural mechanisms. What types of memory systems and mechanisms already exist in our current AI models? First, there are extensions of the classic proposal that memories are formed via synaptic plasticity mechanisms – information can be stored in the static weights of a pre-trained network, or in fast weights that more closely resemble short-term plasticity mechanisms. Then there are persistent memory states, such as those in LSTMs or in external differentiable memory banks, which store information as neural activations that can change over time. Finally, there are models augmented with static databases of knowledge, akin to a high-precision long-term memory or semantic memory in humans. When is it useful to store information in each one of these mechanisms, and how should models retrieve from them or modify the information therein? How should we design models that may combine multiple memory mechanisms to address a problem? Furthermore, do the shortcomings of current models require some novel memory systems that retain information over different timescales, or with different capacity or precision? Finally, what can we learn from memory processes in biological systems that may advance our models in AI? We aim to explore how a deeper understanding of memory mechanisms can improve task performance in many different application domains, such as lifelong / continual learning, reinforcement learning, computer vision, and natural language processing.
Live content is unavailable. Log in and register to view live content