Associative Memories with Heavy-Tailed Data
Vivien Cabannes · Elvis Dohmatob · Alberto Bietti
Keywords:
associative memory
Zipf data
optimization-based algorithm
mechanistic interpretability
scaling law
Abstract
Learning arguably involves the discovery and memorization of abstract rules. But how do associative memories appear in transformer architectures optimized with gradient descent algorithms? We derive precise scaling laws for a simple input-output associative memory model with respect to parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.