Systems for Foundation Models, and Foundation Models for Systems.
Moderator: Celestine Mendler-Dünner
Hall E (level 1)
I'm a simple creature. I fell in love with foundation models (FMs) because they radically improved data systems that I had been trying to build for a decade, and they are just awesome! This talk starts with my perspective on how FMs change the systems we build, focusing on what I call "death by a thousand cuts" problems. Roughly, these are problems in which each individual task looks easy, but the sheer variety and breadth of the tasks make them hard.
The bulk of the talk is about understanding how to build foundation models efficiently. We describe trends in hardware accelerators from a perhaps unexpected viewpoint: database systems research. Databases have worried about optimizing IO (reads and writes within the memory hierarchy) since the 1980s. In fact, optimizing IO led to FlashAttention for Transformers.
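To make the IO point concrete, here is a minimal pure-Python sketch of the online-softmax recurrence that lets FlashAttention process keys and values block by block without ever materializing the full row of attention scores. It is an illustration under stated assumptions, not the FlashAttention kernel: the function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, it handles a single query vector, and plain Python lists stand in for the fast on-chip memory that the real GPU kernel targets.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: materialize every score, softmax, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]


def tiled_attention(q, keys, values, block_size=2):
    """Hypothetical blockwise version: read keys/values one block at a time and keep
    only running statistics (max m, normalizer l, unnormalized output acc)."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    m = -math.inf      # running max of scores seen so far
    l = 0.0            # running sum of exp(score - m)
    acc = [0.0] * d_v  # running sum of exp(score - m) * value
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        m_new = max(m, max(scores))
        # Rescale the old statistics to the new max; exp(-inf) = 0.0 on the first block.
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

In the real kernel the payoff is that each block of keys and values is read from slow memory once while the running statistics stay in fast memory; here the point is only that the blockwise result matches the naive one for any `block_size`.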
But are there more efficient architectures for foundation models than the Transformer? Maybe! I'll describe a new class of architectures based on classical signal processing, exemplified by S4. These new architectures are asymptotically more efficient than Transformers for long sequences, have achieved state-of-the-art quality on benchmarks like Long Range Arena, and have been applied to images, text, DNA, audio, and video. S4 will allow us to make mathematically precise connections to RNNs and CNNs. I'll also describe new twists, such as long filters, data-dependent convolutions, and gating, that power many of these amazing recent architectures, including RWKV, S5, Mega, Hyena, and RetNet, as well as recent work to understand their fundamental limitations, which will hopefully lead to even more awesome foundation models!
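The RNN/CNN connection can be sketched for a discrete linear state space model x_k = A x_{k-1} + B u_k, y_k = C x_k with zero initial state. Unrolling the recurrence gives y_k = sum_{j=0..k} (C A^j B) u_{k-j}, so the same model can run step by step like an RNN or as one long convolution with kernel K_j = C A^j B. This is a minimal sketch assuming a small, arbitrary, already-discretized A; S4 itself uses a structured A, a discretization step, and FFT-based convolution, none of which are shown here. All function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def ssm_recurrent(A, B, C, u):
    """RNN view: update the hidden state one step at a time (scalar input/output)."""
    x = [0.0] * len(B)  # zero initial state
    ys = []
    for u_k in u:
        # x_k = A x_{k-1} + B u_k
        x = [ax + b * u_k for ax, b in zip(matvec(A, x), B)]
        # y_k = C x_k
        ys.append(sum(c * xi for c, xi in zip(C, x)))
    return ys


def ssm_kernel(A, B, C, length):
    """Convolution kernel K[j] = C A^j B for j = 0 .. length-1."""
    K = []
    v = list(B)  # A^0 B
    for _ in range(length):
        K.append(sum(c * vi for c, vi in zip(C, v)))
        v = matvec(A, v)  # advance to A^{j+1} B
    return K


def ssm_convolutional(A, B, C, u):
    """CNN view: causal convolution of the input with the precomputed kernel.
    Done directly here in O(L^2); long-sequence models use FFTs instead."""
    K = ssm_kernel(A, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]
```

The recurrent form suits step-by-step generation with constant memory per step, while the convolutional form exposes the whole sequence at once for parallel training; both compute the same outputs.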
A GitHub repository containing material from this talk is under construction at https://github.com/HazyResearch/aisys-building-blocks. Please feel free to add to it!