Timezone: »
State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it also requires a custom representation and algorithm that makes the model difficult to understand and implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. First, we explain why DSS works mathematically, as the diagonal approximation to S4 surprisingly recovers the same dynamics in the limit of infinite state dimension. We then systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model S4D is a simple diagonal version of S4 whose kernel computation requires just 3 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results in image, audio, and medical time-series domains, and 85\% average on the Long Range Arena benchmark.
Author Information
Albert Gu (Stanford)
Karan Goel (Stanford University)
Ankit Gupta (Tel Aviv University)
Christopher Ré (Stanford)
More from the Same Authors
-
2021 : Personalized Benchmarking with the Ludwig Benchmarking Toolkit »
Avanika Narayan · Piero Molino · Karan Goel · Willie Neiswanger · Christopher Ré -
2021 : SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation »
Arjun Desai · Andrew Schmidt · Elka Rubin · Christopher Sandino · Marianne Black · Valentina Mazzoli · Kathryn Stevens · Robert Boutin · Christopher Ré · Garry Gold · Brian Hargreaves · Akshay Chaudhari -
2021 : Combining Recurrent, Convolutional, and Continuous-Time Models with Structured Learnable Linear State-Space Layers »
Isys Johnson · Albert Gu · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2023 Poster: HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution »
Eric Nguyen · Michael Poli · Marjan Faizi · Armin Thomas · Michael Wornow · Callum Birch-Sykes · Stefano Massaroli · Aman Patel · Clayton Rabideau · Yoshua Bengio · Stefano Ermon · Christopher Ré · Stephen Baccus -
2023 Poster: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture »
Dan Fu · Jessica R Grogan · Isys Johnson · Simran Arora · Evan Sabri Eyuboglu · Armin Thomas · Benjamin Spector · Michael Poli · Atri Rudra · Christopher Ré -
2023 Poster: A case for reframing automated medical image classification as segmentation »
Sarah Hooper · Mayee Chen · Khaled Saab · Kush Bhatia · Curtis Langlotz · Christopher Ré -
2023 Poster: TART: A plug-and-play Transformer module for task-agnostic reasoning »
Kush Bhatia · Avanika Narayan · Christopher De Sa · Christopher Ré -
2023 Poster: H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models »
Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Ré · Clark Barrett · Zhangyang Wang · Beidi Chen -
2023 Poster: Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions »
Stefano Massaroli · Michael Poli · Dan Fu · Hermann Kumbong · David Romero · Rom Parnichkun · Aman Timalsina · Quinn McIntyre · Beidi Chen · Atri Rudra · Ce Zhang · Christopher Ré · Stefano Ermon · Yoshua Bengio -
2023 Poster: Skill-it! A data-driven skills framework for understanding and training language models »
Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue WANG · Ce Zhang · Frederic Sala · Christopher Ré -
2023 Poster: Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification »
Neel Guha · Mayee Chen · Kush Bhatia · Azalia Mirhoseini · Frederic Sala · Christopher Ré -
2023 Poster: Structured State Space Models for In-Context Reinforcement Learning »
Chris Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2023 Poster: LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models »
Neel Guha · Julian Nyarko · Daniel Ho · Christopher Ré · Adam Chilton · Aditya K · Alex Chohlas-Wood · Austin Peters · Brandon Waldon · Daniel Rockmore · Diego Zambrano · Dmitry Talisman · Enam Hoque · Faiz Surani · Frank Fagan · Galit Sarfaty · Gregory Dickinson · Haggai Porat · Jason Hegland · Jessica Wu · Joe Nudell · Joel Niklaus · John Nay · Jonathan Choi · Kevin Tobia · Margaret Hagan · Megan Ma · Michael Livermore · Nikon Rasumov-Rahe · Nils Holzenberger · Noam Kolt · Peter Henderson · Sean Rehaag · Sharad Goel · Shang Gao · Spencer Williams · Sunny Gandhi · Tom Zur · Varun Iyer · Zehua Li -
2023 Oral: Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture »
Dan Fu · Jessica R Grogan · Isys Johnson · Simran Arora · Evan Sabri Eyuboglu · Armin Thomas · Benjamin Spector · Michael Poli · Atri Rudra · Christopher Ré -
2022 Spotlight: Lightning Talks 2A-4 »
Sarthak Mittal · Richard Grumitt · Zuoyu Yan · Lihao Wang · Dongsheng Wang · Alexander Korotin · Jiangxin Sun · Ankit Gupta · Vage Egiazarian · Tengfei Ma · Yi Zhou · Yishi Xu · Albert Gu · Biwei Dai · Chunyu Wang · Yoshua Bengio · Uros Seljak · Miaoge Li · Guillaume Lajoie · Yiqun Wang · Liangcai Gao · Lingxiao Li · Jonathan Berant · Huang Hu · Xiaoqing Zheng · Zhibin Duan · Hanjiang Lai · Evgeny Burnaev · Zhi Tang · Zhi Jin · Xuanjing Huang · Chaojie Wang · Yusu Wang · Jian-Fang Hu · Bo Chen · Chao Chen · Hao Zhou · Mingyuan Zhou -
2022 Spotlight: Diagonal State Spaces are as Effective as Structured State Spaces »
Ankit Gupta · Albert Gu · Jonathan Berant -
2022 Spotlight: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 : Panel Discussion: Opportunities and Challenges »
Kenneth Norman · Janice Chen · Samuel J Gershman · Albert Gu · Sepp Hochreiter · Ida Momennejad · Hava Siegelmann · Sainbayar Sukhbaatar -
2022 Poster: Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data »
Armin Thomas · Christopher Ré · Russell Poldrack -
2022 Poster: Diagonal State Spaces are as Effective as Structured State Spaces »
Ankit Gupta · Albert Gu · Jonathan Berant -
2022 Poster: HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions »
Lingjiao Chen · Zhihua Jin · Evan Sabri Eyuboglu · Christopher Ré · Matei Zaharia · James Zou -
2022 Poster: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness »
Tri Dao · Dan Fu · Stefano Ermon · Atri Rudra · Christopher Ré -
2022 Poster: Contrastive Adapters for Foundation Model Group Robustness »
Michael Zhang · Christopher Ré -
2022 Poster: Decentralized Training of Foundation Models in Heterogeneous Environments »
Binhang Yuan · Yongjun He · Jared Davis · Tianyi Zhang · Tri Dao · Beidi Chen · Percy Liang · Christopher Ré · Ce Zhang -
2022 Poster: Transform Once: Efficient Operator Learning in Frequency Domain »
Michael Poli · Stefano Massaroli · Federico Berto · Jinkyoo Park · Tri Dao · Christopher Ré · Stefano Ermon -
2022 Poster: Machine Learning on Graphs: A Model and Comprehensive Taxonomy »
Ines Chami · Sami Abu-El-Haija · Bryan Perozzi · Christopher Ré · Kevin Murphy -
2022 Poster: S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces »
Eric Nguyen · Karan Goel · Albert Gu · Gordon Downs · Preey Shah · Tri Dao · Stephen Baccus · Christopher Ré -
2022 Poster: Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees »
Jue WANG · Binhang Yuan · Luka Rimanic · Yongjun He · Tri Dao · Beidi Chen · Christopher Ré · Ce Zhang -
2021 Poster: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers »
Albert Gu · Isys Johnson · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2021 Poster: Rethinking Neural Operations for Diverse Tasks »
Nicholas Roberts · Mikhail Khodak · Tri Dao · Liam Li · Christopher Ré · Ameet Talwalkar -
2020 Workshop: Differential Geometry meets Deep Learning (DiffGeo4DL) »
Joey Bose · Emile Mathieu · Charline Le Lan · Ines Chami · Frederic Sala · Christopher De Sa · Maximilian Nickel · Christopher Ré · Will Hamilton -
2020 Poster: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Spotlight: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu -
2020 Poster: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering »
Ines Chami · Albert Gu · Vaggos Chatziafratis · Christopher Ré -
2020 Poster: No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems »
Nimit Sohoni · Jared Dunnmon · Geoffrey Angus · Albert Gu · Christopher Ré -
2019 Workshop: KR2ML - Knowledge Representation and Reasoning Meets Machine Learning »
Veronika Thost · Christian Muise · Kartik Talamadupula · Sameer Singh · Christopher Ré -
2019 Poster: On the Downstream Performance of Compressed Word Embeddings »
Avner May · Jian Zhang · Tri Dao · Christopher Ré -
2019 Spotlight: On the Downstream Performance of Compressed Word Embeddings »
Avner May · Jian Zhang · Tri Dao · Christopher Ré -
2019 Poster: Multi-Resolution Weak Supervision for Sequential Data »
Paroma Varma · Frederic Sala · Shiori Sagawa · Jason Fries · Dan Fu · Saelig Khattar · Ashwini Ramamoorthy · Ke Xiao · Kayvon Fatahalian · James Priest · Christopher Ré -
2019 Poster: Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices »
Vincent Chen · Sen Wu · Alexander Ratner · Jen Weng · Christopher Ré -
2019 Poster: Hyperbolic Graph Convolutional Neural Networks »
Ines Chami · Zhitao Ying · Christopher Ré · Jure Leskovec -
2018 : Poster Session 1 »
Kyle H Ambert · Brandon Araki · Xiya Cao · Sungjoon Choi · Hao(Jackson) Cui · Jonas Degrave · Yaqi Duan · Mattie Fellows · Carlos Florensa · Karan Goel · Aditya Gopalan · Ming-Xu Huang · Jonathan Hunt · Cyril Ibrahim · Brian Ichter · Maximilian Igl · Zheng Tracy Ke · Igor Kiselev · Anuj Mahajan · Arash Mehrjou · Karl Pertsch · Alexandre Piche · Nicholas Rhinehart · Thomas Ringstrom · Reazul Hasan Russel · Oleh Rybkin · Ion Stoica · Sharad Vikram · Angelina Wang · Ting-Han Wei · Abigail H Wen · I-Chen Wu · Zhengwei Wu · Linhai Xie · Dinghan Shen -
2018 : Spotlights 1 »
Ming-Xu Huang · Hao(Jackson) Cui · Arash Mehrjou · Yaqi Duan · Sharad Vikram · Angelina Wang · Karan Goel · Jonathan Hunt · Zhengwei Wu · Dinghan Shen · Mattie Fellows -
2018 Workshop: Relational Representation Learning »
Aditya Grover · Paroma Varma · Frederic Sala · Christopher Ré · Jennifer Neville · Stefano Ermon · Steven Holtzen -
2018 Demonstration: Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment »
Emma Brunskill · Tong Mu · Karan Goel · Jonathan Bragg -
2018 Poster: Learning Compressed Transforms with Low Displacement Rank »
Anna Thomas · Albert Gu · Tri Dao · Atri Rudra · Christopher Ré -
2017 : Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Spotlights & Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 Workshop: Learning with Limited Labeled Data: Weak Supervision and Beyond »
Isabelle Augenstein · Stephen Bach · Eugene Belilovsky · Matthew Blaschko · Christoph Lampert · Edouard Oyallon · Emmanouil Antonios Platanios · Alexander Ratner · Christopher Ré -
2017 Workshop: ML Systems Workshop @ NIPS 2017 »
Aparna Lakshmiratan · Sarah Bird · Siddhartha Sen · Christopher Ré · Li Erran Li · Joseph Gonzalez · Daniel Crankshaw -
2017 Demonstration: Babble Labble: Learning from Natural Language Explanations »
Braden Hancock · Paroma Varma · Percy Liang · Christopher Ré · Stephanie Wang -
2017 Poster: Learning to Compose Domain-Specific Transformations for Data Augmentation »
Alexander Ratner · Henry Ehrenberg · Zeshan Hussain · Jared Dunnmon · Christopher Ré -
2017 Poster: Gaussian Quadrature for Kernel Features »
Tri Dao · Christopher M De Sa · Christopher Ré -
2017 Spotlight: Gaussian Quadrature for Kernel Features »
Tri Dao · Christopher M De Sa · Christopher Ré -
2017 Poster: Inferring Generative Model Structure with Static Analysis »
Paroma Varma · Bryan He · Payal Bajaj · Nishith Khandwala · Imon Banerjee · Daniel Rubin · Christopher Ré -
2016 : Invited Talk: You've been using asynchrony wrong your whole life! (Chris Re, Stanford) »
Christopher Ré -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling »
Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael Mahoney -
2015 Poster: Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care »
Sorathan Chaturapruek · John Duchi · Christopher Ré -
2015 Poster: Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré -
2015 Spotlight: Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré · Christopher Ré -
2015 Poster: Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms »
Christopher M De Sa · Ce Zhang · Kunle Olukotun · Christopher Ré · Christopher Ré -
2014 Workshop: 4th Workshop on Automated Knowledge Base Construction (AKBC) »
Sameer Singh · Fabian M Suchanek · Sebastian Riedel · Partha Pratim Talukdar · Kevin Murphy · Christopher Ré · William Cohen · Tom Mitchell · Andrew McCallum · Jason E Weston · Ramanathan Guha · Boyan Onyshkevych · Hoifung Poon · Oren Etzioni · Ari Kobren · Arvind Neelakantan · Peter Clark -
2014 Poster: Parallel Feature Selection Inspired by Group Testing »
Yingbo Zhou · Utkarsh Porwal · Ce Zhang · Hung Q Ngo · XuanLong Nguyen · Christopher Ré · Venu Govindaraju -
2013 Workshop: Big Learning : Advances in Algorithms and Data Management »
Xinghao Pan · Haijie Gu · Joseph Gonzalez · Sameer Singh · Yucheng Low · Joseph Hellerstein · Derek G Murray · Raghu Ramakrishnan · Michael Jordan · Christopher Ré