Timezone: »
Poster
SMYRF - Efficient Attention using Asymmetric Clustering
Giannis Daras · Nikita Kitaev · Augustus Odena · Alexandros Dimakis
We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where N is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. On the contrary, prior fast attention methods impose constraints (e.g. tight queries and keys) and require re-training from scratch. We apply our method to pre-trained state-of-the-art Natural Language Processing and Computer Vision models and we report significant memory and speed benefits. Notably, SMYRF-BERT outperforms (slightly) BERT on GLUE, while using 50% less memory. We also show that SMYRF can be used interchangeably with dense attention before and after training. Finally, we use SMYRF to train GANs with attention in high resolutions. Using a single TPU, we train BigGAN on Celeba-HQ, with attention at resolution 128x128 and 256x256, capable of generating realistic human faces.
Author Information
Giannis Daras (The University of Texas at Austin)
Nikita Kitaev (University of California, Berkeley)
Augustus Odena (Google Brain)
Alex Dimakis (University of Texas, Austin)
More from the Same Authors
-
2020 Workshop: Workshop on Computer Assisted Programming (CAP) »
Augustus Odena · Charles Sutton · Nadia Polikarpova · Josh Tenenbaum · Armando Solar-Lezama · Isil Dillig -
2020 Poster: Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples »
Samarth Sinha · Zhengli Zhao · Anirudh Goyal ALIAS PARTH GOYAL · Colin A Raffel · Augustus Odena -
2020 Poster: Applications of Common Entropy for Causal Inference »
Murat Kocaoglu · Sanjay Shakkottai · Alexandros Dimakis · Constantine Caramanis · Sriram Vishwanath -
2020 Poster: Exactly Computing the Local Lipschitz Constant of ReLU Networks »
Matt Jordan · Alexandros Dimakis -
2020 Poster: Robust compressed sensing using generative models »
Ajil Jalal · Liu Liu · Alexandros Dimakis · Constantine Caramanis -
2019 Workshop: Information Theory and Machine Learning »
Shengjia Zhao · Jiaming Song · Yanjun Han · Kristy Choi · Pratyusha Kalluri · Ben Poole · Alexandros Dimakis · Jiantao Jiao · Tsachy Weissman · Stefano Ermon -
2019 Workshop: Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications »
Reinhard Heckel · Paul Hand · Richard Baraniuk · Joan Bruna · Alexandros Dimakis · Deanna Needell -
2019 Poster: Inverting Deep Generative models, One layer at a time »
Qi Lei · Ajil Jalal · Inderjit Dhillon · Alexandros Dimakis -
2019 Poster: Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes »
Matt Jordan · Justin Lewis · Alexandros Dimakis -
2019 Poster: Primal-Dual Block Generalized Frank-Wolfe »
Qi Lei · JIACHENG ZHUO · Constantine Caramanis · Inderjit Dhillon · Alexandros Dimakis -
2019 Poster: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alexandros Dimakis -
2019 Spotlight: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alexandros Dimakis -
2019 Poster: Learning Distributions Generated by One-Layer ReLU Networks »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi -
2018 Poster: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms »
Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow -
2018 Poster: Experimental Design for Cost-Aware Learning of Causal Graphs »
Erik Lindgren · Murat Kocaoglu · Alexandros Dimakis · Sriram Vishwanath -
2018 Spotlight: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms »
Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow -
2017 Workshop: NIPS Highlights (MLTrain), Learn How to code a paper with state of the art frameworks »
Alexandros Dimakis · Nikolaos Vasiloglou · Guy Van den Broeck · Alexander Ihler · Assaf Araki -
2017 Poster: Streaming Weak Submodularity: Interpreting Neural Networks on the Fly »
Ethan Elenberg · Alexandros Dimakis · Moran Feldman · Amin Karbasi -
2017 Oral: Streaming Weak Submodularity: Interpreting Neural Networks on the Fly »
Ethan Elenberg · Alexandros Dimakis · Moran Feldman · Amin Karbasi -
2017 Poster: Model-Powered Conditional Independence Test »
Rajat Sen · Ananda Theertha Suresh · Karthikeyan Shanmugam · Alexandros Dimakis · Sanjay Shakkottai -
2016 Poster: Leveraging Sparsity for Efficient Submodular Data Summarization »
Erik Lindgren · Shanshan Wu · Alexandros Dimakis -
2016 Poster: Single Pass PCA of Matrix Products »
Shanshan Wu · Srinadh Bhojanapalli · Sujay Sanghavi · Alexandros Dimakis -
2015 Poster: Orthogonal NMF through Subspace Exploration »
Megasthenis Asteris · Dimitris Papailiopoulos · Alexandros Dimakis -
2015 Poster: Sparse PCA via Bipartite Matchings »
Megasthenis Asteris · Dimitris Papailiopoulos · Anastasios Kyrillidis · Alexandros Dimakis -
2015 Poster: Learning Causal Graphs with Small Interventions »
Karthikeyan Shanmugam · Murat Kocaoglu · Alexandros Dimakis · Sriram Vishwanath -
2014 Poster: Sparse Polynomial Learning and Graph Sketching »
Murat Kocaoglu · Karthikeyan Shanmugam · Alexandros Dimakis · Adam Klivans -
2014 Poster: On the Information Theoretic Limits of Learning Ising Models »
Rashish Tandon · Karthikeyan Shanmugam · Pradeep Ravikumar · Alexandros Dimakis -
2014 Oral: Sparse Polynomial Learning and Graph Sketching »
Murat Kocaoglu · Karthikeyan Shanmugam · Alexandros Dimakis · Adam Klivans