Deep and wide neural networks successfully fit very complex functions today, but dense models are becoming prohibitively expensive for inference. To mitigate this, one promising research direction is networks that activate only a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping of inputs to subnetworks (e.g., the Mixture of Experts (MoE) paradigm in Switch Transformers). However, there is no theoretical grounding for these sparsely activated models. As our first contribution, we present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures. Then, we show how to construct sparse networks that provably match the approximation power and total size of dense networks on Lipschitz functions. The sparse networks use far fewer inference operations than dense networks, leading to a faster forward pass. The key idea is to apply locality-sensitive hashing to the input vectors and then interpolate the function in subregions of the input space. This offers theoretical insight into why sparse networks work well in practice. Finally, we present empirical findings that support our theory: compared to dense networks, sparse networks give a favorable trade-off between the number of active units and approximation quality.
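To make the hashing-then-interpolation idea concrete, here is a minimal sketch, not the authors' construction. It assumes random-hyperplane (SimHash-style) LSH, inputs on the unit sphere (where the sign patterns of hyperplanes through the origin carve the sphere into cells), and the simplest possible per-region interpolant: the mean training target in each bucket. The class `LSHSparseRegressor`, its `num_bits` parameter, and the piecewise-constant interpolant are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


class LSHSparseRegressor:
    """Hypothetical sketch of a data-dependent sparse model.

    Inputs are routed to a bucket by random-hyperplane LSH (SimHash);
    each bucket owns its own parameters (here a single scalar), and a
    forward pass reads only the parameters of the bucket the input hashes to.
    """

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # num_bits random Gaussian normal vectors define the hash hyperplanes.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.bucket_value = {}  # bucket code -> local function value
        self.default = 0.0      # fallback for buckets unseen during fit

    def bucket(self, x):
        # Each hyperplane contributes one bit: the side of it that x lies on.
        code = 0
        for plane in self.planes:
            dot = sum(p * xi for p, xi in zip(plane, x))
            code = (code << 1) | (1 if dot >= 0.0 else 0)
        return code

    def fit(self, xs, ys):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in zip(xs, ys):
            b = self.bucket(x)
            sums[b] += y
            counts[b] += 1
        # Piecewise-constant approximation: each bucket stores the mean
        # target of the training points that hash to it.
        self.bucket_value = {b: sums[b] / counts[b] for b in sums}
        self.default = sum(ys) / len(ys)
        return self

    def predict(self, x):
        # Cost: num_bits * dim multiply-adds for the hash, plus one lookup,
        # no matter how many buckets (total parameters) the model has.
        return self.bucket_value.get(self.bucket(x), self.default)


# Usage: approximate the 1-Lipschitz function f(x) = x[0] on the unit circle.
n = 1000
xs = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
      for i in range(n)]
ys = [x[0] for x in xs]

model = LSHSparseRegressor(dim=2, num_bits=8, seed=0).fit(xs, ys)
mean_y = sum(ys) / n
mse_sparse = sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / n
mse_const = sum((mean_y - y) ** 2 for y in ys) / n
print(f"buckets used: {len(model.bucket_value)}")
print(f"sparse MSE: {mse_sparse:.4f}  constant MSE: {mse_const:.4f}")
```

In this sketch, adding hash bits multiplies the number of buckets (the model's total size) by up to two per bit, while the per-input cost grows only linearly in `num_bits * dim`. This mirrors, in a very simplified form, the split between total size and active operations that the abstract describes. On the training points, the bucket-mean predictor can never do worse than a single global constant, because the within-bucket variance is at most the total variance.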
Author Information
Cenk Baykal (Google)
Nishanth Dikkala (Google)
Rina Panigrahy (Google)
Cyrus Rashtchian (Google Research)
Senior research scientist at Google. I work on robustness, out-of-distribution (OOD) generalization, and theoretical machine learning.
Xin Wang (Google)
More from the Same Authors
2022 Poster: Sketching based Representations for Robust Image Classification with Provable Guarantees
Nishanth Dikkala · Sankeerth Rao Karingula · Raghu Meka · Jelani Nelson · Rina Panigrahy · Xin Wang
2022 Poster: Weighted Distillation with Unlabeled Examples
Fotis Iliopoulos · Vasilis Kontonis · Cenk Baykal · Gaurav Menghani · Khoa Trinh · Erik Vee
2020 Poster: A Closer Look at Accuracy vs. Robustness
Yao-Yuan Yang · Cyrus Rashtchian · Hongyang Zhang · Russ Salakhutdinov · Kamalika Chaudhuri