We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of low-magnitude parameters via a regularized optimal transportation problem and (2) dual averaging-based parameter updates with hard sparsification in the forward pass. This scheme realizes an exploration-exploitation tradeoff: early in training, the learner is able to explore various sparsity patterns, and as the soft top-k approximation is gradually sharpened over the course of training, the balance shifts towards parameter optimization with respect to a fixed sparsity mask. Spartan is sufficiently flexible to accommodate a variety of sparsity allocation policies, including unstructured and block-structured sparsity, global and per-layer sparsity budgets, and general cost-sensitive sparsity allocation mediated by linear models of per-parameter costs. On ImageNet-1K classification, we demonstrate that training with Spartan yields 95% sparse ResNet-50 models and 90% block sparse ViT-B/16 models while incurring absolute top-1 accuracy losses of less than 1% compared to fully dense training.
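To make the two ingredients concrete, the following is a minimal sketch, not the authors' released implementation: a soft top-k magnitude mask obtained by bisecting over a single threshold with a sharpness parameter `beta`, used here as a simple stand-in for the regularized optimal transport formulation, together with a hard top-k operator that sparsifies in the forward pass and passes gradients straight through. The function names, the `beta` temperature, and the bisection routine are illustrative assumptions rather than the paper's exact algorithm.

```python
# Illustrative sketch only; names, the beta parameter, and the bisection-based
# soft top-k are assumptions, not the paper's exact algorithm.
import torch


def soft_topk_mask(scores: torch.Tensor, k: int, beta: float = 50.0,
                   iters: int = 50) -> torch.Tensor:
    """Soft mask with entries near 1 for (roughly) the k largest scores.

    Bisects over a threshold t so that sigmoid(beta * (scores - t)) sums to
    approximately k. As beta grows, the mask approaches a hard top-k indicator.
    """
    lo, hi = scores.min(), scores.max()
    for _ in range(iters):
        t = (lo + hi) / 2
        if torch.sigmoid(beta * (scores - t)).sum() > k:
            lo = t  # mask too dense: raise the threshold
        else:
            hi = t  # mask too sparse: lower the threshold
    return torch.sigmoid(beta * (scores - (lo + hi) / 2))


class HardTopK(torch.autograd.Function):
    """Hard top-k sparsification in the forward pass, straight-through
    (identity) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, w: torch.Tensor, k: int) -> torch.Tensor:
        mask = torch.zeros_like(w)
        idx = torch.topk(w.abs().flatten(), k).indices
        mask.view(-1)[idx] = 1.0
        return w * mask

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # gradients flow through to the dense weights


if __name__ == "__main__":
    w = torch.randn(256, 256, requires_grad=True)
    k = int(0.05 * w.numel())  # e.g. 95% sparsity: keep 5% of the entries
    soft = soft_topk_mask(w.detach().abs().flatten(), k).view_as(w)
    out = HardTopK.apply(w * soft, k)  # hard sparsification in the forward pass
    out.sum().backward()               # dense w still receives gradients
```

In training, one would gradually increase `beta` so that the soft mask hardens over time, mirroring the exploration-to-exploitation schedule described in the abstract.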
Author Information
Kai Sheng Tai (Stanford University)
Taipeng Tian (Facebook)
Ser Nam Lim (Facebook AI)
More from the Same Authors
- 2021: Mix-MaxEnt: Improving Accuracy and Uncertainty Estimates of Deterministic Neural Networks
  Francesco Pinto · Harry Yang · Ser Nam Lim · Philip Torr · Puneet Dokania
- 2023 Poster: Riemannian Residual Neural Networks
  Isay Katsman · Eric M Chen · Sidhanth Holalkere · Anna Asch · Aaron Lou · Ser Nam Lim · Christopher De Sa
- 2023 Poster: Test-Time Distribution Normalization for Contrastively Learned Visual-language Models
  Yifei Zhou · Juntao Ren · Fengyu Li · Ramin Zabih · Ser Nam Lim
- 2023 Poster: Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements
  Gaurav Shrivastava · Ser Nam Lim · Abhinav Shrivastava
- 2022 Poster: Using Mixup as a Regularizer Can Surprisingly Improve Accuracy & Out-of-Distribution Robustness
  Francesco Pinto · Harry Yang · Ser Nam Lim · Philip Torr · Puneet Dokania
- 2022 Poster: FedSR: A Simple and Effective Domain Generalization Method for Federated Learning
  A. Tuan Nguyen · Philip Torr · Ser Nam Lim
- 2022 Poster: GAPX: Generalized Autoregressive Paraphrase-Identification X
  Yifei Zhou · Renyu Li · Hayden Housen · Ser Nam Lim
- 2022 Poster: Few-Shot Fast-Adaptive Anomaly Detection
  Ze Wang · Yipin Zhou · Rui Wang · Tsung-Yu Lin · Ashish Shah · Ser Nam Lim
- 2022 Poster: HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
  Yongming Rao · Wenliang Zhao · Yansong Tang · Jie Zhou · Ser Nam Lim · Jiwen Lu
- 2021 Poster: Learning to Ground Multi-Agent Communication with Autoencoders
  Toru Lin · Jacob Huh · Christopher Stauffer · Ser Nam Lim · Phillip Isola
- 2021 Poster: Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
  Derek Lim · Felix Hohne · Xiuyu Li · Sijia Linda Huang · Vaishnavi Gupta · Omkar Bhalerao · Ser Nam Lim
- 2021 Poster: NeRV: Neural Representations for Videos
  Hao Chen · Bo He · Hanyu Wang · Yixuan Ren · Ser Nam Lim · Abhinav Shrivastava
- 2021 Poster: Equivariant Manifold Flows
  Isay Katsman · Aaron Lou · Derek Lim · Qingxuan Jiang · Ser Nam Lim · Christopher De Sa
- 2021 Poster: A Continuous Mapping For Augmentation Design
  Keyu Tian · Chen Lin · Ser Nam Lim · Wanli Ouyang · Puneet Dokania · Philip Torr
- 2020 Poster: Better Set Representations For Relational Reasoning
  Qian Huang · Horace He · Abhay Singh · Yan Zhang · Ser Nam Lim · Austin Benson
- 2020 Poster: Neural Manifold Ordinary Differential Equations
  Aaron Lou · Derek Lim · Isay Katsman · Leo Huang · Qingxuan Jiang · Ser Nam Lim · Christopher De Sa