Timezone: »
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training. Training with large batches can reduce these overheads; however it besets the convergence of the algorithm and the generalization performance.
In this work, we take a first step towards analyzing how the structure (width and depth) of a neural network affects the performance of large-batch training. We present new theoretical results which suggest that--for a fixed number of parameters--wider networks are more amenable to fast large-batch training compared to deeper ones. We provide extensive experiments on residual and fully-connected neural networks which suggest that wider networks can be trained using larger batches without incurring a convergence slow-down, unlike their deeper variants.
Author Information
Lingjiao Chen (University of Wisconsin-Madison)
Hongyi Wang (University of Wisconsin-Madison)
Jinman Zhao (University of Wisconsin-Madison)
Dimitris Papailiopoulos (University of Wisconsin-Madison)
Paraschos Koutris (University of Wisconsin-Madison)
More from the Same Authors
-
2022 : Active Learning is a Strong Baseline for Data Subset Selection »
Dongmin Park · Dimitris Papailiopoulos · Kangwook Lee -
2022 : A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets »
Liu Yang · Jifan Zhang · Joseph Shenouda · Dimitris Papailiopoulos · Kangwook Lee · Robert Nowak -
2022 Poster: LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks »
Tuan Dinh · Yuchen Zeng · Ruisu Zhang · Ziqian Lin · Michael Gira · Shashank Rajput · Jy-yong Sohn · Dimitris Papailiopoulos · Kangwook Lee -
2022 Poster: Rare Gems: Finding Lottery Tickets at Initialization »
Kartik Sreenivasan · Jy-yong Sohn · Liu Yang · Matthew Grinde · Alliot Nagle · Hongyi Wang · Eric Xing · Kangwook Lee · Dimitris Papailiopoulos -
2021 Poster: An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks »
Shashank Rajput · Kartik Sreenivasan · Dimitris Papailiopoulos · Amin Karbasi -
2020 Poster: Bad Global Minima Exist and SGD Can Reach Them »
Shengchao Liu · Dimitris Papailiopoulos · Dimitris Achlioptas -
2020 Poster: FrugalML: How to use ML Prediction APIs more accurately and cheaply »
Lingjiao Chen · Matei Zaharia · James Zou -
2020 Oral: FrugalML: How to use ML Prediction APIs more accurately and cheaply »
Lingjiao Chen · Matei Zaharia · James Zou -
2020 Poster: Attack of the Tails: Yes, You Really Can Backdoor Federated Learning »
Hongyi Wang · Kartik Sreenivasan · Shashank Rajput · Harit Vishwakarma · Saurabh Agarwal · Jy-yong Sohn · Kangwook Lee · Dimitris Papailiopoulos -
2020 Poster: Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient »
Ankit Pensia · Shashank Rajput · Alliot Nagle · Harit Vishwakarma · Dimitris Papailiopoulos -
2020 Spotlight: Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient »
Ankit Pensia · Shashank Rajput · Alliot Nagle · Harit Vishwakarma · Dimitris Papailiopoulos -
2019 Poster: Scalable inference of topic evolution via models for latent geometric structures »
Mikhail Yurochkin · Zhiwei Fan · Aritra Guha · Paraschos Koutris · XuanLong Nguyen -
2019 Poster: DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation »
Shashank Rajput · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos -
2018 Poster: ATOMO: Communication-efficient Learning via Atomic Sparsification »
Hongyi Wang · Scott Sievert · Shengchao Liu · Zachary Charles · Dimitris Papailiopoulos · Stephen Wright -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher RĂ© · Benjamin Recht -
2015 Poster: Orthogonal NMF through Subspace Exploration »
Megasthenis Asteris · Dimitris Papailiopoulos · Alex Dimakis -
2015 Poster: Sparse PCA via Bipartite Matchings »
Megasthenis Asteris · Dimitris Papailiopoulos · Anastasios Kyrillidis · Alex Dimakis -
2015 Poster: Parallel Correlation Clustering on Big Graphs »
Xinghao Pan · Dimitris Papailiopoulos · Samet Oymak · Benjamin Recht · Kannan Ramchandran · Michael Jordan