GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron Wolfe · Jingkang Yang · Fangshuo Liao · Arindam Chowdhury · Chen Dun · Artun Bayer · Santiago Segarra · Anastasios Kyrillidis
Event URL: https://openreview.net/forum?id=-XsiFMpdSJz »
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on large-scale graphs, we pioneer efficient training of large-scale GCN models with the proposal of a novel, distributed training framework, called \texttt{GIST}. \texttt{GIST} disjointly partitions the parameters of a GCN model into several smaller sub-GCNs that are trained independently and in parallel. Compatible with all GCN architectures and existing sampling techniques, \texttt{GIST} $i)$ improves model performance, $ii)$ scales to training on arbitrarily large graphs, $iii)$ decreases wall-clock training time, and $iv)$ enables the training of markedly overparameterized GCN models. Remarkably, with \texttt{GIST}, we train an astonishingly wide $32,\!768$-dimensional GraphSAGE model, which exceeds the capacity of a single GPU by a factor of $8\times$, to SOTA performance on the Amazon2M dataset.
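As a loose illustration of the disjoint-partition idea described in the abstract (not the paper's actual implementation), the sketch below splits the hidden feature dimension of a two-layer GCN into disjoint groups, one per sub-GCN. Because the nonlinearity is element-wise and the second layer is linear, the sub-GCN outputs sum exactly to the full model's output, which is why such a partition preserves the model's structure. All names here (`partition_indices`, `sub_gcn_forward`) are hypothetical.

```python
import numpy as np

def partition_indices(hidden_dim, num_subnets, seed=0):
    # Randomly split the hidden feature dimension into disjoint groups,
    # one per sub-GCN.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(hidden_dim)
    return np.array_split(perm, num_subnets)

def sub_gcn_forward(A_hat, X, W1, W2, idx):
    # Two-layer GCN forward pass restricted to one hidden-feature partition:
    # only the columns of W1 and rows of W2 indexed by `idx` are used.
    H = np.maximum(A_hat @ X @ W1[:, idx], 0.0)  # ReLU
    return A_hat @ H @ W2[idx, :]

# Toy data: a 4-node graph with a normalized adjacency A_hat.
n, d_in, d_hidden, d_out = 4, 3, 8, 2
rng = np.random.default_rng(1)
A_hat = np.eye(n) + 0.1 * rng.random((n, n))
X = rng.random((n, d_in))
W1 = rng.random((d_in, d_hidden))
W2 = rng.random((d_hidden, d_out))

groups = partition_indices(d_hidden, num_subnets=2)
# Each sub-GCN could be trained independently on its own device; here we
# only verify that the partitioned forward passes recompose the full model.
full = A_hat @ np.maximum(A_hat @ X @ W1, 0.0) @ W2
combined = sum(sub_gcn_forward(A_hat, X, W1, W2, idx) for idx in groups)
print(np.allclose(full, combined))  # True: disjoint partition is lossless here
```

In \texttt{GIST} itself the sub-GCNs are trained independently before parameters are aggregated, so this exact recomposition holds only at the moment of partitioning; the sketch merely shows why splitting along the hidden dimension is structurally consistent.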
Author Information
Cameron Wolfe (Rice University)
Jingkang Yang (Nanyang Technological University)
Fangshuo Liao (Rice University)
Arindam Chowdhury (Rice University)
Chen Dun (Rice University)
Artun Bayer (Rice University)
Santiago Segarra (Rice University)
Anastasios Kyrillidis (Rice University)
More from the Same Authors
-
2021 : Acceleration and Stability of the Stochastic Proximal Point Algorithm »
Junhyung Lyle Kim · Panos Toulis · Anastasios Kyrillidis -
2022 : LOFT: Finding Lottery Tickets through Filter-wise Training »
Qihan Wang · Chen Dun · Fangshuo Liao · Christopher Jermaine · Anastasios Kyrillidis -
2022 : Strong Lottery Ticket Hypothesis with $\epsilon$–perturbation »
Fangshuo Liao · Zheyang Xiong · Anastasios Kyrillidis -
2022 : Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout »
Chen Dun · Mirian Hipolito Garcia · Dimitrios Dimitriadis · Christopher Jermaine · Anastasios Kyrillidis -
2022 : Sparse Mixture-of-Experts are Domain Generalizable Learners »
Bo Li · Yifei Shen · Jingkang Yang · Yezhen Wang · Jiawei Ren · Tong Che · Jun Zhang · Ziwei Liu -
2023 : FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts »
Chen Dun · Mirian Hipolito Garcia · Guoqing Zheng · Ahmed Awadallah · Robert Sim · Anastasios Kyrillidis · Dimitrios Dimitriadis -
2023 : SAD: Segment Any RGBD »
Jun CEN · Yizheng Wu · Kewei Wang · Xingyi Li · Jingkang Yang · Yixuan Pei · Lingdong Kong · Ziwei Liu · Qifeng Chen -
2023 : Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation »
Chen Dun · Mirian Hipolito Garcia · Guoqing Zheng · Ahmed Awadallah · Anastasios Kyrillidis · Robert Sim -
2023 : CrysFormer: Protein Crystallography Prediction via 3d Patterson Maps and Partial Structure Attention »
Chen Dun · Tom Pan · Shikai Jin · Ria Stevens · Mitchell D. Miller · George Phillips · Anastasios Kyrillidis -
2023 : OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection »
Jingyang Zhang · Jingkang Yang · Pengyun Wang · Haoqi Wang · Yueqian Lin · Haoran Zhang · Yiyou Sun · Xuefeng Du · Kaiyang Zhou · Wayne Zhang · Yixuan Li · Ziwei Liu · Yiran Chen · Hai Li -
2023 Poster: Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time »
Zichang Liu · Aditya Desai · Fangshuo Liao · Weitao Wang · Victor Xie · Zhaozhuo Xu · Anastasios Kyrillidis · Anshumali Shrivastava -
2023 Poster: 4D Panoptic Scene Graph Generation »
Jingkang Yang · Jun CEN · WENXUAN PENG · Shuai Liu · Fangzhou Hong · Xiangtai Li · Kaiyang Zhou · Qifeng Chen · Ziwei Liu -
2023 Poster: Large Language Models are Visual Reasoning Coordinators »
Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu -
2022 : Poster Session 2 »
Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi -
2022 : Contributed Talks 3 »
Cristóbal Guzmán · Fangshuo Liao · Vishwak Srinivasan · Zhiyuan Li -
2022 Poster: Graph Reordering for Cache-Efficient Near Neighbor Search »
Benjamin Coleman · Santiago Segarra · Alexander Smola · Anshumali Shrivastava -
2022 Poster: OpenOOD: Benchmarking Generalized Out-of-Distribution Detection »
Jingkang Yang · Pengyun Wang · Dejian Zou · Zitang Zhou · Kunyuan Ding · WENXUAN PENG · Haoqi Wang · Guangyao Chen · Bo Li · Yiyou Sun · Xuefeng Du · Kaiyang Zhou · Wayne Zhang · Dan Hendrycks · Yixuan Li · Ziwei Liu -
2019 : Final remarks »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 Workshop: Beyond first order methods in machine learning systems »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 : Opening Remarks »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 Poster: Learning Sparse Distributions using Iterative Hard Thresholding »
Jacky Zhang · Rajiv Khanna · Anastasios Kyrillidis · Sanmi Koyejo