Recent research shows that, even when models are pruned by state-of-the-art network compression methods, deep learning training still suffers from the demand for massive amounts of data. In particular, training Graph Neural Networks (GNNs) on non-Euclidean graph data often incurs higher time costs than training on data in regular Euclidean space (e.g., images or text), due to graphs' irregular structure and density properties. Another property inherent to graph data is class imbalance, which cannot be alleviated even by massive amounts of graph data and which hinders GNNs' generalization. To tackle these challenges, (i) theoretically, we introduce a hypothesis about to what extent a subset of the training data can approximate the full dataset's learning effectiveness; this effectiveness is further guaranteed and proved via the distance between the gradients computed on the subset and those computed on the full set; (ii) empirically, we discover that during the learning process of a GNN, some training samples are more informative than others in providing gradients to update model parameters. Moreover, the informative subset is not fixed throughout training: samples that are informative in the current training epoch may not be so in the next one. We refer to this observation as dynamic data sparsity. We also notice that sparse subnets pruned from a well-trained GNN sometimes forget the information provided by the informative subset, as reflected in their poor performance on that subset. Based on these findings, we develop a unified data-model dynamic sparsity framework named Graph Decantation (GraphDec) to address the challenges of training on massive, class-imbalanced graph data. The key idea of GraphDec is to identify the informative subset dynamically during training by adopting sparse graph contrastive learning. Extensive experiments on multiple benchmark datasets demonstrate that GraphDec outperforms state-of-the-art baselines for class-imbalanced graph classification and class-imbalanced node classification tasks, in terms of both classification accuracy and data usage efficiency.
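The abstract's core mechanism, re-selecting an informative subset each epoch based on gradient signal, can be illustrated with a short sketch. The following is a minimal, assumption-level PyTorch rendering, not the paper's actual GraphDec implementation (which scores samples via sparse graph contrastive learning); the names gradient_score, train_with_decantation, and keep_ratio are hypothetical, introduced only for illustration.

```python
import torch
import torch.nn as nn

def gradient_score(model, loss_fn, x, y):
    """Score one sample by the L2 norm of the gradient its loss induces."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad]
    )
    return torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()

def train_with_decantation(model, loss_fn, optimizer, dataset,
                           epochs=10, keep_ratio=0.5):
    for _ in range(epochs):
        # Re-score the full set each epoch: informativeness is dynamic,
        # so the informative subset of one epoch may differ in the next.
        scores = [gradient_score(model, loss_fn, x, y) for x, y in dataset]
        k = max(1, int(keep_ratio * len(dataset)))
        keep = sorted(range(len(dataset)), key=lambda i: -scores[i])[:k]
        # Update the model only on the currently informative subset.
        for i in keep:
            x, y = dataset[i]
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

# Toy usage on random data (purely illustrative):
model = nn.Linear(8, 2)
data = [(torch.randn(1, 8), torch.randint(0, 2, (1,))) for _ in range(32)]
opt = torch.optim.SGD(model.parameters(), lr=0.1)
train_with_decantation(model, nn.CrossEntropyLoss(), opt, data)
```

Re-scoring the full set every epoch reflects the dynamic data sparsity observation, while keep_ratio controls the accuracy/data-usage trade-off the paper evaluates.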
Author Information
Chunhui Zhang (Brandeis University)
Chao Huang (University of Hong Kong)
Yijun Tian (University of Notre Dame)
Qianlong Wen (University of Notre Dame)
Zhongyu Ouyang (University of Notre Dame)
Youhuan Li (Hunan University)
Yanfang Ye (University of Notre Dame)
Chuxu Zhang (Brandeis University)
More from the Same Authors
- 2022 Poster: Label-invariant Augmentation for Semi-Supervised Graph Classification
  Han Yue · Chunhui Zhang · Chuxu Zhang · Hongfu Liu
- 2022: Graph Contrastive Learning with Cross-view Reconstruction
  Qianlong Wen · Zhongyu Ouyang · Chunhui Zhang · Yiyue Qian · Yanfang Ye · Chuxu Zhang
- 2022: Contrastive Graph Few-Shot Learning
  Chunhui Zhang · Hongfu Liu · Jundong Li · Yanfang Ye · Chuxu Zhang
- 2022: NOSMOG: Learning Noise-robust and Structure-aware MLPs on Graphs
  Yijun Tian · Chuxu Zhang · Zhichun Guo · Xiangliang Zhang · Nitesh Chawla
- 2023 Poster: GraphPatcher: Mitigating Degree Bias for Graph Neural Networks via Test-time Node Patching
  Mingxuan Ju · Tong Zhao · Wenhao Yu · Neil Shah · Yanfang Ye
- 2023 Poster: Generative Pre-Training of Spatio-Temporal Graph Neural Networks
  Zhonghang Li · Lianghao Xia · Yong Xu · Chao Huang
- 2023 Poster: LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting
  Xu Liu · Yutong Xia · Yuxuan Liang · Junfeng Hu · Yiwei Wang · Lei Bai · Chao Huang · Zhenguang Liu · Bryan Hooi · Roger Zimmermann
- 2022 Poster: Multi-objective Deep Data Generation with Correlated Property Control
  Shiyu Wang · Xiaojie Guo · Xuanyang Lin · Bo Pan · Yuanqi Du · Yinkai Wang · Yanfang Ye · Ashley Petersen · Austin Leitgeb · Saleh Alkhalifa · Kevin Minbiole · William M. Wuest · Amarda Shehu · Liang Zhao
- 2022 Poster: Co-Modality Graph Contrastive Learning for Imbalanced Node Classification
  Yiyue Qian · Chunhui Zhang · Yiming Zhang · Qianlong Wen · Yanfang Ye · Chuxu Zhang