NeurIPS Poster: Efficient Learning of Linear Graph Neural Networks via Node Subsampling

Seiyun Shin · Ilan Shomorony · Han Zhao

Great Hall & Hall B1+B2 (level 1) #629

Abstract: Graph Neural Networks (GNNs) are a powerful class of machine learning models with applications in recommender systems, drug discovery, social network analysis, and computer vision. One challenge with their implementation is that GNNs often take large-scale graphs as inputs, which imposes significant computational and storage costs in the training and testing phases. In particular, the message passing operations of a GNN require multiplication of the graph adjacency matrix $A \in \mathbb{R}^{n \times n}$ and the data matrix $X \in \mathbb{R}^{n \times d}$, and the $O(n^2 d)$ time complexity can be prohibitive for large $n$. Thus, a natural question is whether it is possible to perform the GNN operations in (quasi-)linear time by avoiding the full computation of $AX$. To study this question, we consider the setting of a regression task on a two-layer Linear Graph Convolutional Network (GCN). We develop an efficient training algorithm based on (1) performing node subsampling, (2) estimating the leverage scores of $AX$ based on the subsampled graph, and (3) performing leverage score sampling on $AX$. We show that our proposed scheme learns the regression model observing only $O(nd\epsilon^{-2}\log n)$ entries of $A$ in time $O(nd^2\epsilon^{-2}\log n)$, with the guarantee that the learned weights deviate by at most $\epsilon$ under the $\ell_2$ norm from the model learned using the entire adjacency matrix $A$. We present empirical results for regression problems on real-world graphs and show that our algorithm significantly outperforms other baseline sampling strategies that exploit the same number of observations.
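As a back-of-the-envelope reading of the two bounds above (not the authors' proof), assume the final leverage-score step keeps $O(d\,\epsilon^{-2}\log n)$ rows of $AX$ and computes each sampled row $A_{i,:}X$ exactly. Here $A_{i,:}$ denotes the $i$-th row of $A$, a notation introduced only for this count. Each such row has $n$ entries and costs $O(nd)$ time to multiply by $X$; the cost of the leverage-score estimation step is not counted here.

$$
\underbrace{O(d\,\epsilon^{-2}\log n)}_{\text{sampled rows of } AX} \times \underbrace{n}_{\text{entries of } A_{i,:}} = O(nd\,\epsilon^{-2}\log n),
\qquad
\underbrace{O(d\,\epsilon^{-2}\log n)}_{\text{sampled rows of } AX} \times \underbrace{O(nd)}_{\text{cost of } A_{i,:}X} = O(nd^{2}\epsilon^{-2}\log n).
$$

Both quantities fall below the $n^2$ entries and $O(n^2 d)$ time of the full product $AX$ roughly when $d\,\epsilon^{-2}\log n \ll n$.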

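The three steps listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm. It rests on several assumptions: the regression objective is taken to be the least-squares problem $\min_w \lVert AXw - y\rVert_2$, a simplification of the two-layer linear GCN. Nodes are subsampled with probability proportional to the squared row norms of $X$, a standard approximate-matrix-multiplication choice. The estimated leverage scores are mixed with the uniform distribution so that no row gets zero probability. The helper names `leverage_scores`, `subsampled_AX`, and `sampled_regression` are hypothetical, and the sample sizes are illustrative rather than derived from the bounds.

```python
import numpy as np


def leverage_scores(M: np.ndarray) -> np.ndarray:
    """Row leverage scores of a tall matrix M: squared row norms of an
    orthonormal basis Q for the column space of M (computed via reduced QR)."""
    Q, _ = np.linalg.qr(M)  # Q has shape (n, d) with orthonormal columns
    return np.sum(Q * Q, axis=1)  # scores sum to d when M has full column rank


def subsampled_AX(A, X, num_nodes, rng):
    """Step (1), hypothetical variant: estimate A @ X from `num_nodes` sampled nodes.

    Assumption: node j is drawn i.i.d. with probability proportional to
    ||X_j||^2 and reweighted so the estimate is unbiased; only the sampled
    columns of A are read."""
    sq_norms = np.sum(X * X, axis=1)
    p = sq_norms / sq_norms.sum()
    idx = rng.choice(X.shape[0], size=num_nodes, replace=True, p=p)
    weights = 1.0 / (num_nodes * p[idx])
    return A[:, idx] @ (X[idx] * weights[:, None])


def sampled_regression(A, X, y, num_nodes, num_rows, rng):
    """Steps (2) and (3): leverage-score row sampling on AX with estimated scores."""
    n = A.shape[0]
    # Step (2): leverage scores of the cheap estimate, mixed with uniform
    # (an assumption) so rows with a zero estimated score can still be picked.
    lev = leverage_scores(subsampled_AX(A, X, num_nodes, rng))
    q = 0.5 * lev / lev.sum() + 0.5 / n
    # Step (3): sample rows i by q and compute only those rows A[i, :] @ X exactly.
    rows = rng.choice(n, size=num_rows, replace=True, p=q)
    scale = 1.0 / np.sqrt(num_rows * q[rows])  # importance weights for the sketch
    AX_rows = A[rows] @ X
    w, *_ = np.linalg.lstsq(AX_rows * scale[:, None], y[rows] * scale, rcond=None)
    return w


# Synthetic example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d = 2000, 5
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.maximum(A, A.T)  # symmetric 0/1 adjacency matrix of a random graph
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = A @ X @ w_true + rng.standard_normal(n)

w_full, *_ = np.linalg.lstsq(A @ X, y, rcond=None)  # uses every entry of A
w_hat = sampled_regression(A, X, y, num_nodes=200, num_rows=400, rng=rng)
rel_err = np.linalg.norm(w_hat - w_full) / np.linalg.norm(w_full)
print(f"relative l2 deviation from the full solution: {rel_err:.4f}")
```

Both sampling stages read whole columns or rows of $A$. The node-subsampling stage reads `n * num_nodes` entries and the exact-row stage reads `num_rows * n`, compared with all $n^2$ entries for the full product. The printed deviation depends on the random seed and on the synthetic data.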