

Session

Orals & Spotlights Track 26: Graph/Relational/Theory

Joan Bruna · Cassio de Campos


Thu 10 Dec. 6:00 - 6:15 PST

Oral
Graph Cross Networks with Vertex Infomax Pooling

Maosen Li · Siheng Chen · Ya Zhang · Ivor Tsang

We propose a novel graph cross network (GXN) to achieve comprehensive feature learning from multiple scales of a graph. Based on trainable hierarchical representations of a graph, GXN enables the interchange of intermediate features across scales to promote information flow. Two key ingredients of GXN are a novel vertex infomax pooling (VIPool), which creates multiscale graphs in a trainable manner, and a novel feature-crossing layer, which enables feature interchange across scales. The proposed VIPool selects the most informative subset of vertices based on a neural estimation of the mutual information between vertex features and neighborhood features. The intuition is that a vertex is informative when it maximally reflects its neighboring information. The proposed feature-crossing layer fuses intermediate features between two scales for mutual enhancement, improving information flow and enriching the multiscale features at hidden layers. The cross shape of the feature-crossing layer distinguishes GXN from many other multiscale architectures. Experimental results show that the proposed GXN improves classification accuracy by 2.12% and 1.15% on average for graph classification and vertex classification, respectively. Using the same network, the proposed VIPool consistently outperforms other graph-pooling methods.
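
As a rough illustration of the pooling idea, the sketch below (PyTorch, with made-up module and variable names, not the authors' code) scores each vertex with a small network that compares its own features to mean-aggregated neighborhood features, in the spirit of a neural mutual-information estimate, and keeps the top-scoring vertices to form the coarser scale.

```python
import torch
import torch.nn as nn

class VertexInfomaxPool(nn.Module):
    """Illustrative VIPool-style top-k vertex selection (assumption, not the paper's code)."""
    def __init__(self, dim, ratio=0.5):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.ratio = ratio

    def forward(self, x, adj):
        # x: (N, dim) vertex features, adj: (N, N) dense adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                       # mean-aggregated neighborhood features
        s = self.score(torch.cat([x, neigh], dim=-1)).squeeze(-1)  # informativeness score
        k = max(1, int(self.ratio * x.size(0)))
        idx = torch.topk(s, k).indices              # keep the most informative vertices
        # gate kept features by their (sigmoid) scores so selection stays differentiable
        x_pool = x[idx] * torch.sigmoid(s[idx]).unsqueeze(-1)
        adj_pool = adj[idx][:, idx]                 # induced subgraph at the coarser scale
        return x_pool, adj_pool, idx

# usage: x, adj = torch.randn(10, 16), (torch.rand(10, 10) > 0.7).float()
#        pool = VertexInfomaxPool(16); xp, ap, idx = pool(x, adj)
```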

Thu 10 Dec. 6:15 - 6:30 PST

Oral
Erdős Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Nikolaos Karalias · Andreas Loukas

Combinatorial optimization (CO) problems are notoriously challenging for neural networks, especially in the absence of labeled instances. This work proposes an unsupervised learning framework for CO problems on graphs that can provide integral solutions of certified quality. Inspired by Erdős' probabilistic method, we use a neural network to parametrize a probability distribution over sets. Crucially, we show that when the network is optimized w.r.t. a suitably chosen loss, the learned distribution contains, with controlled probability, a low-cost integral solution that obeys the constraints of the combinatorial problem. The probabilistic proof of existence is then derandomized to decode the desired solutions. We demonstrate the efficacy of this approach to obtain valid solutions to the maximum clique problem and to perform local graph clustering. Our method achieves competitive results on both real datasets and synthetic hard instances.
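
The derandomization step can be illustrated with a minimal sketch of the method of conditional expectations: starting from the probabilities output by the network, each vertex is fixed to 0 or 1, whichever does not increase the loss. The loss below is a toy placeholder, not the paper's max-clique or clustering objective, and all names are illustrative.

```python
import torch

def expected_loss(p, adj, beta=1.0):
    # toy surrogate, linear in each p_i: reward expected intra-set edges, penalise expected size
    inside = (p.unsqueeze(0) * p.unsqueeze(1) * adj).sum() / 2
    return -inside + beta * p.sum()

def derandomize(p, adj, loss_fn=expected_loss):
    """Method of conditional expectations: fix one vertex at a time without increasing the loss."""
    p = p.clone()
    for i in torch.argsort(p, descending=True):
        best_val, best_loss = None, None
        for v in (1.0, 0.0):                       # try fixing vertex i to 1, then to 0
            q = p.clone(); q[i] = v
            l = loss_fn(q, adj)
            if best_loss is None or l < best_loss:
                best_val, best_loss = v, l
        p[i] = best_val                            # keep the non-increasing choice
    return p                                        # now a 0/1 indicator vector

# usage: adj = (torch.rand(8, 8) > 0.5).float(); adj = torch.triu(adj, 1); adj = adj + adj.T
#        probs = torch.rand(8); selection = derandomize(probs, adj)
```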

Thu 10 Dec. 6:30 - 6:45 PST

Oral
Graph Random Neural Networks for Semi-Supervised Learning on Graphs

Wenzheng Feng · Jie Zhang · Yuxiao Dong · Yu Han · Huanbo Luan · Qian Xu · Qiang Yang · Evgeny Kharlamov · Jie Tang

We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored. However, most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak generalization when labeled nodes are scarce. In this paper, we propose a simple yet effective framework, Graph Random Neural Networks (GRAND), to address these issues. In GRAND, we first design a random propagation strategy to perform graph data augmentation. Then we leverage consistency regularization to optimize the prediction consistency of unlabeled nodes across different data augmentations. Extensive experiments on graph benchmark datasets suggest that GRAND significantly outperforms state-of-the-art GNN baselines on semi-supervised node classification. Finally, we show that GRAND mitigates the issues of over-smoothing and non-robustness, exhibiting better generalization behavior than existing GNNs. The source code of GRAND is publicly available at https://github.com/Grand20/grand.
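
A minimal sketch of the two ingredients, assuming a simple dense-adjacency setting and made-up function names (this is not the released code): random propagation drops whole node feature vectors and averages powers of the normalized adjacency, and a consistency loss pulls the predictions of several such augmentations toward their sharpened average.

```python
import torch
import torch.nn.functional as F

def random_propagate(x, adj_norm, drop_rate=0.5, order=3):
    mask = (torch.rand(x.size(0), 1) > drop_rate).float()
    h = x * mask / (1.0 - drop_rate)           # DropNode: zero entire feature rows, rescale
    out, cur = h, h
    for _ in range(order):                     # mixed-order propagation
        cur = adj_norm @ cur
        out = out + cur
    return out / (order + 1)

def consistency_loss(prob_list, temperature=0.5):
    avg = torch.stack(prob_list).mean(0)
    sharp = avg ** (1.0 / temperature)
    sharp = (sharp / sharp.sum(dim=-1, keepdim=True)).detach()   # sharpened target
    return sum(((p - sharp) ** 2).sum(dim=-1).mean() for p in prob_list) / len(prob_list)

# usage (mlp is any classifier head, e.g. nn.Linear(feat_dim, n_classes)):
# probs = [F.softmax(mlp(random_propagate(x, adj_norm)), dim=-1) for _ in range(4)]
# loss = F.nll_loss(torch.log(probs[0][train_idx] + 1e-9), y[train_idx]) + consistency_loss(probs)
```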

Thu 10 Dec. 6:45 - 7:00 PST

Break

Thu 10 Dec. 7:00 - 7:10 PST

Spotlight
Learning Graph Structure With A Finite-State Automaton Layer

Daniel D. Johnson · Hugo Larochelle · Danny Tarlow

Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that using the GFSA layer leads to better performance than using hand-engineered semantic edges or other baseline methods for adding learned edge types.
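
A heavily simplified sketch of the underlying idea follows: a walker moves over the base graph according to learned probabilities over edge types, and the probability of halting at node j when started at node i becomes the weight of a new derived edge. For brevity this sketch uses a single automaton state (i.e., a halting random walk) and a fixed unrolling, whereas the actual GFSA layer learns multi-state automaton policies and solves the absorbing computation implicitly; all names are assumptions.

```python
import torch

def derived_edges(adj_types, move_logits, halt_logit, n_steps=8):
    # adj_types: (T, N, N) row-normalised adjacency, one slice per base edge type
    # move_logits: (T,) preference for following each edge type; halt_logit: scalar
    T, N, _ = adj_types.shape
    probs = torch.softmax(torch.cat([move_logits, halt_logit.view(1)]), dim=0)
    move_p, halt_p = probs[:T], probs[T]
    occ = torch.eye(N)                          # occ[i, j] = prob. the walk from i is now at j
    new_adj = torch.zeros(N, N)
    for _ in range(n_steps):
        new_adj = new_adj + halt_p * occ        # mass that halts here forms the derived edge
        stepped = torch.einsum('ik,tkj->tij', occ, adj_types)
        occ = torch.einsum('t,tij->ij', move_p, stepped)
    return new_adj                              # weighted adjacency for the learned edge type

# usage: adj_types = stack of row-normalised (N, N) matrices, one per edge type
#        move_logits = torch.zeros(adj_types.size(0), requires_grad=True)
#        halt_logit  = torch.zeros((), requires_grad=True)
```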

Thu 10 Dec. 7:10 - 7:20 PST

Spotlight
Pointer Graph Networks

Petar Veličković · Lars Buesing · Matthew Overlan · Razvan Pascanu · Oriol Vinyals · Charles Blundell

Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by the insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In the absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model generalisation ability. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets.
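
A minimal sketch of the pointer mechanism (layer and variable names are assumptions, not the paper's code): each node attends over all nodes, its argmax attention target becomes a pointer edge, and messages are passed over the union of the given edges and the symmetrised pointers. In the paper the pointers are directly supervised by data-structure operations; here they are simply inferred from features.

```python
import torch
import torch.nn as nn

class PointerLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.msg = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h: (N, dim) node states, adj: (N, N) given edges
        att = self.q(h) @ self.k(h).T / h.size(-1) ** 0.5        # pointer attention logits
        ptr = torch.zeros_like(adj)
        ptr[torch.arange(h.size(0)), att.argmax(dim=-1)] = 1.0   # each node points to one node
        a = ((adj + ptr + ptr.T) > 0).float()                    # union of edges and pointers
        agg = a @ h / a.sum(dim=1, keepdim=True).clamp(min=1)    # mean message passing
        return torch.relu(self.msg(torch.cat([h, agg], dim=-1))), att

# usage: h, adj = torch.randn(6, 32), torch.eye(6)
#        layer = PointerLayer(32); h2, att = layer(h, adj)
```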

Thu 10 Dec. 7:20 - 7:30 PST

Q&A
Joint Q&A for Preceding Spotlights

Hongbin Pei · Bingzhe Wei · Kevin Chang · Chunxu Zhang · Bo Yang

Thu 10 Dec. 7:30 - 7:40 PST

Spotlight
Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks

Hongwei Jin · Zhan Shi · Venkata Jaya Shankar Ashish Peruri · Xinhua Zhang

Graph convolution networks (GCNs) have become effective models for graph classification. Similar to many deep networks, GCNs are vulnerable to adversarial attacks on graph topology and node attributes. Recently, a number of effective attack and defense algorithms have been developed, but certificates of robustness against topological perturbations are currently available only for PageRank and label/feature propagation, while none has been designed for GCNs. We propose the first algorithm for certifying the robustness of GCNs to topological attacks in the application of graph classification. Our method is based on Lagrange dualization and convex envelope, which result in tight approximation bounds that are efficiently computable by dynamic programming. When used in conjunction with robust training, it allows an increased number of graphs to be certified as robust.
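
To make the notion of a certificate concrete, the toy sketch below checks it by brute force rather than by the paper's Lagrangian-dual dynamic program: a graph is certified at budget B if no admissible set of edge flips changes the predicted class. This is only feasible for tiny graphs and budgets; the function names and the simple GCN are assumptions.

```python
from itertools import combinations
import torch

def gcn_logits(adj, x, w1, w2):
    a = adj + torch.eye(adj.size(0))
    a = a / a.sum(1, keepdim=True)               # simple row-normalised propagation
    h = torch.relu(a @ x @ w1)
    return (a @ h @ w2).mean(0)                  # mean-pool graph readout

def certified(adj, x, w1, w2, budget=1):
    pred = gcn_logits(adj, x, w1, w2).argmax()
    n = adj.size(0)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for k in range(1, budget + 1):
        for flips in combinations(pairs, k):     # every way of flipping at most `budget` edges
            pert = adj.clone()
            for i, j in flips:
                pert[i, j] = pert[j, i] = 1.0 - pert[i, j]
            if gcn_logits(pert, x, w1, w2).argmax() != pred:
                return False                     # found a successful topological attack
    return True                                  # no attack within budget changes the label

# usage: adj = (torch.rand(5, 5) > 0.5).float(); adj = torch.triu(adj, 1); adj = adj + adj.T
#        x, w1, w2 = torch.randn(5, 4), torch.randn(4, 8), torch.randn(8, 2)
#        print(certified(adj, x, w1, w2, budget=1))
```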

Thu 10 Dec. 7:40 - 7:50 PST

Q&A
Joint Q&A for Preceding Spotlights

Thu 10 Dec. 7:50 - 8:00 PST

Spotlight
Convergence and Stability of Graph Convolutional Networks on Large Random Graphs

Nicolas Keriven · Alberto Bietti · Samuel Vaiter

We study properties of Graph Convolutional Networks (GCNs) by analyzing their behavior on standard models of random graphs, where nodes are represented by random latent variables and edges are drawn according to a similarity kernel. This allows us to overcome the difficulties of dealing with discrete notions such as isomorphisms on very large graphs, by considering instead more natural geometric aspects. We first study the convergence of GCNs to their continuous counterpart as the number of nodes grows. Our results are fully non-asymptotic and are valid for relatively sparse graphs with an average degree that grows logarithmically with the number of nodes. We then analyze the stability of GCNs to small deformations of the random graph model. In contrast to previous studies of stability in discrete settings, our continuous setup allows us to provide more intuitive deformation-based metrics for understanding stability, which have proven useful for explaining the success of convolutional representations on Euclidean domains.
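
The convergence statement can be probed numerically with a small sketch (not the paper's analysis): nodes carry latent variables, edges are sampled from a similarity kernel, and one step of degree-normalized propagation on the sampled graph is compared with its continuous counterpart, the kernel operator evaluated on the same latents. The gap should shrink as the number of nodes grows; the kernel and signal below are arbitrary choices.

```python
import numpy as np

def propagation_gap(n, rng):
    u = rng.uniform(size=n)                                      # latent node variables
    k = 0.9 * np.exp(-((u[:, None] - u[None, :]) ** 2) / 0.1)    # similarity kernel
    adj = (rng.uniform(size=(n, n)) < k).astype(float)
    adj = np.triu(adj, 1); adj = adj + adj.T                     # sample an undirected graph
    f = np.sin(2 * np.pi * u)                                    # a smooth signal on the latents
    discrete = adj @ f / np.maximum(adj.sum(1), 1)               # normalised graph propagation
    continuous = k @ f / k.sum(1)                                # kernel (integral) operator
    return np.abs(discrete - continuous).mean()

rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    print(n, round(propagation_gap(n, rng), 4))                  # gap decreases with n
```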

Thu 10 Dec. 8:00 - 8:10 PST

Spotlight
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Matthew Tancik · Pratul Srinivasan · Ben Mildenhall · Sara Fridovich-Keil · Nithin Raghavan · Utkarsh Singhal · Ravi Ramamoorthi · Jonathan Barron · Ren Ng

We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP has impractically slow convergence to high frequency signal components. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
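
A minimal sketch of the Fourier feature mapping described above: input coordinates v are projected by a random Gaussian matrix B, whose scale sigma tunes the effective kernel bandwidth, and mapped to [cos(2πBv), sin(2πBv)] before an ordinary MLP. The module below is an illustrative implementation, not the authors' release.

```python
import math
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim=2, n_features=256, sigma=10.0):
        super().__init__()
        # fixed random projection; sigma controls the bandwidth of the induced kernel
        self.register_buffer("B", torch.randn(n_features, in_dim) * sigma)

    def forward(self, v):                      # v: (batch, in_dim), e.g. pixel coordinates
        proj = 2.0 * math.pi * (v @ self.B.T)
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

# usage: an MLP on mapped coordinates, e.g. for regressing RGB values of an image
# model = nn.Sequential(FourierFeatures(2, 256, 10.0),
#                       nn.Linear(512, 256), nn.ReLU(),
#                       nn.Linear(256, 3), nn.Sigmoid())
```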

Thu 10 Dec. 8:10 - 8:20 PST

Spotlight
Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations

Amit Daniely · Hadas Shacham

We consider ReLU networks with random weights, in which the dimension decreases at each layer. We show that for most such networks, most examples $x$ admit an adversarial perturbation at a Euclidean distance of $O\left(\frac{\|x\|}{\sqrt{d}}\right)$, where $d$ is the input dimension. Moreover, this perturbation can be found via gradient flow, as well as via gradient descent with sufficiently small steps. This result can be seen as an explanation for the abundance of adversarial examples, and for the fact that they are found via gradient descent.
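
The phenomenon can be illustrated (not proved) with a short experiment: a random ReLU network with decreasing widths is attacked by plain gradient descent on its output, and the resulting perturbation norm is compared with ||x||/sqrt(d). The architecture, step size, and iteration budget below are arbitrary choices.

```python
import torch

torch.manual_seed(0)
d = 1000
widths = [d, 500, 250, 1]                      # dimension decreases at each layer
layers = [torch.randn(widths[i + 1], widths[i]) / widths[i] ** 0.5
          for i in range(len(widths) - 1)]

def net(x):
    h = x
    for w in layers[:-1]:
        h = torch.relu(w @ h)
    return (layers[-1] @ h).sum()              # scalar output

x = torch.randn(d)
sign = 1.0 if net(x) > 0 else -1.0             # we try to push the output past zero
delta = torch.zeros(d, requires_grad=True)
opt = torch.optim.SGD([delta], lr=0.05)
for _ in range(500):
    out = sign * net(x + delta)
    if out.item() < 0:                         # sign flipped: adversarial example found
        break
    opt.zero_grad(); out.backward(); opt.step()

print("||delta|| relative to ||x||/sqrt(d):",
      (delta.norm() / (x.norm() / d ** 0.5)).item())
```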

Thu 10 Dec. 8:20 - 8:30 PST

Spotlight
Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples

Shafi Goldwasser · Adam Tauman Kalai · Yael Kalai · Omar Montasser

We present a transductive learning algorithm that takes as input training examples from a distribution P and arbitrary (unlabeled) test examples, possibly chosen by an adversary. This is unlike prior work that assumes that test examples are small perturbations of P. Our algorithm outputs a selective classifier, which abstains from predicting on some examples. By considering selective transductive learning, we give the first nontrivial guarantees for learning classes of bounded VC dimension with arbitrary train and test distributions—no prior guarantees were known even for simple classes of functions such as intervals on the line. In particular, for any function in a class C of bounded VC dimension, we guarantee a low test error rate and a low rejection rate with respect to P. Our algorithm is efficient given an Empirical Risk Minimizer (ERM) for C. Our guarantees hold even for test examples chosen by an unbounded white-box adversary. We also give guarantees for generalization, agnostic, and unsupervised settings.
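
As a toy illustration of selective (abstaining) classification in the transductive setting, and emphatically not the paper's algorithm, consider the class of threshold functions on the line: we predict on a test point only where every threshold consistent with the training data agrees, and abstain inside the disagreement region, where adversarially placed test points could otherwise force errors.

```python
import numpy as np

def selective_threshold_classifier(x_train, y_train, x_test):
    # labels follow y = 1 iff x >= t for some unknown threshold t
    lo = x_train[y_train == 1].min() if (y_train == 1).any() else np.inf   # smallest positive
    hi = x_train[y_train == 0].max() if (y_train == 0).any() else -np.inf  # largest negative
    preds = []
    for x in x_test:
        if x >= lo:
            preds.append(1)          # every consistent threshold labels this point 1
        elif x <= hi:
            preds.append(0)          # every consistent threshold labels this point 0
        else:
            preds.append(None)       # consistent thresholds disagree: abstain
    return preds

# usage:
# x_tr = np.array([0.1, 0.2, 0.8, 0.9]); y_tr = np.array([0, 0, 1, 1])
# print(selective_threshold_classifier(x_tr, y_tr, np.array([0.05, 0.5, 0.95])))
# -> [0, None, 1]
```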

Thu 10 Dec. 8:30 - 8:40 PST

Q&A
Joint Q&A for Preceding Spotlights

Thu 10 Dec. 8:40 - 9:00 PST

Break