Keywords: [ Dataset Condensation ] [ Graph Condensation ] [ Graph Neural Networks ] [ Hyperparameter Optimization ] [ Architecture Search ] [ Graph Compression ]
Dataset condensation aims to reduce the computational cost of training multiple models on a large dataset by condensing the training set into a small synthetic one. State-of-the-art approaches rely on matching the gradients between the real and synthetic data and have recently been applied to condense large-scale graphs for node classification tasks. Although dataset condensation can be efficient when multiple models must be trained for hyperparameter optimization, there is no theoretical guarantee on the generalizability of the condensed data, and in practice it can generalize poorly across hyperparameters and architectures; on graphs, we find and prove that this overfitting is much more severe. This paper considers a different condensation objective designed specifically for hyperparameter search. We aim to generate a synthetic dataset such that the validation-performance rankings of different models under different hyperparameters on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which learns the synthetic validation data by matching the hyperparameter gradients computed via implicit differentiation and an efficient inverse-Hessian approximation. HCDC employs a supernet with differentiable hyperparameters, making it suitable for modeling GNNs with widely different convolution filters. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of GNNs and speeds up hyperparameter/architecture search on graphs.
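To make the notion of a hyperparameter gradient concrete, the following is a minimal sketch of how one such gradient can be obtained by implicit differentiation with a truncated Neumann-series approximation of the inverse Hessian. It is not the HCDC implementation: it assumes a toy ridge-regression model in place of a GNN, a single regularization hyperparameter `lam`, and a closed-form training step, and all function names (`train_ridge`, `neumann_inverse_hvp`, `hypergradient`) are hypothetical. The Neumann series is one common choice of inverse-Hessian approximation and is used here only for illustration.

```python
import numpy as np

def train_ridge(X, y, lam):
    """Closed-form minimizer of 0.5*mean((X@theta - y)**2) + 0.5*lam*||theta||^2.

    Stands in for model training in this toy setting (an assumption).
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)  # Hessian of the training loss w.r.t. theta
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H

def val_loss(theta, Xv, yv):
    """Validation loss; it has no direct dependence on lam."""
    return 0.5 * np.mean((Xv @ theta - yv) ** 2)

def neumann_inverse_hvp(H, v, alpha, num_terms):
    """Approximate H^{-1} v by alpha * sum_{i=0}^{K} (I - alpha*H)^i v."""
    term = v.copy()
    total = v.copy()
    for _ in range(num_terms):
        term = term - alpha * (H @ term)  # term <- (I - alpha*H) @ term
        total = total + term
    return alpha * total

def hypergradient(X, y, Xv, yv, lam, num_terms=500):
    """d L_val / d lam at the trained parameters, via the implicit function theorem."""
    theta, H = train_ridge(X, y, lam)
    grad_val = Xv.T @ (Xv @ theta - yv) / len(yv)  # d L_val / d theta
    # The series converges when 0 < alpha < 2 / lambda_max(H); use 1 / lambda_max(H).
    alpha = 1.0 / np.linalg.eigvalsh(H)[-1]
    inv_hvp = neumann_inverse_hvp(H, grad_val, alpha, num_terms)
    # d/d lam of grad_theta L_train equals theta, so d theta*/d lam = -H^{-1} theta.
    return -theta @ inv_hvp

# Synthetic toy data (purely illustrative).
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(50, 5))
y = X @ w_true + 0.1 * rng.normal(size=50)
Xv = rng.normal(size=(30, 5))
yv = Xv @ w_true + 0.1 * rng.normal(size=30)

lam = 0.1
hg = hypergradient(X, y, Xv, yv, lam)
print("hypergradient d L_val / d lam:", hg)
```

In a condensation setting, a quantity of this kind would be computed on both the original and the synthetic validation data and the two would be matched; the sketch covers only the computation of a single hypergradient. The explicit eigenvalue computation used to pick `alpha` is affordable only because the toy Hessian is tiny.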