A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning
Bhavya Kailkhura · Jayaraman Thiagarajan · Qunwei Li · Jize Zhang · Yi Zhou · Timo Bremer

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1439

In this paper, we present a statistical mechanics framework to understand the effect of sampling properties of training data on the generalization gap of machine learning (ML) algorithms. We connect the generalization gap to the spatial properties of a sample design characterized by the pair correlation function (PCF). In particular, we express the generalization gap in terms of the power spectra of the sample design and that of the function to be learned. Using this framework, we show that space-filling sample designs, such as blue noise and Poisson disk sampling, which optimize spectral properties, outperform random designs in terms of the generalization gap, and we characterize this gain in closed form. Our analysis also sheds light on design principles for constructing optimal task-agnostic sample designs that minimize the generalization gap. We corroborate our findings using regression experiments with neural networks on (a) synthetic functions and (b) a complex scientific simulator for inertial confinement fusion (ICF).
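To make the contrast between random and space-filling designs concrete, the following is a minimal sketch, not the authors' implementation. It assumes a unit-square domain and uses naive dart throwing as a stand-in for Poisson disk sampling; the function names and the parameter `r_min` are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_design(n, dim=2, seed=0):
    """Draw n points uniformly at random in the unit hypercube."""
    rng = np.random.default_rng(seed)
    return rng.random((n, dim))

def poisson_disk_design(n, r_min, dim=2, seed=0, max_tries=100_000):
    """Naive dart throwing (an assumed stand-in, not the paper's method):
    accept a uniform candidate only if it lies at least r_min away
    from every point accepted so far."""
    rng = np.random.default_rng(seed)
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        cand = rng.random(dim)
        if all(np.linalg.norm(cand - p) >= r_min for p in points):
            points.append(cand)
    return np.array(points)

def min_pairwise_distance(x):
    """Smallest Euclidean distance between any two distinct points."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    d[np.diag_indices(len(x))] = np.inf  # ignore self-distances
    return d.min()

if __name__ == "__main__":
    rand = random_design(100)
    disk = poisson_disk_design(100, r_min=0.05)
    print("random design, min distance:      ", min_pairwise_distance(rand))
    print("Poisson disk design, min distance:", min_pairwise_distance(disk))
```

The Poisson disk design enforces a minimum separation between samples by construction, which is the kind of spatial property the PCF captures; a uniformly random design carries no such guarantee.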

Author Information

Bhavya Kailkhura (Lawrence Livermore National Laboratory)
Jayaraman Thiagarajan (Lawrence Livermore National Laboratory)
Qunwei Li (Ant Financial)
Jize Zhang (Lawrence Livermore National Laboratory)
Yi Zhou (University of Utah)
Timo Bremer (Lawrence Livermore National Laboratory)
