In this paper, we present a statistical mechanics framework to understand the effect of sampling properties of training data on the generalization gap of machine learning (ML) algorithms. We connect the generalization gap to the spatial properties of a sample design characterized by the pair correlation function (PCF). In particular, we express the generalization gap in terms of the power spectra of the sample design and of the function to be learned. Using this framework, we show that space-filling sample designs, such as blue noise and Poisson disk sampling, which optimize spectral properties, outperform random designs in terms of the generalization gap, and we characterize this gain in closed form. Our analysis also sheds light on design principles for constructing optimal task-agnostic sample designs that minimize the generalization gap. We corroborate our findings using regression experiments with neural networks on (a) synthetic functions and (b) a complex scientific simulator for inertial confinement fusion (ICF).
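As a rough illustration of the contrast between random and space-filling designs described in the abstract, the sketch below draws a uniform random design and a Poisson disk design on the unit square and compares their smallest pairwise distances. It is not the construction used in the paper: the naive dart-throwing sampler, the function names, and the parameters (`n`, `r_min`, `max_tries`) are all assumptions chosen for illustration.

```python
import math
import random


def uniform_random_design(n, rng):
    """Draw n points independently and uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def poisson_disk_design(n, r_min, rng, max_tries=100_000):
    """Naive dart throwing (an illustrative assumption, not the paper's method).

    A candidate is accepted only if it lies at least r_min away from every
    previously accepted point, so no two samples are closer than r_min.
    May return fewer than n points if max_tries is exhausted.
    """
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) >= r_min for p in points):
            points.append(candidate)
    return points


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two distinct points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 64
r_min = 0.07  # hypothetical exclusion radius for the Poisson disk design

random_pts = uniform_random_design(n, rng)
disk_pts = poisson_disk_design(n, r_min, rng)

print("random design, min distance:      ", min_pairwise_distance(random_pts))
print("Poisson disk design, min distance:", min_pairwise_distance(disk_pts))
```

The random design typically contains pairs of points that nearly coincide, while the Poisson disk design enforces a minimum separation. That separation is the kind of spatial property the pair correlation function captures.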
Author Information
Bhavya Kailkhura (Lawrence Livermore National Laboratory)
Jayaraman Thiagarajan (Lawrence Livermore National Labs)
Qunwei Li (Ant Financial)
Jize Zhang (Lawrence Livermore National Laboratory)
Yi Zhou (University of Utah)
Timo Bremer (Lawrence Livermore National Laboratory)
More from the Same Authors
-
2021 : Unsupervised Attribute Alignment for Characterizing Distribution Shift »
Matthew Olson · Rushil Anirudh · Jayaraman Thiagarajan · Timo Bremer · Weng-Keen Wong · Shusen Liu -
2021 : Geometric Priors for Scientific Generative Models in Inertial Confinement Fusion »
Ankita Shukla · Rushil Anirudh · Eugene Kur · Jayaraman Thiagarajan · Timo Bremer · Brian K Spears · Tammy Ma · Pavan Turaga -
2022 : A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias »
Puja Trivedi · Danai Koutra · Jayaraman Thiagarajan -
2022 : Do Domain Generalization Methods Generalize Well? »
Akshay Mehra · Bhavya Kailkhura · Pin-Yu Chen · Jihun Hamm -
2022 Spotlight: Single Model Uncertainty Estimation via Stochastic Data Centering »
Jayaraman Thiagarajan · Rushil Anirudh · Vivek Sivaraman Narayanaswamy · Timo Bremer -
2022 Poster: Single Model Uncertainty Estimation via Stochastic Data Centering »
Jayaraman Thiagarajan · Rushil Anirudh · Vivek Sivaraman Narayanaswamy · Timo Bremer -
2022 Poster: Analyzing Data-Centric Properties for Graph Contrastive Learning »
Puja Trivedi · Ekdeep S Lubana · Mark Heimann · Danai Koutra · Jayaraman Thiagarajan -
2022 Poster: Finding Correlated Equilibrium of Constrained Markov Game: A Primal-Dual Approach »
Ziyi Chen · Shaocong Ma · Yi Zhou -
2022 Poster: Models Out of Line: A Fourier Lens on Distribution Shift Robustness »
Sara Fridovich-Keil · Brian Bartoldson · James Diffenderfer · Bhavya Kailkhura · Timo Bremer -
2021 Poster: G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators »
Yunhui Long · Boxin Wang · Zhuolin Yang · Bhavya Kailkhura · Aston Zhang · Carl Gunter · Bo Li -
2021 Poster: Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation »
Yue Wang · Shaofeng Zou · Yi Zhou -
2021 Poster: A Winning Hand: Compressing Deep Networks Can Improve Out-of-Distribution Robustness »
James Diffenderfer · Brian Bartoldson · Shreya Chaganti · Jize Zhang · Bhavya Kailkhura -
2021 Poster: Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning »
Akshay Mehra · Bhavya Kailkhura · Pin-Yu Chen · Jihun Hamm -
2020 Poster: Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond »
Kaidi Xu · Zhouxing Shi · Huan Zhang · Yihan Wang · Kai-Wei Chang · Minlie Huang · Bhavya Kailkhura · Xue Lin · Cho-Jui Hsieh -
2020 Poster: Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis »
Shaocong Ma · Yi Zhou · Shaofeng Zou -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh -
2018 Poster: Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization »
Sijia Liu · Bhavya Kailkhura · Pin-Yu Chen · Paishun Ting · Shiyu Chang · Lisa Amini