Timezone: »

Poster
Efficient Aggregated Kernel Tests using Incomplete $U$-statistics
Antonin Schrab · Ilmun Kim · Benjamin Guedj · Arthur Gretton

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #941
We propose a series of computationally efficient, nonparametric tests for the two-sample, independence and goodness-of-fit problems, using the Maximum Mean Discrepancy (MMD), Hilbert Schmidt Independence Criterion (HSIC), and Kernel Stein Discrepancy (KSD), respectively. Our test statistics are incomplete $U$-statistics, with a computational cost that interpolates between linear time in the number of samples, and quadratic time, as associated with classical $U$-statistic tests. The three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales: we call the resulting tests MMDAggInc, HSICAggInc and KSDAggInc. This procedure provides a solution to the fundamental kernel selection problem as we can aggregate a large number of kernels with several bandwidths without incurring a significant loss of test power. For the test thresholds, we derive a quantile bound for wild bootstrapped incomplete $U$-statistics, which is of independent interest. We derive non-asymptotic uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: this result is novel for tests based on incomplete $U$-statistics, to our knowledge. We further show that in the quadratic-time case, the wild bootstrap incurs no penalty to test power over more widespread permutation-based approaches, since both attain the same minimax optimal rates (which in turn match the rates that use oracle quantiles). We support our claims with numerical experiments on the trade-off between computational efficiency and test power. In all three testing frameworks, our proposed linear-time tests outperform the current linear-time state-of-the-art tests (or at least match their test power).

#### Author Information

##### Antonin Schrab (University College London, AI Centre & Gatsby Unit)

Antonin Schrab is a PhD student at University College London who is jointly supervised by Benjamin Guedj at the UCL Centre for Artificial Intelligence and Inria London, and by Arthur Gretton at the Gatsby Computational Neuroscience Unit. His research interests include kernel methods, PAC-Bayes and generative models. He has recently focused on proving theoretical guarantees for kernel-based aggregated testing procedures.

##### Benjamin Guedj (Inria &amp; University College London)

Benjamin Guedj is a tenured research scientist at Inria since 2014, affiliated to the Lille - Nord Europe research centre in France. He is also affiliated with the mathematics department of the University of Lille. Since 2018, he is a Principal Research Fellow at the Centre for Artificial Intelligence and Department of Computer Science at University College London. He is also a visiting researcher at The Alan Turing Institute. Since 2020, he is the founder and scientific director of The Inria London Programme, a strategic partnership between Inria and UCL as part of a France-UK scientific initiative. He obtained his Ph.D. in mathematics in 2013 from UPMC (Université Pierre & Marie Curie, France) under the supervision of Gérard Biau and Éric Moulines. Prior to that, he was a research assistant at DTU Compute (Denmark). His main line of research is in statistical machine learning, both from theoretical and algorithmic perspectives. He is primarily interested in the design, analysis and implementation of statistical machine learning methods for high dimensional problems, mainly using the PAC-Bayesian theory.

##### Arthur Gretton (Gatsby Unit, UCL)

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics, and at the Machine Learning Department, Carnegie Mellon University. Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite dimensional exponential family models), nonparametric hypothesis testing, and kernel methods. He has been an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, an Action Editor for JMLR since April 2013, an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the Dali workshop in 2019 (with Krikamol Muandet and Shakir Mohammed), and co-organsier of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).